INKY incident

Inky dashboard connection problem

Major Resolved

INKY experienced a major incident on February 28, 2022 affecting Dashboard Services US, lasting 8h 4m. The incident has been resolved; the full update timeline is below.

Started
Feb 28, 2022, 06:30 PM UTC
Resolved
Mar 01, 2022, 02:34 AM UTC
Duration
8h 4m
Detected by Pingoru
Feb 28, 2022, 06:30 PM UTC

Affected components

Dashboard Services US

Update timeline

  1. investigating Feb 28, 2022, 06:30 PM UTC

    Inky engineers are investigating an issue affecting access to the Inky admin dashboards. We will update as soon as more information is available.

  2. identified Feb 28, 2022, 06:38 PM UTC

    Inky engineers are seeing the connections to back end systems returning to normal. We are monitoring and ensuring that connections return to normal.

  3. monitoring Feb 28, 2022, 07:10 PM UTC

    Inky engineers are monitoring, and error rates have returned to normal.

  4. resolved Mar 01, 2022, 02:34 AM UTC

    Dashboard issue resolved after a configuration change and cycling of servers. Dashboards have been running without connection issues for more than 6 hours.

  5. postmortem Mar 04, 2022, 02:49 PM UTC

    # Post incident report:

    Start: 28-February-2022 1830 UTC
    End: 28-February-2022 1910 UTC
    Duration: 40 min

    ## Summary:

    During a configuration change an error was introduced that briefly impacted the ability of some admins to access the Inky Dashboards.

    ## Root Cause:

    Changes were made to configurations to accommodate limitations in URL length. After the change was made, one of the servers was unable to connect due to a missed configuration. Once the error was corrected, the server connected as expected.

    ## Customer Impact:

    Some Inky Admin dashboards were inaccessible for a short period of time.

    ## Mitigation Action:

    Corrected misconfiguration and restarted server.

    ## Follow-up Items and Preventative Measures:

    1. Double check configurations and ensure all servers are accounted for during rollouts.
    2. Ensure servers with issues are quickly taken out of rotation by temporarily removing them from the pool available to the load balancer.
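The two follow-up items could be sketched roughly as below. This is an illustration only and not taken from INKY's systems: the `Server` record, the helper names `missing_config` and `active_pool`, and the configuration keys `backend_url` and `max_url_length` are all hypothetical, chosen to show a pre-rollout check that every server carries the required settings and a filter that keeps incomplete or unhealthy servers out of the load balancer pool.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical server record; field names are illustrative only.
    name: str
    config: dict = field(default_factory=dict)
    healthy: bool = True


def missing_config(servers, required_keys):
    """Return {server name: sorted list of required keys it lacks}."""
    gaps = {}
    for server in servers:
        absent = sorted(k for k in required_keys if k not in server.config)
        if absent:
            gaps[server.name] = absent
    return gaps


def active_pool(servers, required_keys):
    """Keep only servers that are healthy and fully configured."""
    gaps = missing_config(servers, required_keys)
    return [s.name for s in servers if s.healthy and s.name not in gaps]


if __name__ == "__main__":
    required = {"backend_url", "max_url_length"}  # assumed key names
    fleet = [
        Server("dash-1", {"backend_url": "https://backend.example", "max_url_length": 2048}),
        Server("dash-2", {"backend_url": "https://backend.example"}),  # missed setting
        Server("dash-3", {"backend_url": "https://backend.example", "max_url_length": 2048}, healthy=False),
    ]
    print(missing_config(fleet, required))  # {'dash-2': ['max_url_length']}
    print(active_pool(fleet, required))     # ['dash-1']
```

Running the check before a rollout would flag a server such as `dash-2` that missed a setting, and building the pool from `active_pool` would keep it out of rotation until the gap is corrected.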