Lakeside Software incident

America's Degraded Performance

Minor, Resolved

Lakeside Software experienced a minor incident on October 25, 2023 affecting SysTrack API/UI and SysTrack Endpoint Connections, lasting 6h 53m. The incident has been resolved; the full update timeline is below.

Started: Oct 25, 2023, 04:23 PM UTC
Resolved: Oct 25, 2023, 11:17 PM UTC
Duration: 6h 53m
Detected by Pingoru: Oct 25, 2023, 04:23 PM UTC

Affected components

SysTrack API/UI
SysTrack Endpoint Connections

Update timeline

  1. investigating Oct 25, 2023, 04:49 PM UTC

    We are currently investigating this issue.

  2. identified Oct 25, 2023, 05:28 PM UTC

    We have identified the problem and are working on remediating the issue.

  3. identified Oct 25, 2023, 07:03 PM UTC

    We have identified the problem and are working on remediating the issue.

  4. monitoring Oct 25, 2023, 08:33 PM UTC

    We have implemented a fix and are actively monitoring the service.

  5. resolved Oct 25, 2023, 11:17 PM UTC

    We have identified the root cause, implemented a fix, and all systems have been fully restored. We will continue to closely monitor all services, but if you have any issues, please contact Lakeside Support at [email protected].

  6. postmortem Nov 08, 2023, 08:12 PM UTC

    # What was the issue?

    Some clients experienced slowness or 500 errors when accessing the SysTrack website or APIs.

    # What was the root cause?

    We received an “out of resources” error from the Azure Application Gateway, but the Azure Console and the APIs that expose these metrics showed that resources were still available. Because the metrics provided by Azure were incorrect, we were unable to scale our application gateway appropriately.

    # What is the prevention strategy?

    1. **Short term (already completed):**
        1. Scale the Azure Application Gateway beyond what we expect to need, to absorb the inconsistency in the reported metrics.
        2. Add alerting for Azure Resource Health.
        3. Lower alerting thresholds.
    2. **Short term and in progress:**
        1. Review the thresholds of our external monitoring solutions and adjust them to be more sensitive.
    3. **Medium term and in progress:**
        1. Implement additional methods to split traffic between different Azure Application Gateways.
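
    The root cause in the postmortem comes down to reported capacity disagreeing with what clients actually saw. One way to picture the "lower alerting thresholds" item is an alert check that also looks at the observed 5xx rate, so an alert can fire even when the platform reports headroom. This is only a minimal sketch: the names `GatewaySample`, `error_rate`, and `should_alert`, and both threshold values, are hypothetical and are not taken from Lakeside's configuration or from any Azure API.

```python
from dataclasses import dataclass


@dataclass
class GatewaySample:
    """One monitoring interval for a gateway (hypothetical structure)."""
    total_requests: int
    server_errors: int           # responses with a 5xx status
    reported_utilization: float  # 0.0 to 1.0, as reported by the platform


def error_rate(sample: GatewaySample) -> float:
    """Fraction of requests that returned 5xx; 0.0 when there was no traffic."""
    if sample.total_requests == 0:
        return 0.0
    return sample.server_errors / sample.total_requests


def should_alert(
    sample: GatewaySample,
    max_error_rate: float = 0.01,   # assumed threshold, illustrative only
    max_utilization: float = 0.8,   # assumed threshold, illustrative only
) -> bool:
    """Alert when either signal crosses its threshold.

    Checking the observed error rate independently means an alert can still
    fire when the reported utilization looks healthy.
    """
    return (
        error_rate(sample) > max_error_rate
        or sample.reported_utilization > max_utilization
    )
```

    For example, `should_alert(GatewaySample(1000, 50, 0.4))` evaluates to `True`: the reported utilization is well under its threshold, but 5% of requests failed.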