Lakeside Software incident

SysTrack Cloud - Americas – Availability Issues

Major · Resolved

Lakeside Software experienced a major incident on March 19, 2024 affecting SysTrack API/UI and SysTrack Endpoint Connections, lasting 1d 21h. The incident has been resolved; the full update timeline is below.

Started
Mar 19, 2024, 04:32 AM UTC
Resolved
Mar 21, 2024, 01:42 AM UTC
Duration
1d 21h
Detected by Pingoru
Mar 19, 2024, 04:32 AM UTC

Affected components

SysTrack API/UI, SysTrack Endpoint Connections

Update timeline

  1. investigating Mar 19, 2024, 04:32 AM UTC

    We are currently investigating reported issues with SysTrack Cloud availability in the below regions. Some users may not be able to use the SysTrack cloud UI, make API calls, or use third-party integrations that utilize the API. We're actively working on identifying the root cause. We apologize for any inconvenience and will provide an update once more details become available.

  2. investigating Mar 19, 2024, 05:29 AM UTC

    We are continuing to investigate this issue.

  3. investigating Mar 19, 2024, 06:26 AM UTC

    We are continuing to investigate this issue.

  4. monitoring Mar 19, 2024, 06:34 AM UTC

    A fix has been implemented on a Microsoft component, and we are monitoring the results for the next several hours.

  5. investigating Mar 19, 2024, 08:05 AM UTC

    We are currently investigating this issue.

  6. investigating Mar 19, 2024, 09:03 AM UTC

    We are continuing to investigate this issue.

  7. investigating Mar 19, 2024, 10:17 AM UTC

    We are continuing to investigate this issue.

  8. investigating Mar 19, 2024, 11:23 AM UTC

    We are continuing to investigate this issue.

  9. identified Mar 19, 2024, 11:36 AM UTC

    The issue has been identified on our cloud provider's end, and we are working with them to fix it as soon as possible.

  10. identified Mar 19, 2024, 12:54 PM UTC

    We have made some changes and are seeing some improvement, but we are continuing to work with our cloud provider as the problem is not fully resolved. We will continue posting updates every 60 minutes.

  11. identified Mar 19, 2024, 01:53 PM UTC

    We have been experiencing a widespread issue with a Microsoft managed service (Application Gateway), which is the load balancer used for the SysTrack Cloud product. The issue started yesterday around 7 AM EST in one region and was resolved temporarily, but it recurred today around 12 AM EST. The issue is affecting multiple Azure regions. We are working closely with Microsoft to resolve the issue and will provide updates hourly to the cloud status page as usual.

  12. identified Mar 19, 2024, 02:55 PM UTC

    We are continuing to work closely with Microsoft to resolve the issue and will provide updates hourly to the cloud status page as usual.

  13. identified Mar 19, 2024, 04:02 PM UTC

    We are continuing to work closely with Microsoft to resolve the issue and will provide updates hourly to the cloud status page as usual.

  14. identified Mar 19, 2024, 05:23 PM UTC

    We are continuing to work closely with Microsoft to resolve the issue and will provide updates hourly to the cloud status page as usual.

  15. identified Mar 19, 2024, 06:24 PM UTC

    We are continuing to work closely with Microsoft to resolve the issue and will provide updates hourly to the cloud status page as usual.

  16. identified Mar 19, 2024, 08:01 PM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  17. identified Mar 19, 2024, 08:58 PM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  18. identified Mar 19, 2024, 09:58 PM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  19. identified Mar 20, 2024, 12:23 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  20. identified Mar 20, 2024, 02:26 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  21. identified Mar 20, 2024, 03:22 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  22. identified Mar 20, 2024, 04:49 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  23. identified Mar 20, 2024, 06:01 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  24. identified Mar 20, 2024, 07:12 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  25. identified Mar 20, 2024, 08:18 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  26. identified Mar 20, 2024, 09:05 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  27. identified Mar 20, 2024, 10:05 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  28. identified Mar 20, 2024, 11:24 AM UTC

    In cooperation with Microsoft, we have identified the root cause of the problem. The issue resulted from a recent change to Microsoft's environment. We are working closely with Microsoft to restore all services and mitigate the risk of any reoccurrence.

  29. monitoring Mar 20, 2024, 01:30 PM UTC

    Microsoft has remediated the problem. All services are now operational. We will continue to monitor the situation and provide updates approximately every 6 hours.

  30. resolved Mar 21, 2024, 01:42 AM UTC

    This incident has been resolved.

  31. postmortem Apr 04, 2024, 01:58 AM UTC

    # What was the issue?

    All customers experienced intermittent problems loading the SysTrack CE for a period of time. Agents, as well as users of the SysTrack web product, were unable to connect for periods of time.

    # What was the root cause?

    This is a preliminary RCA that is subject to change once we receive the final RCA from Microsoft. Thus far, we have determined the root cause to lie with Microsoft Azure's Application Gateway. This managed service, fully supported by Microsoft, appears to have had a backend update that Microsoft deployed to the different SysTrack regions. The updated version does not appear to handle our unique workload on the Application Gateway. The managed service became overloaded and went into an _unavailable_ state, so it stopped accepting inbound traffic.

    # What is the Prevention Strategy?

    We are still working with our cloud provider (Azure) to get a full RCA before we prepare an identification and prevention strategy.