Lakeside Software incident

Elevated Error rate in SysTrack Cloud

Major · Resolved

Lakeside Software experienced a major incident on March 27, 2023, affecting SysTrack API/UI, SysTrack Endpoint Connections, and 1 more component, lasting 4h 49m. The incident has been resolved; the full update timeline is below.

Started
Mar 27, 2023, 02:51 PM UTC
Resolved
Mar 27, 2023, 07:41 PM UTC
Duration
4h 49m
Detected by Pingoru
Mar 27, 2023, 02:51 PM UTC

Affected components

SysTrack API/UI, SysTrack Endpoint Connections (×3)

Update timeline

  1. investigating Mar 27, 2023, 02:23 PM UTC

    Some customers may be experiencing slowness when using SysTrack Cloud or issues with agents connecting. We're actively working on identifying the root cause. We apologize for any inconvenience and will provide an update once more details become available.

  2. identified Mar 27, 2023, 03:51 PM UTC

    We have identified the issue and are working on fully remediating the root cause.

  3. identified Mar 27, 2023, 04:49 PM UTC

    We are continuing to work on a fix for this issue.

  4. identified Mar 27, 2023, 05:55 PM UTC

    We are in the process of deploying a fix and will provide an update as soon as we can.

  5. monitoring Mar 27, 2023, 06:54 PM UTC

    We believe we have remediated the root cause and are actively monitoring the situation.

  6. resolved Mar 27, 2023, 07:41 PM UTC

    We have identified the root cause, implemented a fix, and all systems have been fully restored. We will continue to closely monitor all services, but if you have any issues, please contact Lakeside Support at [email protected].

  7. postmortem Mar 31, 2023, 04:03 PM UTC

    # What was the issue?

    Following the 10.6 release, some customers were affected by the following items:

    * Some clients experienced slowness when accessing the SysTrack Website.
    * Some clients were unable to connect to agents using Resolve or Assist.

    # What was the root cause?

    The root cause for this situation was different for each issue:

    * Slow Website: A thread locking issue caused database processing to spike.
        * To resolve, additional indexes were added into the databases.
    * Agents not connected: A bug was introduced that sometimes caused the service that connects the agent to crash in such a way that it didn’t always self-heal as intended.
        * To resolve, we applied a hotfix to the codebase.

    # What is the Prevention Strategy?

    * Additional large-scale QA testing
    * Additional monitoring post-release
    * Review future release schedules for potential deployment earlier in the weekend
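
The postmortem says the slow-website issue was remediated by adding database indexes. The sketch below illustrates, in general terms, why an index can reduce database load: without one, a filtered lookup scans every row; with one, the engine can seek directly to the matching rows. It uses Python's built-in `sqlite3` module purely for illustration. The table `agent_events`, its columns, and the index name are hypothetical and are not taken from SysTrack, whose database engine and schema the source does not describe.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the plan detail


# Hypothetical table standing in for any large, frequently filtered table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_events ("
    "id INTEGER PRIMARY KEY, agent_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO agent_events (agent_id, payload) VALUES (?, ?)",
    [(i % 500, "x") for i in range(10_000)],
)

lookup = "SELECT payload FROM agent_events WHERE agent_id = ?"

# Without an index on agent_id, the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# Adding an index lets the planner seek to the matching rows instead.
conn.execute(
    "CREATE INDEX idx_agent_events_agent_id ON agent_events (agent_id)"
)
plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a table scan and the second names the new index. Shorter scans generally also mean locks are held for less time, which is one plausible way added indexes could ease the contention the postmortem describes; the source does not give that level of detail.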