Lakeside Software incident
Sensor Data Processing Delays - Americas
Lakeside Software experienced a minor incident on January 31, 2024 affecting SysTrack API/UI, lasting 7h 47m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jan 31, 2024, 10:16 AM UTC
We are experiencing sensor data processing delays in the Americas region.
- investigating Jan 31, 2024, 11:40 AM UTC
We are continuing to investigate this issue.
- monitoring Jan 31, 2024, 12:38 PM UTC
Backlogs are processing and we are monitoring the results.
- investigating Jan 31, 2024, 02:21 PM UTC
We are currently investigating this issue.
- monitoring Jan 31, 2024, 03:19 PM UTC
Backlogs are processing and we are monitoring the results.
- resolved Jan 31, 2024, 06:04 PM UTC
This incident has been resolved.
- postmortem Feb 06, 2024, 07:31 PM UTC
# What was the issue?
Some clients experienced slow processing of sensor data. Some sensor data at the master was lost between 1/30/2024 11:45 PM ET and 1/31/2024 5:08 AM ET.

# What was the root cause?
A logic error was uncovered in an infrastructure-as-code (IaC) script that supports the queue. The error caused some data to requeue incorrectly and consume unnecessary resources, which in turn prevented some sensor data from queuing correctly.

# What was the short term resolution?
1. Scale up the underlying systems to handle the extra load until processing finished.
2. Disable the automation so that it stopped causing jobs to requeue.
3. Scale the queue to a larger size to support more caching of data.

# What is the Prevention Strategy?
1. Perform additional validation of IaC scripts.
2. Update the deployment strategy for IaC scripts to support slower rollouts.
3. Add automation to scale the queuing system as needed.
4. Add logic to the product to resend sensor data when it does not arrive in the centralized database. This is the only data in the product that cannot easily be resent and reprocessed; therefore, any infrastructure problem, whether the fault of Lakeside or Microsoft, could potentially cause a gap in sensor data.
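
The resend logic described in the last prevention item could look roughly like the sketch below. It is illustrative only and does not represent the SysTrack implementation: the `ResendBuffer` class, its method names, and the timeout and attempt values are all hypothetical. It assumes the sender can learn when a batch has reached the centralized database (an acknowledgement), and it caps resend attempts so that retries cannot themselves create unbounded requeuing.

```python
from dataclasses import dataclass, field


@dataclass
class ResendBuffer:
    """Hypothetical sender-side buffer of sensor batches awaiting acknowledgement."""

    ack_timeout_s: float = 300.0  # assumed wait before a batch counts as missing
    max_attempts: int = 5         # assumed cap so resends cannot loop forever
    # batch_id -> (payload, time of last send, number of sends so far)
    _pending: dict = field(default_factory=dict)

    def record_send(self, batch_id, payload, now):
        """Remember a batch (or a resend of it) until it is acknowledged."""
        _, _, attempts = self._pending.get(batch_id, (None, None, 0))
        self._pending[batch_id] = (payload, now, attempts + 1)

    def acknowledge(self, batch_id):
        """Drop a batch once the central database confirms it arrived."""
        self._pending.pop(batch_id, None)

    def due_for_resend(self, now):
        """Return unacknowledged batches that timed out and are under the cap."""
        due = []
        for batch_id, (payload, sent_at, attempts) in self._pending.items():
            timed_out = now - sent_at >= self.ack_timeout_s
            if timed_out and attempts < self.max_attempts:
                due.append((batch_id, payload))
        return due
```

A caller would invoke `record_send` on every transmission, `acknowledge` when arrival is confirmed, and periodically resend whatever `due_for_resend` returns. Batches that reach the attempt cap stay in the buffer but are no longer returned, so they would need separate alerting.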