Limble CMMS incident

Intermittent issues with new PMs and Cycle Counts launching at scheduled times

Limble CMMS experienced a minor incident on March 12, 2025 affecting Limble CMMS Web Application and Limble CMMS API, lasting 2h 29m. The incident has been resolved; the full update timeline is below.

Started: Mar 12, 2025, 01:30 PM UTC
Resolved: Mar 12, 2025, 04:00 PM UTC
Duration: 2h 29m
Detected by Pingoru: Mar 12, 2025, 01:30 PM UTC

Affected components

Limble CMMS Web Application
Limble CMMS API

Update timeline

investigating Mar 12, 2025, 11:31 PM UTC

Some customers are experiencing issues with new PMs and Cycle Counts launching at their scheduled times.

identified Mar 12, 2025, 11:33 PM UTC

The issue has been identified and a fix is being implemented.

monitoring Mar 12, 2025, 11:34 PM UTC

A fix has been implemented and we are monitoring the results.

resolved Mar 12, 2025, 11:35 PM UTC

The incident has been resolved. A post-mortem will follow.

postmortem Mar 21, 2025, 04:46 PM UTC

**Date:** March 12, 2025

**Status:** Resolved

**Impacted Region(s) or Services:** Limble CMMS Web Application

**Note:** All times are in Mountain Daylight Time (MDT)

## Summary

On March 12 at approximately 5 AM, we identified that the backend service responsible for generating scheduled tasks had failed to initialize. Further investigation found that the scheduler had been offline since 5:30 PM the previous evening because a container image had unexpectedly expired. After we initiated our incident response plan, engineers confirmed the expired container image as the cause, manually updated the image, and restarted the scheduled task. We then implemented detailed monitoring to alert us if another image expires in the future.

## Impact

Scheduled PMs and Cycle Counts were delayed by approximately 14 hours, from 5:30 PM on March 11 to 9:00 AM on March 12. The service was then successfully restored, and all scheduled tasks resumed normal operation.

## Root Cause

The scheduler container image failed to propagate during a deployment. Consequently, the scheduler attempted to run with an expired image, which resulted in a service startup failure.

## Resolution and Improvements

Engineers quickly identified the root cause and its relation to the missing container image. A rollback of the deployment was initiated, which resolved the issue. To prevent future occurrences, the following improvement has been implemented:

* **Enhanced Monitoring:** We have deployed detailed monitoring designed to detect and alert us to any future container image expirations, so that we can intervene proactively.
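
As an illustration of the kind of expiration check described under Enhanced Monitoring, the following is a minimal sketch, not Limble's actual monitoring. It assumes each container image has a known expiry timestamp; the `ImageRecord` type, the `images_needing_attention` function, and the seven-day warning window are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical record: a real registry would expose expiry data
    # through its own API, which is not modeled here.
    name: str
    expires_at: datetime


def images_needing_attention(images, now, warn_window=timedelta(days=7)):
    """Return (name, status) pairs for images that have already expired
    or will expire within warn_window of now."""
    flagged = []
    for image in images:
        if image.expires_at <= now:
            flagged.append((image.name, "expired"))
        elif image.expires_at - now <= warn_window:
            flagged.append((image.name, "expiring_soon"))
    return flagged


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the incident.
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    sample = [
        ImageRecord("scheduler", now + timedelta(days=2)),
        ImageRecord("worker", now + timedelta(days=90)),
    ]
    for name, status in images_needing_attention(sample, now):
        print(f"ALERT: image {name!r} is {status}")
```

Running a check like this on a schedule that is independent of the scheduler it protects would let an alert fire before the image expires, rather than after a startup failure.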
## Timeline of Events

* 3/12/2025 at 4:59 AM: Customer reported that scheduled items did not send as expected
* 7:30 AM: Incident was declared and the engineering team was alerted
* 8:15 AM: Container image was updated
* 8:45 AM: Process was manually re-run
* 9:15 AM: All processes re-ran successfully

## Key Points

* No loss of customer data