Esper.io incident

Delayed processing of messages from devices

Severity: Minor. Status: Resolved.

Esper.io experienced a minor incident on March 2, 2024, affecting Esper Systems and lasting 18h 50m. The incident has been resolved; the full update timeline is below.

Started: Mar 02, 2024, 04:49 PM UTC
Resolved: Mar 03, 2024, 11:39 AM UTC
Duration: 18h 50m
Detected by Pingoru: Mar 02, 2024, 04:49 PM UTC

Affected components

Esper Systems

Update timeline

  1. identified Mar 02, 2024, 04:49 PM UTC

    Our team has identified an issue causing degraded performance when processing messages from devices. These messages include device status updates, telemetry, and command updates. We're working to resolve it.

    Which services are affected?
    API: No
    Console: Yes (delayed Last Seen, incorrect online/offline status, delayed graphs and alerts). All other console operations are working as expected.
    Devices: No (device operations continue to work as expected)

  2. identified Mar 02, 2024, 05:39 PM UTC

    We've identified the fix and are working on deploying it for all customers. The problem was with one of our message processing systems, which had slowed down and caused the lag.

  3. identified Mar 02, 2024, 10:26 PM UTC

    We have rolled out the fix for most customers, and the message processing lag is steadily going down. Some customers will continue to see delays in Last Seen, Online/Offline Status, and Command processing until the lag reaches zero. We've also scaled up the infrastructure to speed up processing.

  4. monitoring Mar 03, 2024, 09:54 AM UTC

    We no longer see delayed processing of messages. We're continuing to monitor.

  5. resolved Mar 03, 2024, 11:39 AM UTC

    This incident has been resolved.