Datacake incident

Data processing delays

Minor Resolved

Datacake experienced a minor incident on March 14, 2024, lasting 10 hours and 24 minutes. The incident has been resolved; the full update timeline is below.

Started
Mar 14, 2024, 11:38 AM UTC
Resolved
Mar 14, 2024, 10:02 PM UTC
Duration
10 hours and 24 minutes
Detected by Pingoru
Mar 14, 2024, 11:38 AM UTC

Update timeline

  1. resolved Mar 14, 2024, 11:38 AM UTC

    Type: Incident
    Duration: 10 hours and 24 minutes
    Affected Components: Web Application

    Mar 14, 11:38:58 GMT+0 - Investigating - Our system is experiencing processing delays with incoming data. We are currently investigating the cause of this delay.

    Mar 14, 11:44:27 GMT+0 - Investigating - We are currently investigating this incident.

    Mar 14, 12:03:28 GMT+0 - Identified - We've identified the root cause of the recent issue as an unexpected high load on one of our databases. We've enhanced the resources allocated to this database and are currently monitoring its performance closely. Please note that due to a pending backlog, temporary data gaps may be evident in the chart visualizations. However, we're working to ensure data integrity as soon as possible.

    Mar 14, 12:48:29 GMT+0 - Identified - We are still in the process of examining an extended measurement queue. Please note that data transmission still encounters delays.

    Mar 14, 13:24:54 GMT+0 - Identified - We've identified the initial root cause of the delays as a few long-running migration queries. We are actively monitoring the situation and adjusting resources as necessary to expedite the process.

    Mar 14, 15:16:30 GMT+0 - Monitoring - The queries in question have now stabilized, resulting in a consistent reduction of the queue. We'll continue to provide updates on the situation in this space.

    Mar 14, 22:02:43 GMT+0 - Resolved - All queues have now been cleared with no data loss. We apologize for these delays and appreciate your understanding. In dedication to clarity and process improvement, a comprehensive follow-up about the incident, explaining its causes and the measures taken to prevent recurrence, will be provided shortly.

    Mar 15, 14:24:15 GMT+0 - Resolved - The full post mortem can be found on our Engineering Blog: