Technolutions incident

Overnight background services delayed

Notice Resolved

Technolutions reported a notice-level incident on February 6, 2024 affecting Slate. It lasted 1d 11h and has been resolved; the full update timeline is below.

Started: Feb 06, 2024, 01:25 PM UTC
Resolved: Feb 08, 2024, 12:57 AM UTC
Duration: 1d 11h
Detected by Pingoru: Feb 06, 2024, 01:25 PM UTC

Affected components

Slate

Update timeline

  1. monitoring Feb 06, 2024, 01:25 PM UTC

    An AWS-initiated overnight update to a series of Redis servers used for background service queue management has delayed background queue processing as the queues entered a stalled state. Delayed queues are processing now and will complete shortly.

  2. monitoring Feb 06, 2024, 10:14 PM UTC

    As of earlier this afternoon, fewer than 1% of databases had overnight jobs still executing. The jobs on these remaining databases typically take hours to complete overnight and, with daytime jobs and blocking processes accumulating on those databases, are still running. Separately, we have adjusted the background process monitoring to better recover from a scenario in which the Redis servers become temporarily unavailable due to hardware or software updates to those nodes.

  3. resolved Feb 08, 2024, 12:57 AM UTC

    No issues have been observed since the overnight background service queue delays on 2/6, and all overnight queues on 2/7 performed as expected. We will continue to monitor background services to ensure ongoing operation within desired ranges.
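
The second update mentions adjusting background process monitoring so that it recovers when the Redis servers are temporarily unavailable. The vendor does not describe how this was done. As a rough illustration of one common approach, the sketch below retries a queue operation with exponential backoff instead of letting the worker stall. Every name in it (`QueueUnavailable`, `call_with_backoff`, the delay values) is hypothetical and is not taken from Slate or from any Redis client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QueueUnavailable(Exception):
    """Hypothetical error raised when the queue backend cannot be reached."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff on QueueUnavailable.

    The delay doubles after each failed attempt and is capped at `max_delay`.
    The last failure is re-raised so a supervisor can alert on it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except QueueUnavailable:
            if attempt == max_attempts:
                raise  # out of retries: surface the failure
            sleep(delay)  # wait before the next attempt
            delay = min(delay * 2, max_delay)

    # Not reachable: the loop either returns or raises.
    raise RuntimeError("retry loop exited unexpectedly")
```

A worker would wrap its queue read in `call_with_backoff`, translating the client's connection errors into `QueueUnavailable`. The `sleep` parameter exists so the waiting can be replaced in tests.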