Aviator experienced a major incident on May 10, 2024 affecting Background queues, lasting 39m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating May 10, 2024, 04:54 PM UTC
We are investigating an issue where the background queues may have degraded performance when queueing a PR.
- monitoring May 10, 2024, 05:19 PM UTC
The background workers should be alive now. They should clear the backlog in the next 20-30 minutes.
- resolved May 10, 2024, 05:33 PM UTC
This incident has been resolved. We will continue monitoring the queues for the rest of the day and will share postmortem notes.
- postmortem May 13, 2024, 05:58 PM UTC
# Symptom and cause

From 2024-05-10 9:40 PT to 10:15 PT, the background queue service was degraded. This was caused by the database used for the task queue exceeding its capacity.

# Mitigation and fix

This issue was mitigated by scaling up the database. We have reviewed the options for a more substantial fix, and database usage is now significantly below the maximum capacity. We have also reviewed and updated the monitoring for this situation and configured alerting rules. We do not anticipate the same issue recurring in the near future.