Thought Industries experienced a major incident on October 24, 2024, affecting US - Platform and EU - Platform and lasting 1h 40m. The incident has been resolved; the full update timeline is below.
Affected components
- US - Platform
- EU - Platform
Update timeline
- investigating Oct 24, 2024, 01:16 PM UTC
We are investigating elevated load times in the US.
- identified Oct 24, 2024, 01:27 PM UTC
The issue has been identified and a fix is being implemented.
- identified Oct 24, 2024, 01:50 PM UTC
We are continuing to work on a fix for this issue.
- identified Oct 24, 2024, 02:29 PM UTC
We are continuing to work on a fix for this issue.
- identified Oct 24, 2024, 02:44 PM UTC
We are continuing to work on a fix for this issue.
- monitoring Oct 24, 2024, 02:54 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Oct 24, 2024, 02:56 PM UTC
This incident has been resolved.
- postmortem Oct 24, 2024, 03:56 PM UTC
Between 8:15 AM EDT and 11:00 AM EDT (12:15 PM and 3:00 PM UTC), the platform experienced significantly elevated response times in both the EU and US. The root cause was determined to be a routine security upgrade of an external dependency, which led to high CPU usage on our application servers. Although the application servers auto-scaled in response to the increased load, response times did not improve as expected. Reverting the dependency upgrade led to an immediate return to expected response times.