Scout APM experienced a critical incident on November 20, 2019, affecting Application Monitoring and lasting 1d 2h. The incident has been resolved; the full update timeline is below.
Affected components
Application Monitoring
Update timeline
- investigating Nov 20, 2019, 01:42 PM UTC
One of our ingestion servers is falling a little behind, so ingestion for any customers on that server will be delayed. All data is safe and is being processed.
- investigating Nov 20, 2019, 02:23 PM UTC
We've identified a handful of incoming messages that have slowed our ingestion processing, causing it to fall behind. This has tripped circuit breakers in other parts of our app. All data is stored, but ingestion as a whole is paused.
- monitoring Nov 20, 2019, 03:45 PM UTC
We've isolated the issue to a single account. We're in contact with that customer and have restarted ingestion for all other accounts.
- resolved Nov 21, 2019, 04:04 PM UTC
This incident has been resolved.