Scout APM experienced a major incident on May 21, 2025 affecting Application Monitoring, lasting 1d 12h. The incident has been resolved; the full update timeline is below.
Affected components
- Application Monitoring
Update timeline
- investigating May 21, 2025, 01:55 PM UTC
We are currently investigating this issue.
- monitoring May 21, 2025, 02:33 PM UTC
A fix has been implemented and we are monitoring the results.
- identified May 21, 2025, 03:08 PM UTC
The initial fix was unsuccessful, and certain accounts are now substantially delayed.
- monitoring May 21, 2025, 04:13 PM UTC
An alternate approach has been applied, and we are monitoring the results.
- identified May 22, 2025, 02:13 AM UTC
It has been a long day with Kafka. We continue to experience instability, which is causing lag and dropped payloads.
- identified May 22, 2025, 05:42 AM UTC
Throughput has improved, although the behavior of individual partitions remains a problem and is still causing delays in some cases.
- identified May 22, 2025, 05:43 PM UTC
We have not reached full resolution yet.
- monitoring May 22, 2025, 09:05 PM UTC
ZooKeeper corruption has been rooted out. Things appear healthier and are catching up in all cases.
- resolved May 23, 2025, 02:27 AM UTC
This incident has been resolved.