Scout APM incident

Ingest delays

Major Resolved

Scout APM experienced a major incident on May 21, 2025 affecting Application Monitoring, lasting 1d 12h. The incident has been resolved; the full update timeline is below.

Started
May 21, 2025, 01:55 PM UTC
Resolved
May 23, 2025, 02:27 AM UTC
Duration
1d 12h
Detected by Pingoru
May 21, 2025, 01:55 PM UTC
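
The duration above can be checked against the Started and Resolved timestamps. The following is a minimal sketch, assuming the page truncates the duration to whole hours (the page does not state its rounding rule); the function names `parse_utc` and `duration_parts` are illustrative and not part of any Scout APM or Pingoru API.

```python
from datetime import datetime, timezone

# Timestamp layout used on this page, e.g. "May 21, 2025, 01:55 PM UTC".
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a status-page timestamp and mark it explicitly as UTC."""
    naive = datetime.strptime(stamp.removesuffix(" UTC"), FMT)
    return naive.replace(tzinfo=timezone.utc)


def duration_parts(start: datetime, end: datetime) -> tuple[int, int, int]:
    """Return the elapsed time as (days, hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


started = parse_utc("May 21, 2025, 01:55 PM UTC")
resolved = parse_utc("May 23, 2025, 02:27 AM UTC")
days, hours, minutes = duration_parts(started, resolved)

# Assumed display rule: drop the leftover minutes.
print(f"{days}d {hours}h")
```

Under that assumption the leftover minutes are dropped, which is consistent with the 1d 12h shown on the page.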

Affected components

Application Monitoring

Update timeline

  1. investigating May 21, 2025, 01:55 PM UTC

    We are currently investigating this issue.

  2. monitoring May 21, 2025, 02:33 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. identified May 21, 2025, 03:08 PM UTC

    The initial fix was unsuccessful; certain accounts are now substantially delayed.

  4. monitoring May 21, 2025, 04:13 PM UTC

    An alternate approach has been applied, and we are monitoring the results.

  5. identified May 22, 2025, 02:13 AM UTC

    It has been a long day with Kafka. We continue to experience instability, causing lag and dropped payloads.

  6. identified May 22, 2025, 05:42 AM UTC

    Throughput has improved, although the behavior of individual partitions remains a problem and is still causing delays in some cases.

  7. identified May 22, 2025, 05:43 PM UTC

    We have not yet reached full resolution.

  8. monitoring May 22, 2025, 09:05 PM UTC

    ZooKeeper corruption has been rooted out. Things appear healthier and are catching up in all cases.

  9. resolved May 23, 2025, 02:27 AM UTC

    This incident has been resolved.