Scout APM incident

Ingestion Lag

Major · Resolved

Scout APM experienced a major incident on September 18, 2019 affecting Application Monitoring, lasting 14h 28m. The incident has been resolved; the full update timeline is below.

Started
Sep 18, 2019, 11:31 AM UTC
Resolved
Sep 19, 2019, 02:00 AM UTC
Duration
14h 28m
Detected by Pingoru
Sep 18, 2019, 11:31 AM UTC

Affected components

Application Monitoring

Update timeline

  1. monitoring Sep 18, 2019, 11:31 AM UTC

    We are experiencing some ingestion lag. We have identified the issue and are working through the backlog. Your charts will continue to catch up as the backlog is processed.

  2. monitoring Sep 18, 2019, 09:25 PM UTC

    We are rebuilding some database indexes, which has forced us to pause ingestion completely. Once the indexes are rebuilt, metrics will fill in up to the current time while we continue to fix the root cause of the ingestion lag.

  3. resolved Sep 19, 2019, 02:00 AM UTC

    Metric ingestion was paused at 20:13 UTC and restarted at 22:30 UTC. Metrics for all apps are caught up and stable as of 2019-09-19 00:00 UTC. Operations are back to normal.