Scalyr incident

Ingestion delay in app.scalyr.com

Scalyr experienced a notice-level incident on October 24, 2024, lasting 6h 54m. The incident has been resolved; the full update timeline is below.

Started: Oct 24, 2024, 04:54 PM UTC
Resolved: Oct 24, 2024, 11:49 PM UTC
Duration: 6h 54m
Detected by Pingoru: Oct 24, 2024, 04:54 PM UTC

Update timeline

  1. investigating Oct 24, 2024, 04:54 PM UTC

    We are currently investigating the issue.

  2. identified Oct 24, 2024, 05:13 PM UTC

    A misconfiguration deployed this morning prevented the servers from scaling up correctly. We are manually scaling up the servers to handle the ingestion volume.

  3. identified Oct 24, 2024, 06:11 PM UTC

    Aggressively scaling out the servers caused the page to return a 500 error on load because the database connection limit was reached. We are scaling the servers back in, and the error should be resolved shortly.

  4. monitoring Oct 24, 2024, 07:11 PM UTC

    The UI should now be loading as expected. We have increased the database connection limit to accommodate more concurrent connections. The queue is recovering and gradually processing the backlog of events.

  5. resolved Oct 24, 2024, 11:49 PM UTC

    This incident has been resolved.
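
The failure described in updates 3 and 4 (scaling out until the database connection limit is hit) can be illustrated with a small capacity check. This is only a sketch under stated assumptions: the incident report does not say how Scalyr's servers pool connections, so the per-instance pool size, the reserved headroom, and every number below are hypothetical, as are the function names.

```python
def max_safe_instances(db_connection_limit: int,
                       pool_size_per_instance: int,
                       reserved: int = 0) -> int:
    """Largest instance count whose combined pools fit under the limit.

    All parameters are hypothetical: `reserved` stands for connections
    kept free for admin or maintenance sessions.
    """
    if pool_size_per_instance <= 0:
        raise ValueError("pool_size_per_instance must be positive")
    usable = db_connection_limit - reserved
    return max(usable // pool_size_per_instance, 0)


def would_exhaust_connections(instances: int,
                              pool_size_per_instance: int,
                              db_connection_limit: int,
                              reserved: int = 0) -> bool:
    """True if this many instances could open more connections than allowed."""
    return instances * pool_size_per_instance > db_connection_limit - reserved


# Hypothetical figures: limit of 500, 20 connections per server, 20 reserved.
ceiling = max_safe_instances(500, 20, reserved=20)
overshoot = would_exhaust_connections(30, 20, 500, reserved=20)
```

With these made-up figures, `ceiling` is 24 and scaling to 30 servers would overshoot the limit. Raising the limit or scaling back in, the two remedies mentioned in the timeline, both bring the product of instance count and pool size back under the cap.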