Scalyr Outage History

Scalyr is up right now

Scalyr had 12 outages in the last 2 years, totaling 29h 38m of downtime and averaging 0.5 incidents per month.

There have been 12 Scalyr outages since July 12, 2024, totaling 29h 38m of downtime. Each is summarized below with incident details, duration, and resolution information.
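
The summary figures can be cross-checked against the per-incident durations listed on this page. The following is a minimal sketch, not the method the status page itself uses: the `DISPLAYED` list and the `to_minutes` helper are illustrative names, and the values are copied from the "Duration" fields below (one incident lists no duration and is omitted).

```python
import re

# Durations exactly as displayed in the incident list on this page.
# One incident (April 23, 2026, posted retroactively) shows no duration.
DISPLAYED = ["2h 58m", "1m", "3h 25m", "2h 14m", "5h 42m", "6h 54m",
             "4h 2m", "1h 12m", "42m", "1h 42m", "41m"]

def to_minutes(text: str) -> int:
    """Convert a string such as '2h 58m' or '42m' to whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)m)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

total = sum(to_minutes(d) for d in DISPLAYED)
print(f"{total // 60}h {total % 60}m")   # 29h 33m

# 12 incidents over roughly 24 months
incidents_per_month = 12 / 24            # 0.5
```

The displayed values sum to 29h 33m, a few minutes under the 29h 38m headline. A plausible explanation, though the page does not state it, is that each displayed duration is truncated to whole minutes while the headline total is computed from more precise timestamps.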

Source: https://status.scalyr.com

Notice May 8, 2026

Drop in ingestion volume for users on app.scalyr.com

Detected by Pingoru
May 08, 2026, 10:31 PM UTC
Resolved
May 09, 2026, 01:30 AM UTC
Duration
2h 58m
Timeline · 4 updates
  1. investigating May 08, 2026, 10:31 PM UTC

    We are currently investigating this issue.

  2. identified May 08, 2026, 10:50 PM UTC

    We have identified an issue with a backend cluster that is currently disrupting event ingestion. Our engineering team is performing a system failover to restore normal operations.

  3. monitoring May 09, 2026, 12:07 AM UTC

    Rebalancing our ingestion servers has stabilized the incoming data volume. Engineering is adding capacity to accelerate the processing of the data backlog. The impacted cluster has resumed ingestion at normal levels. All affected data has been queued since the beginning of the incident, so there is no data loss.

  4. resolved May 09, 2026, 01:30 AM UTC

    This incident has been resolved.

Notice April 23, 2026

SSO Login Failure for Accounts in app.eu.scalyr.com, app.us1.dataset.com, and app.eu1.dataset.com

Detected by Pingoru
Apr 23, 2026, 07:45 PM UTC
Resolved
Apr 23, 2026, 07:47 PM UTC
Duration
1m
Timeline · 2 updates
  1. monitoring Apr 23, 2026, 07:45 PM UTC

    Apologies for the correction. To clarify the scope of the incident mentioned in our previous post, the following environments were impacted: app.eu.scalyr.com, app.us1.dataset.com, and app.eu1.dataset.com.

  2. resolved Apr 23, 2026, 07:47 PM UTC

    This incident has been resolved.

Notice April 23, 2026

SSO Login Failure for Accounts in app.eu.scalyr.com, app.dataset.com, and app.eu.dataset.com

Detected by Pingoru
Apr 23, 2026, 07:34 PM UTC
Resolved
Apr 23, 2026, 04:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 23, 2026, 07:34 PM UTC

    Between 4:05 PM and 7:16 PM UTC, some DataSet customers may have experienced intermittent issues logging into the UI via SSO. This incident was limited strictly to the login interface; data ingestion and query capabilities were not impacted and remained fully operational throughout the event. Our engineering team has successfully mitigated the issue via a system rollback, and all SSO login services have been fully restored.

Notice September 9, 2025

[NA1] DataSet service interruption affecting data ingestion and search functionality

Detected by Pingoru
Sep 09, 2025, 05:03 PM UTC
Resolved
Sep 09, 2025, 08:29 PM UTC
Duration
3h 25m
Timeline · 3 updates
  1. investigating Sep 09, 2025, 05:03 PM UTC

    We are currently investigating an issue affecting both search and ingestion functionality on app.scalyr.com.

  2. identified Sep 09, 2025, 06:02 PM UTC

    The issue has been identified and our Engineering teams are working to implement a fix. We will continue to provide updates on our progress.

  3. resolved Sep 09, 2025, 08:29 PM UTC

    All services have been fully restored and the incident is now resolved. We have validated that all systems are functioning normally. Thank you for your patience throughout this incident.

Notice January 22, 2025

Ingestion delay on app.us1.dataset.com and app.eu1.dataset.com

Detected by Pingoru
Jan 22, 2025, 12:00 AM UTC
Resolved
Jan 22, 2025, 02:14 AM UTC
Duration
2h 14m
Timeline · 2 updates
  1. identified Jan 22, 2025, 12:00 AM UTC

    Ingestion was temporarily interrupted at approximately 8:30 PM GMT due to an authorization issue on the cluster. Recovery is currently underway, and users can expect events to be gradually backfilled to their accounts.

  2. resolved Jan 22, 2025, 02:14 AM UTC

    This incident has been resolved.

Notice January 16, 2025

Ingestion and queries are affected on app.scalyr.com, app.eu.scalyr.com, app.dataset.com, and app.eu.dataset.com

Detected by Pingoru
Jan 16, 2025, 06:39 PM UTC
Resolved
Jan 17, 2025, 12:22 AM UTC
Duration
5h 42m
Timeline · 5 updates
  1. investigating Jan 16, 2025, 06:39 PM UTC

    We are currently investigating this issue.

  2. investigating Jan 16, 2025, 07:10 PM UTC

    We are in the process of rolling back the change and restarting the affected servers. Some accounts may observe a gradual recovery of their logs.

  3. identified Jan 16, 2025, 07:48 PM UTC

    A misconfigured deployment caused server resource allocation issues. A rollback and server restart have been initiated to address the problem.

  4. monitoring Jan 16, 2025, 10:01 PM UTC

    Recovery is currently in progress. The estimated time for full ingestion restoration is approximately 60 to 90 minutes from now.

  5. resolved Jan 17, 2025, 12:22 AM UTC

    The data backlog caused by the incident has been fully recovered in all regions.

Notice October 24, 2024

Ingestion delay in app.scalyr.com

Detected by Pingoru
Oct 24, 2024, 04:54 PM UTC
Resolved
Oct 24, 2024, 11:49 PM UTC
Duration
6h 54m
Timeline · 5 updates
  1. investigating Oct 24, 2024, 04:54 PM UTC

    We are currently investigating the issue.

  2. identified Oct 24, 2024, 05:13 PM UTC

    A misconfiguration deployed this morning prevented the servers from scaling up correctly. We are currently in the process of manually scaling up the servers to manage the ingestion volume effectively.

  3. identified Oct 24, 2024, 06:11 PM UTC

    The aggressive scaling out of servers led to a 500 error when loading the page due to hitting the database connection limit. We are currently in the process of scaling the servers back in, and the error should be resolved shortly.

  4. monitoring Oct 24, 2024, 07:11 PM UTC

    The UI should now be loading as expected. We have increased the database connection limit to accommodate more concurrent connections. The queue is in the process of recovering and is gradually processing the backlog of events.

  5. resolved Oct 24, 2024, 11:49 PM UTC

    This incident has been resolved.
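
The timeline above describes scaled-out servers exhausting the database connection limit. The arithmetic behind that kind of failure can be sketched as follows. This is an illustration under stated assumptions, not a description of Scalyr's architecture: the function names are hypothetical, and the numbers in the usage lines are made up, since the source does not state the actual pool sizes or limits.

```python
def max_servers(connection_limit: int, pool_size: int, reserved: int = 0) -> int:
    """Largest server count whose connection pools fit under the database limit.

    Assumes every server opens up to pool_size connections and that
    `reserved` connections are kept free for admin or monitoring use.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return max(0, (connection_limit - reserved) // pool_size)

def fits(servers: int, pool_size: int, connection_limit: int, reserved: int = 0) -> bool:
    """True if the fleet's worst-case connection count stays within the limit."""
    return servers * pool_size + reserved <= connection_limit

# Illustrative numbers only (not taken from the incident report).
print(max_servers(connection_limit=500, pool_size=20, reserved=20))  # 24
print(fits(25, pool_size=20, connection_limit=500, reserved=20))     # False
```

A check like this, run before a manual scale-out, shows whether the database limit needs to be raised first, which is the order of operations the monitoring update suggests was eventually applied.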

Notice September 26, 2024

Ingestion failures on app.us1.dataset.com

Detected by Pingoru
Sep 26, 2024, 07:36 PM UTC
Resolved
Sep 26, 2024, 11:38 PM UTC
Duration
4h 2m
Timeline · 2 updates
  1. identified Sep 26, 2024, 07:36 PM UTC

    Ingestion on app.us1.dataset.com has been returning 429 and 500 status codes since 2 PM GMT. A significant increase in ingestion has overloaded the cluster. The issue has been identified and we're working on a fix.

  2. resolved Sep 26, 2024, 11:38 PM UTC

    The ingestion pipeline and supporting systems are fully functional. While no data loss is suspected, it may take up to 24 hours for the backlog of data that built up during the incident to be fully processed.
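
For clients sending data during an incident like the one above, a common way to handle 429 and 5xx responses is to retry with exponential backoff and jitter. The sketch below is generic and makes no claim about how any Scalyr or DataSet agent behaves: `send` is a hypothetical callable that performs one upload and returns the HTTP status code, and the set of retryable codes and the delay parameters are assumptions to adjust for your own client.

```python
import random
import time
from typing import Callable

# Status codes treated as transient here (an assumption, not an API contract).
RETRYABLE = {429, 500, 502, 503, 504}

def send_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last status code seen. `sleep` is injectable so the
    function can be tested without real waiting.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        # Full jitter: wait a random time up to the capped exponential delay.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
        status = send()
    return status
```

Jitter spreads retries from many clients over time, which helps avoid adding synchronized load to a cluster that is already overloaded. Retrying a 500 is only safe if the upload is idempotent or deduplicated on the server side.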

Notice September 20, 2024

Ingestion delay in app.eu.scalyr.com

Detected by Pingoru
Sep 20, 2024, 07:21 PM UTC
Resolved
Sep 20, 2024, 08:33 PM UTC
Duration
1h 12m
Timeline · 3 updates
  1. investigating Sep 20, 2024, 07:21 PM UTC

    We are currently investigating this issue.

  2. identified Sep 20, 2024, 07:34 PM UTC

    The issue has been identified and a fix is being implemented.

  3. resolved Sep 20, 2024, 08:33 PM UTC

    This incident has been resolved.

Notice September 10, 2024

Query timeouts and slowness on app.scalyr.com and app.dataset.com

Detected by Pingoru
Sep 10, 2024, 05:41 PM UTC
Resolved
Sep 10, 2024, 06:23 PM UTC
Duration
42m
Timeline · 3 updates
  1. identified Sep 10, 2024, 05:41 PM UTC

    The correct build of the query servers failed to initialize, leading to performance issues with query execution. The root cause has been identified and the problem has been mitigated for users on app.scalyr.com. Recovery is still in progress for users on app.dataset.com.

  2. monitoring Sep 10, 2024, 06:12 PM UTC

    All query servers were restarted and query performance is back to normal.

  3. resolved Sep 10, 2024, 06:23 PM UTC

    This incident has been resolved.

Notice August 19, 2024

app.eu.scalyr.com is seeing a spike of ingestion errors

Detected by Pingoru
Aug 19, 2024, 02:05 AM UTC
Resolved
Aug 19, 2024, 03:48 AM UTC
Duration
1h 42m
Timeline · 3 updates
  1. investigating Aug 19, 2024, 02:05 AM UTC

    We are currently investigating this issue.

  2. identified Aug 19, 2024, 02:45 AM UTC

    The issue has been identified and a fix is being implemented.

  3. resolved Aug 19, 2024, 03:48 AM UTC

    Ingestion traffic is now back to normal. The issue has been mitigated.

Notice July 12, 2024

Queries are unavailable on app.eu.scalyr.com

Detected by Pingoru
Jul 12, 2024, 06:57 PM UTC
Resolved
Jul 12, 2024, 07:39 PM UTC
Duration
41m
Timeline · 3 updates
  1. investigating Jul 12, 2024, 06:57 PM UTC

    The issue is currently under investigation.

  2. monitoring Jul 12, 2024, 07:15 PM UTC

    Queries are now functional on app.eu.scalyr.com after restarting the machines with the OOM issue.

  3. resolved Jul 12, 2024, 07:39 PM UTC

    This incident has been resolved.
