Scalyr incident
Graph, alert and dashboard irregularities for some US customers
Scalyr experienced a minor incident on August 29, 2022, affecting Main Site and lasting approximately one day. The incident has been resolved; the full update timeline is below.
Affected components
Main Site
Update timeline
- investigating Aug 29, 2022, 01:47 PM UTC
Some customers in the US cluster are reporting unexpected behavior including false alarms, inconsistent graphs, and missing search results. We are currently investigating the issue.
- identified Aug 29, 2022, 05:40 PM UTC
We have identified the issue in our timeseries database and are working on remediation.
- identified Aug 29, 2022, 09:38 PM UTC
At 15:00 PDT (22:00 Coordinated Universal Time) we will restart our summary service, which powers our alerts and speeds up dashboard rendering. The summary service will be unavailable for approximately 10 minutes, after which it will begin rebuilding time series data for all alerts. An alert will not be evaluated until its time series has been rebuilt, so alerts with longer look-back periods will be the last to be evaluated successfully. Loading a dashboard will trigger the time series on that dashboard to be recreated, so the dashboard will load more slowly at first.
- monitoring Aug 29, 2022, 11:24 PM UTC
The summary service has been restarted and we are beginning to rebuild time series data for all alerts. Once the time series for an alert has been recreated, we will begin evaluating that alert again. Alerts with longer look-back periods will be the last to be evaluated successfully. Loading a dashboard will trigger the time series on that dashboard to be recreated, so the dashboard will load more slowly at first. We are continuing to monitor this process.
- resolved Aug 30, 2022, 02:16 PM UTC
Most time series have been rebuilt: dashboard performance has been restored for most customers, and alert evaluation success rates are back at pre-incident levels. We're marking the issue resolved and expect a full return to pre-incident levels within the next 24 hours.