Datadog AP1 incident
Elevated Error Rates for Log Queries and Monitors
Datadog AP1 experienced a major incident on October 3, 2023 affecting Log Management and Monitors, lasting 20h 59m. The incident has been resolved; the full update timeline is below.
Affected components
- Log Management
- Monitors
Update timeline
- investigating Oct 03, 2023, 05:33 PM UTC
We are actively investigating Log Queries returning unexpected results. As a result, some users may have trouble querying logs in the web application or through the API, and Log-based Monitors and Log-based Metrics may also be affected.
- investigating Oct 03, 2023, 06:50 PM UTC
We are continuing to investigate these issues, and will provide an update as soon as possible.
- identified Oct 03, 2023, 07:33 PM UTC
We have identified the underlying issue and are working on a fix.
- monitoring Oct 03, 2023, 08:49 PM UTC
We have deployed a fix and are monitoring the results. We will provide another update once the issue is fully resolved. At this time, newly ingested data is properly queryable, and monitors targeting logs sent from 2023-10-03 20:40 UTC onwards are valid. Queries targeting logs between 2023-10-02 11:40 UTC and 2023-10-03 20:40 UTC may return erroneous data. We are evaluating a fix that will restore query correctness for this time window.
- monitoring Oct 04, 2023, 09:20 AM UTC
We're still working on a fix for historical data impacted by this incident.
- monitoring Oct 04, 2023, 10:26 AM UTC
We're still working on a fix for historical data impacted by this incident.
- monitoring Oct 04, 2023, 11:06 AM UTC
We're still working on a fix for historical data impacted by this incident.
- monitoring Oct 04, 2023, 11:41 AM UTC
We're still working on a fix for historical data impacted by this incident.
- monitoring Oct 04, 2023, 12:19 PM UTC
We're still working on a fix for historical data impacted by this incident.
- monitoring Oct 04, 2023, 01:05 PM UTC
We have successfully tested a fix for this issue and are currently deploying it to resolve this incident.
- monitoring Oct 04, 2023, 01:09 PM UTC
The fix has been rolled out and we are currently monitoring to confirm full resolution.
- resolved Oct 04, 2023, 02:33 PM UTC
This incident has been resolved.
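For anyone reviewing query results or monitor evaluations captured while the incident was open, the 08:49 PM UTC update gives the window in which log queries may have returned erroneous data: 2023-10-02 11:40 UTC to 2023-10-03 20:40 UTC. The following is a minimal Python sketch of how one might check whether a query's time range touched that window. The function name `overlaps_affected_window` and the choice to treat both ranges as half-open (start included, end excluded) are assumptions made for illustration; this is not part of any Datadog API or tooling.

```python
from datetime import datetime, timezone

# Window bounds copied from the 08:49 PM UTC status update (both in UTC).
AFFECTED_START = datetime(2023, 10, 2, 11, 40, tzinfo=timezone.utc)
AFFECTED_END = datetime(2023, 10, 3, 20, 40, tzinfo=timezone.utc)


def overlaps_affected_window(query_start: datetime, query_end: datetime) -> bool:
    """Return True if the query range [query_start, query_end) overlaps the
    affected window [AFFECTED_START, AFFECTED_END).

    Hypothetical helper: the half-open interval convention is an assumption,
    chosen because logs sent from 20:40 UTC onwards were reported as valid.
    """
    if query_start.tzinfo is None or query_end.tzinfo is None:
        raise ValueError("query_start and query_end must be timezone-aware")
    if query_start > query_end:
        raise ValueError("query_start must not be later than query_end")
    # Two half-open intervals overlap when each starts before the other ends.
    return query_start < AFFECTED_END and query_end > AFFECTED_START
```

A query covering 2023-10-03 00:00 to 01:00 UTC would be flagged by this check, while one starting at exactly 2023-10-03 20:40 UTC would not.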