Datadog US3 experienced a major incident on May 2, 2025 affecting Metrics and Infra Monitoring and Monitors, lasting 27m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating May 02, 2025, 01:37 PM UTC
We’re investigating increased metric latencies. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metric Monitors.
- investigating May 02, 2025, 02:17 PM UTC
We are continuing to investigate this issue.
- monitoring May 02, 2025, 02:20 PM UTC
For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.
- identified May 02, 2025, 02:21 PM UTC
For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.
- identified May 02, 2025, 03:14 PM UTC
We are continuing to work on a fix for this issue.
- investigating May 02, 2025, 03:29 PM UTC
All metric monitor notifications have been delayed starting at 14:57 UTC. We are working on identifying the issue.
- identified May 02, 2025, 03:57 PM UTC
For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in the 11:25 - 13:00 UTC time range are delayed. Metric queries and metric monitors evaluating data after 13:00 UTC are correct and working as expected.
- identified May 02, 2025, 05:03 PM UTC
We are continuing to work on a fix for this issue.
- monitoring May 02, 2025, 05:38 PM UTC
We have identified the issues and are backfilling data. Monitors with an alert window of one hour or less have been restored, and live metrics data is available.
- monitoring May 02, 2025, 07:00 PM UTC
All metrics data during the impacted window is available. We will begin re-enabling monitors with an evaluation window greater than 60 minutes. Monitors with an evaluation window of less than 60 minutes continue to be evaluated.
- resolved May 02, 2025, 07:28 PM UTC
This incident has been resolved.