Datadog US3 incident

Metrics delayed

Major Resolved View vendor source →

Datadog US3 experienced a major incident on May 2, 2025 affecting Metrics and Infra Monitoring and Monitors, lasting 27m. The incident has been resolved; the full update timeline is below.

Started
May 02, 2025, 07:00 PM UTC
Resolved
May 02, 2025, 07:28 PM UTC
Duration
27m
Detected by Pingoru
May 02, 2025, 07:00 PM UTC

Affected components

Metrics and Infra MonitoringMonitors

Update timeline

  1. investigating May 02, 2025, 01:37 PM UTC

    We’re investigating increased metric latencies. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metric Monitors.

  2. investigating May 02, 2025, 02:17 PM UTC

    We are continuing to investigate this issue.

  3. monitoring May 02, 2025, 02:20 PM UTC

    For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.

  4. identified May 02, 2025, 02:21 PM UTC

    For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.

  5. identified May 02, 2025, 03:14 PM UTC

    We are continuing to work on a fix for this issue.

  6. investigating May 02, 2025, 03:29 PM UTC

    All metric monitor notifications have been delayed starting at 14:57 UTC. We are working on identifying the issue.

  7. identified May 02, 2025, 03:57 PM UTC

    For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data between 11:25 - 13:00 UTC time range are delayed. Metric queries and metrics monitors evaluating data after 13:00 UTC are correct and working as expected.

  8. identified May 02, 2025, 05:03 PM UTC

    We are continuing to work on a fix for this issue.

  9. monitoring May 02, 2025, 05:38 PM UTC

    We have identified the issues, and are backfilling data. Monitors with an alert window of one hour or less have been restored, and live metrics data is available

  10. monitoring May 02, 2025, 07:00 PM UTC

    All metrics data during the impacted window is available. We will being re-enabling monitors with an evaluation window greater than 60 minutes. Monitors with an evaluation window of less than 60 minutes continue to be evaluated.

  11. resolved May 02, 2025, 07:28 PM UTC

    This incident has been resolved.