Datadog US3 incident

Data delays and web errors

Major Resolved View vendor source →

Datadog US3 experienced a major incident on November 2, 2023 affecting Metrics and Infra Monitoring and Monitors, lasting 30m. The incident has been resolved; the full update timeline is below.

Started
Nov 02, 2023, 10:55 AM UTC
Resolved
Nov 02, 2023, 11:25 AM UTC
Duration
30m
Detected by Pingoru
Nov 02, 2023, 10:55 AM UTC

Affected components

Metrics and Infra MonitoringMonitors

Update timeline

  1. identified Nov 02, 2023, 10:55 AM UTC

    We have identified an issue which caused temporarily elevated error rates on our web application (500 error pages) and increased latency processing monitoring data. As a result of this issue, customers might still some metrics are delayed as well as monitors relying on this data. We are currently working on a fix and the data is being backfilled. We will provide another update once the service is fully operational again.

  2. identified Nov 02, 2023, 11:02 AM UTC

    We are continuing to work on a fix for this issue.

  3. monitoring Nov 02, 2023, 11:12 AM UTC

    All components have recovered now. We are replaying a few failed notifications.

  4. resolved Nov 02, 2023, 11:25 AM UTC

    This incident has been resolved.