Datadog Govcloud incident

Delayed Distribution Metrics

Severity: Major; Status: Resolved

Datadog Govcloud experienced a major incident on March 8, 2024 affecting Metrics and Infra Monitoring, lasting 3h 33m. The incident has been resolved; the full update timeline is below.

Started: Mar 08, 2024, 07:50 PM UTC
Resolved: Mar 08, 2024, 11:24 PM UTC
Duration: 3h 33m
Detected by Pingoru: Mar 08, 2024, 07:50 PM UTC

Affected components

Metrics and Infra Monitoring

Update timeline

  1. investigating Mar 08, 2024, 07:50 PM UTC

    We are investigating increased latency processing Distribution Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

  2. identified Mar 08, 2024, 08:44 PM UTC

    The issue has been identified and a fix is being implemented.

  3. identified Mar 08, 2024, 09:46 PM UTC

    Remediation for this issue is still in progress. Monitors are no longer affected, and live data are no longer delayed, but some users might continue to see gaps in historical data on metrics and dashboards until remediation is complete.

  4. resolved Mar 08, 2024, 11:24 PM UTC

    Monitors and live data are no longer affected, and all live data is shown without delays. Unfortunately, we are not able to recover data for distribution metrics with percentiles disabled from 19:00-19:40 UTC, and those metrics will display gaps. You can see your affected metrics here: https://app.ddog-gov.com/metric/summary?facet.percentiles=-enabled&filter=dist
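
When reviewing dashboards or exported data after this incident, it can help to check whether a given data point falls inside the unrecoverable window named in the final update (19:00-19:40 UTC on March 8, 2024). The following is a minimal sketch, not part of the vendor's tooling. The function name `in_unrecoverable_window` is hypothetical, and treating the end of the window as exclusive is an assumption, since the update does not specify it.

```python
from datetime import datetime, timedelta, timezone

# Window reported in the final update: March 8, 2024, 19:00-19:40 UTC.
# Assumption: the end of the window is exclusive; the update does not say.
GAP_START = datetime(2024, 3, 8, 19, 0, tzinfo=timezone.utc)
GAP_END = datetime(2024, 3, 8, 19, 40, tzinfo=timezone.utc)


def in_unrecoverable_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the reported gap.

    Hypothetical helper for illustration only.
    """
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse to guess their zone.
        raise ValueError("timestamp must be timezone-aware")
    # Normalize to UTC before comparing against the reported window.
    return GAP_START <= ts.astimezone(timezone.utc) < GAP_END


if __name__ == "__main__":
    # 14:30 at UTC-05:00 is 19:30 UTC, which is inside the window.
    utc_minus_5 = timezone(timedelta(hours=-5))
    print(in_unrecoverable_window(datetime(2024, 3, 8, 14, 30, tzinfo=utc_minus_5)))
```

A gap in a distribution metric with percentiles disabled that falls inside this window is consistent with the data loss described above. Gaps outside it would need a different explanation.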