Datadog Govcloud incident

Metrics Processing Delays

Critical Resolved View vendor source →

Datadog Govcloud experienced a critical incident on August 4, 2021 affecting APM and Log Management and 1 more component, lasting 3h 9m. The incident has been resolved; the full update timeline is below.

Started
Aug 04, 2021, 04:57 PM UTC
Resolved
Aug 04, 2021, 08:07 PM UTC
Duration
3h 9m
Detected by Pingoru
Aug 04, 2021, 04:57 PM UTC

Affected components

APMLog ManagementMetrics and Infra MonitoringMonitorsWeb Application

Update timeline

  1. investigating Aug 04, 2021, 04:57 PM UTC

    We’re investigating increased metric latencies and API errors. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metrics Monitors.

  2. identified Aug 04, 2021, 05:30 PM UTC

    We have identified a performance issue affecting Metrics, Logs, APM and alerting. We are working to restore access.

  3. identified Aug 04, 2021, 06:29 PM UTC

    The web application is responding without errors. We are still working to restore timely processing of tracing, logs, metrics and alerts.

  4. identified Aug 04, 2021, 07:24 PM UTC

    Realtime data should be available for all customers and we are actively working to restore delayed historical data. Most metric and logs based monitors are restored and we are working to restore the remainder shortly.

  5. identified Aug 04, 2021, 07:35 PM UTC

    All crawler based metrics are fully restored and current.

  6. monitoring Aug 04, 2021, 07:49 PM UTC

    All metrics based monitors are fully restored and we are continuing to restore log based monitors as we work through relevant backlogs.

  7. resolved Aug 04, 2021, 08:07 PM UTC

    This incident has been resolved.