Datadog Govcloud incident

Web Application Not Loading

Critical Resolved

Datadog Govcloud experienced a critical incident on November 2, 2021 that affected APM, Log Management, and 3 more components and lasted 3h 16m. The incident has been resolved; the full update timeline is below.

Started: Nov 02, 2021, 05:43 PM UTC
Resolved: Nov 02, 2021, 09:00 PM UTC
Duration: 3h 16m
Detected by Pingoru: Nov 02, 2021, 05:43 PM UTC

Affected components

APM, Log Management, Metrics and Infra Monitoring, Monitors, Web Application

Update timeline

  1. investigating Nov 02, 2021, 05:43 PM UTC

    We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application.

  2. identified Nov 02, 2021, 06:14 PM UTC

    We are continuing to investigate this issue with our provider. Network connectivity issues in the region are still causing issues loading the application, delaying data and alerts in the Datadog Govcloud region.

  3. identified Nov 02, 2021, 06:50 PM UTC

    Network connectivity issues in the region are still causing issues loading the application, delaying data and alerts in the Datadog Govcloud region. Our provider has acknowledged the issue and is working to resolve it.

  4. identified Nov 02, 2021, 07:28 PM UTC

    Network connectivity issues in the region are continuing to cause issues loading the application, delaying data and alerts in the Datadog Govcloud region. Our provider has acknowledged the issue and is working to resolve it. We have also begun our own mitigations.

  5. identified Nov 02, 2021, 07:50 PM UTC

    Our provider has resolved the underlying network issue. We are now scaling up our systems to handle the backlog.

  6. resolved Nov 02, 2021, 09:00 PM UTC

    We have now recovered for live data and monitors. At this point, customers might still see gaps in metrics data between 1:25 and 3:25 EDT. We will follow up with specific affected customers through our usual support channels.
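
The metrics gap in the final update is given in EDT, while every timestamp in the timeline is in UTC. The sketch below converts the gap window to UTC so it can be lined up against the timeline. It assumes the window refers to the afternoon (PM) of Nov 02, 2021, which the update does not state explicitly; the names `EDT`, `to_utc`, `gap_start_utc`, and `gap_end_utc` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4 by definition, so a fixed offset is enough here.
EDT = timezone(timedelta(hours=-4), "EDT")


def to_utc(hour: int, minute: int) -> datetime:
    """Convert a wall-clock time on Nov 02, 2021 (EDT) to UTC."""
    local = datetime(2021, 11, 2, hour, minute, tzinfo=EDT)
    return local.astimezone(timezone.utc)


# Assumption: "1:25 and 3:25 EDT" means 1:25 PM and 3:25 PM.
gap_start_utc = to_utc(13, 25)
gap_end_utc = to_utc(15, 25)

print(gap_start_utc.strftime("%b %d, %Y, %I:%M %p UTC"))
print(gap_end_utc.strftime("%b %d, %Y, %I:%M %p UTC"))
```

Under that assumption, the gap window runs from 05:25 PM to 07:25 PM UTC, so it begins 18 minutes before the first "investigating" update at 05:43 PM UTC.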