Datadog US3 incident

Issues with data ingestion and alerting

Critical · Resolved

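Datadog US3 experienced a critical incident on July 11, 2024, affecting APM, Application Vulnerability Management, and 8 more components. It lasted 5h 26m from the first posted update to resolution, although the first update reports that ingestion problems began around 20:40 UTC. The incident has been resolved; the full update timeline is below.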

Started: Jul 11, 2024, 09:18 PM UTC
Resolved: Jul 12, 2024, 02:44 AM UTC
Duration: 5h 26m
Detected by Pingoru: Jul 11, 2024, 09:18 PM UTC

Affected components

APM, Application Vulnerability Management, Cloud Network Monitoring, Cloud SIEM, Continuous Profiler, Log Management, Monitors, RUM, Synthetics, Workload Protection

Update timeline

  1. investigating Jul 11, 2024, 09:18 PM UTC

    We are investigating an issue with ingesting data which began around 20:40 UTC. As a result, data from Log Management, APM, Synthetics, Profiling, RUM, CSM, and NPM is delayed. Additionally, monitors derived from this data are delayed.

  2. monitoring Jul 11, 2024, 10:04 PM UTC

    We have deployed a fix and are monitoring the results. Certain data types (Logs, NPM, and Synthetics) are operational again, and alerting from those types has also resumed. APM, RUM, Profiling, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

  3. monitoring Jul 11, 2024, 10:38 PM UTC

    APM is operational at this time, and alerting based on APM data has also resumed. RUM, Profiling, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

  4. monitoring Jul 11, 2024, 11:45 PM UTC

    We continue to deploy fixes and are monitoring the results. We will provide another update once the issue is fully resolved.

  5. monitoring Jul 12, 2024, 12:26 AM UTC

    We continue to deploy fixes and are monitoring the results. RUM, Profiling, Cloud SIEM, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

  6. monitoring Jul 12, 2024, 01:09 AM UTC

    We continue to deploy fixes and are monitoring the results. We will provide another update once the issue is fully resolved.

  7. monitoring Jul 12, 2024, 01:42 AM UTC

    Remediation efforts continue. Profiling is operational again. RUM, Cloud SIEM, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

  8. monitoring Jul 12, 2024, 02:11 AM UTC

    Remediation efforts continue. RUM and Application Vulnerability Management are operational again. Cloud SIEM and Cloud Security Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

  9. resolved Jul 12, 2024, 02:44 AM UTC

    This incident has been resolved.