Datadog Integration incident

Google Cloud metrics latency/failures

Major Resolved View vendor source →

Datadog Integration experienced a major incident on August 14, 2024 affecting Google Cloud Platform Google Cloud Pub/Sub and Google Cloud Platform Cloud Monitoring API, lasting 5h 32m. The incident has been resolved; the full update timeline is below.

Started
Aug 14, 2024, 03:57 PM UTC
Resolved
Aug 14, 2024, 09:29 PM UTC
Duration
5h 32m
Detected by Pingoru
Aug 14, 2024, 03:57 PM UTC

Affected components

Google Cloud Platform Google Cloud Pub/SubGoogle Cloud Platform Cloud Monitoring API

Update timeline

  1. investigating Aug 14, 2024, 03:57 PM UTC

    We're actively investigating increased latencies and failures for collecting Google Cloud metrics. As an effect, there might be delays in graphs displaying these metrics. To avoid spurious alerts, we’ve temporarily disabled alerts for Google Cloud metrics.

  2. identified Aug 14, 2024, 03:59 PM UTC

    The issue has been identified, reaching out to Google Cloud for support.

  3. identified Aug 14, 2024, 04:01 PM UTC

    We are continuing to work on a fix for this issue.

  4. identified Aug 14, 2024, 04:42 PM UTC

    Google has confirmed the issue with the Cloud Monitoring API and are actively looking into a fix.

  5. monitoring Aug 14, 2024, 07:53 PM UTC

    Google has marked this issue resolved as of 3:42PM EST. We're seeing improvements in gathering metrics but will continue to monitor as processes recover.

  6. resolved Aug 14, 2024, 09:29 PM UTC

    All monitors show metrics gathering has resumed successfully. Marking incident as resolved.