Datadog Integration incident
Google Cloud metrics latency/failures
Datadog Integration experienced a major incident on August 14, 2024 affecting Google Cloud Platform Google Cloud Pub/Sub and Google Cloud Platform Cloud Monitoring API, lasting 5h 32m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 14, 2024, 03:57 PM UTC
We're actively investigating increased latencies and failures for collecting Google Cloud metrics. As an effect, there might be delays in graphs displaying these metrics. To avoid spurious alerts, we’ve temporarily disabled alerts for Google Cloud metrics.
- identified Aug 14, 2024, 03:59 PM UTC
The issue has been identified, reaching out to Google Cloud for support.
- identified Aug 14, 2024, 04:01 PM UTC
We are continuing to work on a fix for this issue.
- identified Aug 14, 2024, 04:42 PM UTC
Google has confirmed the issue with the Cloud Monitoring API and are actively looking into a fix.
- monitoring Aug 14, 2024, 07:53 PM UTC
Google has marked this issue resolved as of 3:42PM EST. We're seeing improvements in gathering metrics but will continue to monitor as processes recover.
- resolved Aug 14, 2024, 09:29 PM UTC
All monitors show metrics gathering has resumed successfully. Marking incident as resolved.