Splunk Observability Cloud US2 incident

Charts and Detectors Delayed

Major · Resolved

Splunk Observability Cloud US2 experienced a major incident on October 11, 2024, affecting the Splunk Observability Cloud Web Interface and Alerting components and lasting 1d 2h. The incident has been resolved; the full update timeline is below.

Started
Oct 11, 2024, 01:23 PM UTC
Resolved
Oct 12, 2024, 03:28 PM UTC
Duration
1d 2h
Detected by Pingoru
Oct 11, 2024, 01:23 PM UTC

Affected components

Splunk Observability Cloud Web Interface
Alerting

Update timeline

  1. investigating Oct 11, 2024, 01:23 PM UTC

    Customers may be experiencing delays in some charts and detectors. Datapoint ingest is not affected. We are investigating and will provide an update shortly.

  2. identified Oct 11, 2024, 02:11 PM UTC

    Investigation has confirmed that charts and detectors that rely on property and tag updates on metric time series are impacted. The issue has been identified and a fix is being implemented.

  3. identified Oct 11, 2024, 03:22 PM UTC

    We are continuing to work on a fix for this issue.

  4. identified Oct 11, 2024, 05:12 PM UTC

    We are continuing to make progress on a fix for this issue.

  5. identified Oct 11, 2024, 06:11 PM UTC

    We are continuing to make progress on a fix for this issue.

  6. identified Oct 11, 2024, 07:30 PM UTC

    We are continuing to make progress on a fix for this issue.

  7. identified Oct 11, 2024, 09:36 PM UTC

    The fix has been implemented and is in the process of being deployed.

  8. identified Oct 11, 2024, 11:58 PM UTC

    We are continuing to deploy the fix and will provide further updates as it starts taking effect.

  9. identified Oct 12, 2024, 02:09 AM UTC

    The fix has been deployed. We are monitoring it and will continue providing updates.

  10. identified Oct 12, 2024, 04:16 AM UTC

    We are in the process of implementing additional fixes and will continue to provide updates.

  11. identified Oct 12, 2024, 06:12 AM UTC

    Additional fixes are now implemented and are in the process of being deployed. We will continue to provide updates.

  12. identified Oct 12, 2024, 09:02 AM UTC

    Additional fixes are now deployed and starting to take effect. We will continue to monitor and provide updates.

  13. identified Oct 12, 2024, 11:06 AM UTC

    The fixes continue to take effect, and we expect the system to recover at a steady pace over the next few hours. We will continue to monitor and provide updates.

  14. identified Oct 12, 2024, 01:47 PM UTC

    Recovery is ongoing at a steady pace. We will continue to monitor and provide updates.

  15. monitoring Oct 12, 2024, 02:58 PM UTC

    The system has now recovered, and we are continuing to monitor.

  16. resolved Oct 12, 2024, 03:28 PM UTC

    This incident has been resolved.