Decisions incident

Issues accessing Decisions in Teams

Major · Resolved

Decisions experienced a major incident on July 30, 2024 affecting Decisions for Microsoft 365 and Decisions AI, lasting 4h 57m. The incident has been resolved; the full update timeline is below.

Started
Jul 30, 2024, 12:36 PM UTC
Resolved
Jul 30, 2024, 05:34 PM UTC
Duration
4h 57m
Detected by Pingoru
Jul 30, 2024, 12:36 PM UTC

Affected components

Decisions for Microsoft 365
Decisions AI

Update timeline

  1. investigating Jul 30, 2024, 12:36 PM UTC

    We have received reports of limited issues accessing the Decisions application. We are currently investigating the issue.

  2. investigating Jul 30, 2024, 12:36 PM UTC

    We are continuing to investigate this issue.

  3. identified Jul 30, 2024, 01:03 PM UTC

    Possible causes have been identified with Microsoft Network Infrastructure, and Microsoft is reporting issues with services in Europe. We are working to determine a fix and the relevant service-return timeline.

  4. identified Jul 30, 2024, 01:13 PM UTC

    Problems connecting to the Decisions platform and Decisions AI services are caused by issues with Microsoft Network Infrastructure (globally). We are working with Microsoft to find a solution and to understand the service-return timeline on their side. Status for Microsoft services can be seen here: https://azure.status.microsoft/status

  5. resolved Jul 30, 2024, 05:34 PM UTC

    This incident has been resolved. Please refer to the status for Microsoft services for further information: https://azure.status.microsoft/status

  6. postmortem Aug 01, 2024, 12:57 PM UTC

    **Issue Summary**

    Between approximately 11:45 UTC and 19:43 UTC on 30 July 2024, a subset of Microsoft customers, _**including Decisions**_, experienced issues connecting to a subset of Microsoft services globally. Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, and Azure Policy, as well as the Azure portal itself and a subset of Microsoft 365 and Microsoft Purview services.

    **Root Cause Analysis**

    According to Microsoft, an unexpected usage spike resulted in Azure Front Door (AFD) and Azure Content Delivery Network (CDN) components performing below acceptable thresholds, leading to intermittent errors, timeouts, and latency spikes.

    **Resolution**

    Microsoft implemented networking configuration changes to support DDoS protection efforts and performed failovers to alternate networking paths to provide relief. The initial network configuration changes successfully mitigated the majority of the impact by 14:10 UTC.

    **Preventive Measures**

    Microsoft is completing an internal retrospective to understand the incident in more detail and will publish a Preliminary Post Incident Review (PIR), which will be provided to Decisions. After Microsoft’s retrospective is completed, Decisions will determine what further steps can be taken to avoid similar downstream impacts in the future.