insightsoftware incident

Microsoft Azure OpenAI outage

Notice Resolved View vendor source →

insightsoftware experienced a notice incident on May 16, 2025 affecting Azure OpenAI, lasting —. The incident has been resolved; the full update timeline is below.

Started
May 16, 2025, 04:33 AM UTC
Resolved
May 16, 2025, 04:33 AM UTC
Duration
Detected by Pingoru
May 16, 2025, 04:33 AM UTC

Affected components

Azure OpenAI

Update timeline

  1. resolved May 16, 2025, 04:33 AM UTC

    https://app.azure.com/h/4TSS-3VG/6711e7 What Happened Between 01:33 UTC and 02:38 UTC on 16 May 2025, a platform-level issue within the Azure OpenAI Service led to service interruptions affecting downstream integrations. During this period, our services that rely on Azure OpenAI experienced intermittent errors and degraded performance, including HTTP 500 responses. Root Cause Microsoft has identified that the underlying cause was an increase in total request size on their backend, which triggered container restarts due to memory limits being exceeded. This resulted in one of their backend resources becoming unhealthy and unable to process requests reliably. Timeline of Response 01:33 UTC – Azure platform issue began impacting dependent services 01:49 UTC – Issue was automatically detected by Microsoft’s monitoring systems; Azure engineers were engaged 01:58 UTC – Root cause was isolated to memory-related container restarts within Azure OpenAI infrastructure 02:12 UTC – Microsoft engineers initiated mitigation by scaling out the affected backend systems 02:38 UTC – Full service restoration confirmed by Microsoft Impact on Our Services While our infrastructure and application code remained stable, availability of AI-powered features was temporarily degraded due to upstream dependency on Azure OpenAI. No data loss or permanent service degradation occurred within our environment.