insightsoftware experienced a notice-level incident on May 16, 2025 affecting Azure OpenAI. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- resolved May 16, 2025, 04:33 AM UTC
https://app.azure.com/h/4TSS-3VG/6711e7

What Happened
Between 01:33 UTC and 02:38 UTC on 16 May 2025, a platform-level issue within the Azure OpenAI Service led to service interruptions affecting downstream integrations. During this period, our services that rely on Azure OpenAI experienced intermittent errors and degraded performance, including HTTP 500 responses.

Root Cause
Microsoft has identified that the underlying cause was an increase in total request size on their backend, which triggered container restarts due to memory limits being exceeded. This resulted in one of their backend resources becoming unhealthy and unable to process requests reliably.

Timeline of Response
- 01:33 UTC – Azure platform issue began impacting dependent services
- 01:49 UTC – Issue was automatically detected by Microsoft’s monitoring systems; Azure engineers were engaged
- 01:58 UTC – Root cause was isolated to memory-related container restarts within Azure OpenAI infrastructure
- 02:12 UTC – Microsoft engineers initiated mitigation by scaling out the affected backend systems
- 02:38 UTC – Full service restoration confirmed by Microsoft

Impact on Our Services
While our infrastructure and application code remained stable, availability of AI-powered features was temporarily degraded due to upstream dependency on Azure OpenAI. No data loss or permanent service degradation occurred within our environment.
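
Elapsed Time per Milestone
The response timeline above gives clock times only. As a reading aid, the minimal sketch below works out how many minutes after the start of impact each milestone occurred. The timestamps are copied from the timeline; the dictionary keys and function names are illustrative labels chosen for this sketch, not terminology used by Microsoft or insightsoftware.

```python
from datetime import datetime

# Timestamps (UTC, 16 May 2025) copied from the response timeline.
# The keys are hypothetical labels introduced for this sketch only.
TIMELINE = {
    "impact_start": "01:33",
    "detected": "01:49",
    "root_cause_isolated": "01:58",
    "mitigation_started": "02:12",
    "restored": "02:38",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def summarize(timeline: dict[str, str]) -> dict[str, int]:
    """Map each milestone to minutes elapsed since the start of impact."""
    start = timeline["impact_start"]
    return {
        name: minutes_between(start, stamp)
        for name, stamp in timeline.items()
        if name != "impact_start"
    }


if __name__ == "__main__":
    for name, minutes in summarize(TIMELINE).items():
        print(f"{name}: +{minutes} min")
```

Under these assumptions, detection came 16 minutes after impact began, root cause isolation at 25 minutes, mitigation at 39 minutes, and full restoration at 65 minutes.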