Discount Ninja experienced a major incident on January 25, 2023, affecting the Checkout API, Admin (App), and one other component, lasting 2h 56m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jan 25, 2023, 08:04 AM UTC
We are currently investigating the issue.
- identified Jan 25, 2023, 08:48 AM UTC
We are aware of a networking issue, starting at 07:30 UTC, that is impacting connectivity to Discount Ninja's infrastructure hosted on Microsoft Azure for a subset of users. We are actively investigating and will share updates as soon as more is known.
- identified Jan 25, 2023, 09:36 AM UTC
The issue can be tracked here: https://azure.status.microsoft/en-us/status. Latest update (09:24 UTC) from Microsoft: "We've determined the network connectivity issue is occurring with devices across the Microsoft Wide Area Network (WAN). This impacts connectivity between clients on the internet to Azure, as well as connectivity between services in datacenters, as well as ExpressRoute connections. The issue is causing impact in waves, peaking approximately every 30 minutes. We are actively investigating and will share updates as soon as more is known."
- monitoring Jan 25, 2023, 10:09 AM UTC
Microsoft has identified a recent change to the WAN as the underlying cause and has taken steps to roll back this change. Their telemetry shows consistent signs of recovery from 09:00 UTC onwards across multiple regions and services, and they are continuing to actively monitor the situation. With the WAN now recovering, Microsoft is working to ensure full recovery for impacted services.
- resolved Jan 25, 2023, 11:01 AM UTC
Summary of Impact: Between 07:05 UTC and 09:45 UTC on 25 January 2023, users experienced network connectivity issues, manifesting as latency and/or timeouts when attempting to connect to Discount Ninja APIs hosted on Microsoft Azure.
Preliminary Root Cause: Microsoft determined that a change made to the Microsoft Wide Area Network (WAN) impacted connectivity from users on the internet to Azure, connectivity between services in different regions, and ExpressRoute connections.
Mitigation: Microsoft identified a recent change to the WAN as the underlying cause and rolled back this change. Networking telemetry shows recovery from 09:00 UTC onwards across all regions and services, with the final networking equipment recovering at 09:35 UTC. Most impacted Microsoft services recovered automatically once network connectivity was restored, and we worked to recover the remaining impacted services.