Sardine incident
System Instability across backend APIs and Dashboard
Sardine experienced a critical incident on November 29, 2025, affecting Device APIs, Issuing API, and one other component. It lasted 36 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Nov 29, 2025, 06:43 PM UTC
We are currently investigating an outage affecting system availability. Further updates will be provided as they become available.
- monitoring Nov 29, 2025, 06:56 PM UTC
Service has been restored as of 18:45 UTC. We are closely monitoring system performance to ensure stability.
- resolved Nov 29, 2025, 07:20 PM UTC
The issue has been fully resolved, and all systems are operating normally.
- postmortem Dec 04, 2025, 05:21 PM UTC
# Post-Incident Report: Service Degradation (November 28 & 29, 2025)

_(All timestamps are in UTC)_

## Summary

* **First Incident (Nov 28):** Elevated error rates and intermittent access were observed for approximately **5 minutes** (06:43–06:48 UTC).
* **Second Incident (Nov 29):** **70-minute window** of elevated errors (18:10–19:20 UTC), including a severe service degradation period of roughly **22 minutes**.

## Root Cause

This was caused by an unexpected and significant spike in traffic volume. The surge in requests temporarily exceeded our forecasted capacity and our auto scaling capability, causing congestion in our application layer.

## Impact

### Symptoms

During these windows, customers using the Dashboard and Core Risk APIs experienced increased latency and 502/503/504 errors.

## Resolution and Next Steps

### Short Term Solution

Our engineering teams intervened to stabilize the platform during the events.

### Long Term Solution

To ensure our systems remain resilient against future traffic spikes of this magnitude, we are currently provisioning additional infrastructure and permanently increasing our system capacity.
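
The 502/503/504 responses described under Symptoms are typically transient, so an API client may choose to retry them with exponential backoff and jitter. The following is a minimal sketch of that general technique, not guidance from Sardine. The function `call_with_backoff`, its parameters, and the `send` callable are hypothetical names introduced for illustration and are not part of any Sardine SDK.

```python
import random
import time
from typing import Callable, Tuple

# Gateway-style statuses named in the report; treated here as transient.
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # Full jitter: wait a random time up to the capped exponential delay,
        # so many clients do not retry in lockstep during a traffic spike.
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0.0, cap))
    raise AssertionError("unreachable: the loop returns on the last attempt")
```

A caller would wrap a single request in `send` and pass it in, for example `call_with_backoff(lambda: do_request(payload))`, where `do_request` is whatever the client already uses. Retrying is only safe for requests that are idempotent or carry an idempotency key; whether a given endpoint qualifies is something to confirm against its own documentation.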