Balena experienced a major incident on January 28, 2026 affecting API and Application Builder, lasting 3h 12m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jan 28, 2026, 06:54 PM UTC
We're experiencing an elevated level of API errors and are currently looking into the issue.
- identified Jan 28, 2026, 07:45 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Jan 28, 2026, 08:08 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jan 28, 2026, 10:06 PM UTC
This incident has been resolved.
- postmortem Jan 29, 2026, 09:48 PM UTC
### What Happened On January 28, our API experienced degraded performance due to a cascading pod failure triggered by infrastructure inconsistency. ### Root Cause We unexpectedly had one node running on an older generation CPU, which caused uneven load distribution. When that node's pod failed a health check under normal traffic spikes, it triggered a cascading failure across other pods \(a "thundering herd" effect\). ### Resolution We resolved the incident by: * Scaling up the number of API pods to restore capacity during recovery * Removing the infrastructure inconsistency that caused the initial failure * Adjusting health check configurations to prevent similar cascading failures We apologize for any disruption this caused and appreciate your patience as we continue improving our platform's resilience.