Balena incident

Elevated API Errors

Major Resolved View vendor source →

Balena experienced a major incident on January 28, 2026 affecting API and Application Builder, lasting 3h 12m. The incident has been resolved; the full update timeline is below.

Started
Jan 28, 2026, 06:54 PM UTC
Resolved
Jan 28, 2026, 10:06 PM UTC
Duration
3h 12m
Detected by Pingoru
Jan 28, 2026, 06:54 PM UTC

Affected components

APIApplication Builder

Update timeline

  1. investigating Jan 28, 2026, 06:54 PM UTC

    We're experiencing an elevated level of API errors and are currently looking into the issue.

  2. identified Jan 28, 2026, 07:45 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Jan 28, 2026, 08:08 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Jan 28, 2026, 10:06 PM UTC

    This incident has been resolved.

  5. postmortem Jan 29, 2026, 09:48 PM UTC

    ### What Happened On January 28, our API experienced degraded performance due to a cascading pod failure triggered by infrastructure inconsistency. ### Root Cause We unexpectedly had one node running on an older generation CPU, which caused uneven load distribution. When that node's pod failed a health check under normal traffic spikes, it triggered a cascading failure across other pods \(a "thundering herd" effect\). ### Resolution We resolved the incident by: * Scaling up the number of API pods to restore capacity during recovery * Removing the infrastructure inconsistency that caused the initial failure * Adjusting health check configurations to prevent similar cascading failures We apologize for any disruption this caused and appreciate your patience as we continue improving our platform's resilience.