- Detected by Pingoru
- May 05, 2026, 06:12 PM UTC
- Resolved
- May 05, 2026, 09:03 PM UTC
- Duration
- 2h 51m
Affected: API, Application Builder, Delta Image Downloads
Timeline · 5 updates
-
investigating May 05, 2026, 06:12 PM UTC
We're experiencing an elevated level of API errors and are currently looking into the issue.
-
identified May 05, 2026, 06:19 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring May 05, 2026, 06:43 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved May 05, 2026, 09:03 PM UTC
This incident has been resolved.
-
postmortem May 05, 2026, 09:18 PM UTC
A vulnerability mitigation update required replacing compute cluster nodes; when it was applied, the update rolled back due to a timeout. This made some workloads (API, delta, builder) temporarily unavailable and triggered some undesired secondary effects, including the ungraceful termination of a few long-lived instances servicing VPN connections. While the rest of the services came back quickly, within about a minute, it took around one and a quarter hours to re-establish the VPN tunnels. A scheduled maintenance will be posted later to perform this update during a planned outage window.
- Detected by Pingoru
- Mar 31, 2026, 12:55 PM UTC
- Resolved
- Apr 21, 2026, 04:30 PM UTC
- Duration
- 21d 3h
Affected: Application Builder
Timeline · 5 updates
-
identified Mar 31, 2026, 12:55 PM UTC
We're experiencing an elevated level of errors in our application builder infrastructure and are currently looking into the issue.
-
monitoring Apr 08, 2026, 07:51 PM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Apr 08, 2026, 07:51 PM UTC
We are continuing to monitor for any further issues.
-
resolved Apr 21, 2026, 04:30 PM UTC
This incident has been resolved.
-
postmortem Apr 21, 2026, 05:06 PM UTC
Starting around March 11, some cloud builds began failing intermittently with "no such image" errors. The failures were non-deterministic and affected all architectures. At peak, some users saw failure rates of around 50%. We identified and fixed several bugs in the builder's image garbage collector that caused it to over-count freed disk space and run too aggressively, eventually deleting images that in-progress builds still needed. Fixes were deployed between March 19 and April 14, and build failure rates dropped to near zero after the final deploy. We're continuing to monitor and are working on additional safeguards to prevent the garbage collector from targeting images that active builds depend on.
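A minimal sketch of the safeguard described above, in Python and not taken from balena's builder: images that in-progress builds depend on are excluded before any selection happens, and freed space is counted per unique layer. Counting a layer that a kept image still shares is one way freed space can be over-counted, but that is an assumption; the postmortem does not name the specific bugs. `Image`, `select_for_deletion`, and the layer model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    last_used: float        # time of last use; older images are collected first
    layers: dict[str, int]  # layer digest -> size in bytes


def referenced_layers(images: list[Image]) -> set[str]:
    """Every layer digest referenced by at least one image in `images`."""
    return {digest for img in images for digest in img.layers}


def select_for_deletion(images: list[Image], protected: set[str],
                        bytes_needed: int) -> tuple[list[str], int]:
    """Choose least-recently-used images to delete until `bytes_needed` is freed.

    Images named in `protected` (for example, those that in-progress builds
    depend on) are never chosen. Freed space only counts layers that no
    remaining image still references, so shared layers are not counted.
    """
    layer_sizes = {d: s for img in images for d, s in img.layers.items()}
    remaining = list(images)  # copy: the caller's list is not modified
    victims: list[str] = []
    freed = 0
    candidates = sorted((img for img in images if img.name not in protected),
                        key=lambda img: img.last_used)
    for img in candidates:
        if freed >= bytes_needed:
            break
        before = referenced_layers(remaining)
        remaining.remove(img)
        # Only layers that became unreferenced by this deletion are really freed.
        freed += sum(layer_sizes[d] for d in before - referenced_layers(remaining))
        victims.append(img.name)
    return victims, freed
```

For example, if three images share one large layer and each has one unique layer, deleting two of them frees only their unique layers; the shared layer is not counted while the third, protected image still uses it.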
- Detected by Pingoru
- Mar 23, 2026, 04:57 PM UTC
- Resolved
- Mar 25, 2026, 01:18 PM UTC
- Duration
- 1d 20h
Affected: Application Builder
Timeline · 4 updates
-
investigating Mar 23, 2026, 04:57 PM UTC
We are seeing several builds intermittently fail with 404 "No such image" errors and are investigating.
-
monitoring Mar 23, 2026, 06:55 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 25, 2026, 01:18 PM UTC
This incident has been resolved.
-
postmortem Mar 25, 2026, 03:38 PM UTC
Between March 11 and March 25, some cloud builds experienced intermittent failures with "no such image" errors. The issue was non-deterministic and did not affect all builds. We've identified a likely contributing factor and deployed mitigations that have stabilized build reliability. We're continuing to investigate the underlying cause to prevent recurrence. If you experienced build failures during this window, re-running your build should succeed. We appreciate your patience while we worked through this, and we apologize for the disruption.
- Detected by Pingoru
- Mar 03, 2026, 05:44 PM UTC
- Resolved
- Mar 03, 2026, 08:17 PM UTC
- Duration
- 2h 33m
Affected: Dashboard
Timeline · 4 updates
-
investigating Mar 03, 2026, 05:44 PM UTC
We're experiencing an issue where the Dashboard may redirect to an unexpected page on initial load, which can prevent access to certain account and billing pages.
-
monitoring Mar 03, 2026, 07:45 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 03, 2026, 08:17 PM UTC
This incident has been resolved.
-
postmortem Mar 09, 2026, 01:01 PM UTC
We identified an issue in Dashboard v32.2.0, released on March 2, 2026, where opening the dashboard via a direct link to certain pages (such as billing or other account management pages) could result in being unexpectedly redirected to the fleets overview. This was caused by a race condition in our access control logic that made a routing decision before all authorization data had finished loading. The issue was resolved on March 3, 2026, with a fix that ensures the dashboard waits for all access information to be available before determining whether a user can view a page. We understand this was frustrating, particularly for users trying to manage billing or account settings via bookmarked or shared links. We apologize for the disruption and are adding test coverage for direct-link navigation to prevent similar regressions in the future.
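A minimal sketch of the race and the fix as described, in Python rather than the dashboard's own code: the racy version treats "not loaded yet" as "not permitted" and redirects, while the fixed `decide` returns `WAIT` until every access source has loaded. `Decision`, `racy_decide`, `decide`, and the source and permission names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    WAIT = "wait"          # access data still loading: keep a loading state, do not route
    ALLOW = "allow"        # render the requested page
    REDIRECT = "redirect"  # send the user to a fallback page


# Each source maps to the permissions it grants, or None while still loading.
AccessData = dict[str, Optional[set[str]]]


def racy_decide(required: str, sources: AccessData) -> Decision:
    """Buggy pattern: decides immediately, ignoring sources that have not loaded."""
    granted = set().union(*(p for p in sources.values() if p is not None))
    return Decision.ALLOW if required in granted else Decision.REDIRECT


def decide(required: str, sources: AccessData) -> Decision:
    """Fixed pattern: only decide once every access source has loaded."""
    if any(p is None for p in sources.values()):
        return Decision.WAIT
    granted = set().union(*sources.values())
    return Decision.ALLOW if required in granted else Decision.REDIRECT
```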
- Detected by Pingoru
- Feb 26, 2026, 09:13 PM UTC
- Resolved
- Feb 27, 2026, 03:39 AM UTC
- Duration
- 6h 25m
Affected: SSH proxy
Timeline · 6 updates
- Detected by Pingoru
- Feb 10, 2026, 09:33 AM UTC
- Resolved
- Feb 11, 2026, 12:02 AM UTC
- Duration
- 14h 28m
Affected: Delta Image Downloads
Timeline · 4 updates
-
investigating Feb 10, 2026, 09:33 AM UTC
Some delta generation requests are encountering errors and failing. We are currently investigating this issue.
-
monitoring Feb 10, 2026, 10:25 AM UTC
We have identified the potential cause and have rolled back the changes.
-
resolved Feb 11, 2026, 12:02 AM UTC
This incident has been resolved.
-
postmortem Feb 11, 2026, 12:10 AM UTC
The v2 delta generation service experienced failures from ~21:15 UTC Feb 9 to ~10:00 UTC Feb 10, 2026, due to a missing configuration dependency during a logic change.
**Impact:**
* v2 delta generation requests failed to complete
* No data loss or security impact
**Root Cause:** Recent logic changes were deployed without the required accompanying configuration update, preventing the service from completing v2 delta requests.
**Resolution:** The logic changes were rolled back, restoring the service to its previous stable state.
**Follow-up Actions:**
* Prepare and deploy the permanent fix
We apologize for the disruption and any inconvenience this caused. We are committed to improving our processes to prevent similar issues in the future.
Critical · February 10, 2026
- Detected by Pingoru
- Feb 10, 2026, 04:05 AM UTC
- Resolved
- Feb 10, 2026, 07:56 AM UTC
- Duration
- 3h 51m
Affected: Cloudlink (VPN)
Timeline · 5 updates
-
investigating Feb 10, 2026, 04:05 AM UTC
We're experiencing an elevated level of errors in our Cloudlink infrastructure and are currently looking into the issue.
-
identified Feb 10, 2026, 06:39 AM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Feb 10, 2026, 07:17 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 10, 2026, 07:56 AM UTC
This incident has been resolved.
-
postmortem Feb 10, 2026, 11:52 AM UTC
Balena devices were unable to connect to Cloudlink on February 10, 2026, from approximately 02:26 GMT to 07:11 GMT due to an expired server certificate. Devices that were already connected to Cloudlink were unaffected unless the connection was terminated.
**Root Cause:** The Cloudlink servers were using an expired certificate that was due for replacement. Consequently, incoming Cloudlink connections failed with a certificate verification error.
**Resolution:** The certificate has been replaced, and Cloudlink servers were restarted to use the new certificate. Balena devices are expected to reconnect to Cloudlink within a few minutes after being disconnected due to the restart.
**Follow-up Actions:**
* Expand certificate expiry monitoring coverage to include all active certificates
* Automate the certificate renewal process for Cloudlink
We apologize for any disruption this caused and appreciate your patience as we continue improving our processes and operations.
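One follow-up action is wider certificate expiry monitoring. A minimal sketch of such a check in Python, assuming the monitor already has each certificate's `notAfter` string in the format used by Python's `ssl` module; `days_until_expiry`, `expiring_soon`, the certificate names, and the 30-day threshold are hypothetical, not balena's tooling.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (default: current time) until a certificate's notAfter.

    `not_after` uses the format of the "notAfter" field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jun  1 12:00:00 2026 GMT".
    Negative values mean the certificate has already expired.
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def expiring_soon(certs: dict[str, str], warn_days: float = 30,
                  now: Optional[float] = None) -> list[str]:
    """Names of certificates that expire within `warn_days` or have already expired."""
    return sorted(name for name, not_after in certs.items()
                  if days_until_expiry(not_after, now) < warn_days)
```

Run on a schedule against every active certificate, a non-empty result would be the signal to renew before connections start failing verification.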
- Detected by Pingoru
- Jan 28, 2026, 01:40 AM UTC
- Resolved
- Jan 28, 2026, 01:00 AM UTC
- Duration
- —
Timeline · 2 updates
- Detected by Pingoru
- Dec 23, 2025, 04:32 PM UTC
- Resolved
- Dec 24, 2025, 09:38 AM UTC
- Duration
- 17h 6m
Affected: API, Application Builder, balenahub
Timeline · 3 updates
-
investigating Dec 23, 2025, 04:32 PM UTC
We are currently investigating an issue affecting the availability of balenaCloud services.
-
monitoring Dec 23, 2025, 07:03 PM UTC
We are seeing scaling issues during service deployment, caused by the unavailability of nodes from our underlying scaling provider.
-
resolved Dec 24, 2025, 09:38 AM UTC
Insufficient AWS compute capacity overloaded the remaining nodes. This high load caused readiness probes to fail, triggering API restarts that created a feedback loop of increasing pressure.
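A deliberately simplified toy model of the feedback loop described above, in Python. It assumes load is split evenly across ready replicas and that a replica whose share exceeds its capacity fails its readiness probe and leaves rotation; restarted replicas never rejoin, so the model only shows the direction of the loop. All names and numbers are hypothetical and not drawn from the incident.

```python
def simulate(total_load: float, replicas: int, capacity: float, steps: int) -> list[int]:
    """Toy model of readiness-probe failures feeding back into load.

    Each step, load is split evenly across ready replicas. If that share
    exceeds `capacity`, one replica fails its readiness probe and is taken
    out of rotation to restart. Returns the ready-replica count after each step.
    """
    ready = replicas
    history = []
    for _ in range(steps):
        if ready > 0 and total_load / ready > capacity:
            ready -= 1  # probe fails: this replica's traffic shifts onto the others
        history.append(ready)
    return history
```

With 4 replicas of capacity 30, a total load of 110 stays stable, while a load of 130 drains every replica within four steps: each removal pushes the remaining replicas further over capacity.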
Critical · December 5, 2025
- Detected by Pingoru
- Dec 05, 2025, 09:05 AM UTC
- Resolved
- Dec 05, 2025, 09:37 AM UTC
- Duration
- 31m
Affected: API, Dashboard, Website
Timeline · 3 updates
-
identified Dec 05, 2025, 09:05 AM UTC
Cloudflare, our proxy provider, is having service issues. Connectivity to balenaCloud services is currently affected.
-
monitoring Dec 05, 2025, 09:16 AM UTC
Our upstream provider has implemented some fixes. balenaCloud services are back online. We are still monitoring the situation.
-
resolved Dec 05, 2025, 09:37 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Nov 12, 2025, 09:13 AM UTC
- Resolved
- Nov 12, 2025, 07:58 PM UTC
- Duration
- 10h 45m
Affected: Application Builder
Timeline · 5 updates
- Detected by Pingoru
- Oct 01, 2025, 02:51 PM UTC
- Resolved
- Oct 01, 2025, 01:30 PM UTC
- Duration
- —
Timeline · 1 update
-
resolved Oct 01, 2025, 02:51 PM UTC
An update to our kube-system infrastructure resulted in a disruptive pod rollout that left some devices disconnected from Cloudlink for up to 10 minutes. We apologize for the interruption; future updates to this component will be handled within planned maintenance windows.
- Detected by Pingoru
- Sep 30, 2025, 07:47 PM UTC
- Resolved
- Sep 30, 2025, 09:56 PM UTC
- Duration
- 2h 8m
Affected: Cloudlink (VPN)
Timeline · 5 updates
- Detected by Pingoru
- Jul 07, 2025, 10:40 AM UTC
- Resolved
- Jul 07, 2025, 10:00 AM UTC
- Duration
- —
Timeline · 1 update
-
resolved Jul 07, 2025, 10:40 AM UTC
Temporarily degraded performance of API response processing.
- Detected by Pingoru
- Apr 17, 2025, 02:38 PM UTC
- Resolved
- Apr 18, 2025, 12:26 PM UTC
- Duration
- 21h 47m
Affected: API
Timeline · 4 updates
-
investigating Apr 17, 2025, 02:38 PM UTC
We are currently investigating this issue.
-
identified Apr 17, 2025, 02:56 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Apr 17, 2025, 03:00 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 18, 2025, 12:26 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Apr 10, 2025, 01:00 PM UTC
- Resolved
- Apr 10, 2025, 02:40 PM UTC
- Duration
- 1h 40m
Affected: Dashboard
Timeline · 5 updates
-
investigating Apr 10, 2025, 01:00 PM UTC
We are currently investigating an issue with users unable to pin releases in the dashboard.
-
identified Apr 10, 2025, 01:00 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Apr 10, 2025, 01:59 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 10, 2025, 02:40 PM UTC
This incident has been resolved.
-
postmortem Apr 14, 2025, 12:52 PM UTC
A UI update included updates to a few nested dependencies, which changed a behavior we relied on. As a result, the list of available releases in the Target Release section of the Fleet Summary page was not populated. We initially reverted the changed package, then changed our approach so that it works with the latest version of the package.
- Detected by Pingoru
- Mar 22, 2025, 12:10 AM UTC
- Resolved
- Mar 22, 2025, 12:40 AM UTC
- Duration
- 30m
Affected: Device URLs, Cloudlink (VPN)
Timeline · 5 updates
-
identified Mar 22, 2025, 12:10 AM UTC
The issue has been identified and a fix is being implemented.
-
identified Mar 22, 2025, 12:10 AM UTC
We are continuing to work on a fix for this issue.
-
monitoring Mar 22, 2025, 12:37 AM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Mar 22, 2025, 12:37 AM UTC
We are continuing to monitor for any further issues.
-
resolved Mar 22, 2025, 12:40 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 05, 2025, 04:45 PM UTC
- Resolved
- Feb 06, 2025, 01:58 PM UTC
- Duration
- 21h 13m
Affected: API, Dashboard
Timeline · 5 updates
-
investigating Feb 05, 2025, 05:51 PM UTC
We're experiencing an elevated level of log stream errors and are currently looking into the issue.
-
monitoring Feb 05, 2025, 05:53 PM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Feb 05, 2025, 05:53 PM UTC
We are continuing to monitor for any further issues.
-
resolved Feb 06, 2025, 01:58 PM UTC
This incident has been resolved.
-
postmortem Feb 06, 2025, 07:26 PM UTC
We deployed an upgrade to our log system that worked well in staging but ran into issues with log streams in production, so we quickly rolled it back. We have identified the issue and why it affected only one environment, and are investigating an alternative solution going forward.
Critical · December 24, 2024
- Detected by Pingoru
- Dec 24, 2024, 11:06 AM UTC
- Resolved
- Dec 24, 2024, 11:40 AM UTC
- Duration
- 34m
Affected: Device URLs
Timeline · 3 updates
-
investigating Dec 24, 2024, 11:06 AM UTC
We're experiencing an elevated level of Device URL errors and are currently looking into the issue.
-
monitoring Dec 24, 2024, 11:35 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Dec 24, 2024, 11:40 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Dec 13, 2024, 02:31 AM UTC
- Resolved
- Dec 12, 2024, 09:00 PM UTC
- Duration
- —
Timeline · 1 update
-
resolved Dec 13, 2024, 02:31 AM UTC
The OS version list under the settings menu may not show any versions for some devices, preventing users from upgrading their HostOS. Some users may also see a notice on their device summary page saying "OS downgrades are not allowed". We have reverted the dashboard to a previous version with a working OS version list for upgrading HostOS. We are still investigating why the OS version list is not rendered in the latest dashboard version.
- Detected by Pingoru
- Nov 12, 2024, 09:06 AM UTC
- Resolved
- Nov 12, 2024, 09:06 AM UTC
- Duration
- —
Timeline · 1 update
-
resolved Nov 12, 2024, 09:06 AM UTC
The delta image back-end was failing to connect to our internal worker nodes because it was using an outdated authentication certificate. The back-end and the worker nodes were meant to switch to a new certificate at the same time to avoid disruption, but reconfiguring the back-end was delayed. We are reviewing the process to avoid similar disruptions in the future.
- Detected by Pingoru
- Oct 28, 2024, 01:20 PM UTC
- Resolved
- Oct 27, 2024, 01:00 AM UTC
- Duration
- —
Timeline · 1 update
-
resolved Oct 28, 2024, 01:20 PM UTC
Some devices may show a status of "Reduced Functionality" on the dashboard because the API returned an outdated heartbeat status. We found that the error occurred after we switched to a new cache for the API backend. We applied a fix to the API so that it displays the latest heartbeat status reported by the devices. We are still investigating how to prevent this error from occurring in the future.
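The report does not say what the cache fix was. As one illustration only, assuming each heartbeat carries the time the device reported it, a cache can refuse to replace a newer report with an older one that arrives late, so reads always return the latest reported status. `Heartbeat`, `HeartbeatCache`, and the status strings are hypothetical, in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Heartbeat:
    status: str         # e.g. "online" or "reduced functionality" (illustrative values)
    reported_at: float  # time the device sent this report


class HeartbeatCache:
    """Keeps, per device, the most recently reported heartbeat."""

    def __init__(self) -> None:
        self._entries: dict[str, Heartbeat] = {}

    def record(self, device: str, hb: Heartbeat) -> None:
        # Skip writes that carry an older report than the one already cached,
        # so a late, stale value can never overwrite a newer one.
        current = self._entries.get(device)
        if current is None or hb.reported_at >= current.reported_at:
            self._entries[device] = hb

    def latest(self, device: str) -> Optional[Heartbeat]:
        return self._entries.get(device)
```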
- Detected by Pingoru
- Sep 11, 2024, 12:00 PM UTC
- Resolved
- Sep 11, 2024, 03:00 PM UTC
- Duration
- 3h
Affected: Application Builder
Timeline · 4 updates
-
investigating Sep 11, 2024, 12:47 PM UTC
We're currently investigating 403 errors returned by the builder.
-
identified Sep 11, 2024, 01:11 PM UTC
The issue has been identified and we're working on a fix.
-
monitoring Sep 11, 2024, 02:17 PM UTC
A temporary fix has been deployed.
-
resolved Sep 12, 2024, 09:49 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Jul 25, 2024, 04:57 PM UTC
- Resolved
- Jul 31, 2024, 01:25 PM UTC
- Duration
- 5d 20h
Affected: Application Builder, Application Registry, Delta Image Downloads
Timeline · 7 updates
-
investigating Jul 25, 2024, 04:57 PM UTC
We're experiencing degraded performance around creating new releases on our builders and are currently looking into the issue. Users experiencing issues creating releases using the `balena push` command are advised to temporarily switch to local builds via `balena deploy`, if possible, until the incident is resolved.
-
investigating Jul 25, 2024, 08:31 PM UTC
We are continuing to investigate this issue.
-
investigating Jul 25, 2024, 11:59 PM UTC
We are continuing to investigate this issue.
-
investigating Jul 26, 2024, 12:13 PM UTC
We are continuing to investigate this issue.
-
monitoring Jul 26, 2024, 03:13 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Jul 31, 2024, 01:25 PM UTC
This incident has been resolved.
-
postmortem Jul 31, 2024, 01:25 PM UTC
We experienced timeouts for deltas and builds when pushing images to our registry hosted in the US East (N. Virginia) region. This issue impacted our cloud builders in Finland and Germany, among other regions. The root cause was identified as a public routing issue between certain regions, affecting the ability of some of our systems to access the registry efficiently. We resolved the issue by enabling proxied routing protocols for our registry endpoint. This allowed us to bypass the impacted network paths and restore normal operations.
## Impact
* Cloud builders in Finland and Germany experienced delays in image pushing
* Potential delays in deployment pipelines for affected regions
* No data loss or security breaches occurred
- Detected by Pingoru
- Jul 24, 2024, 08:45 AM UTC
- Resolved
- Jul 24, 2024, 09:22 AM UTC
- Duration
- 37m
Affected: Application Builder
Timeline · 3 updates
-
identified Jul 24, 2024, 09:03 AM UTC
Some builds sent to our arm64 builder infrastructure are failing. The problem is caused by a faulty garbage collection process. We expect everything to return to normal quickly.
-
monitoring Jul 24, 2024, 09:08 AM UTC
All affected arm64 builders are back online. We're monitoring the situation.
-
resolved Jul 24, 2024, 09:22 AM UTC
This incident has been resolved.