Hosted Mender Outage History

Hosted Mender is up right now

There have been 2 Hosted Mender outages since February 20, 2026, totaling 4h 18m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://mender.statuspage.io

Critical April 8, 2026

Issues with the Mender Server UI

Detected by Pingoru: Apr 08, 2026, 01:16 PM UTC
Resolved: Apr 08, 2026, 02:13 PM UTC
Duration: 57m
Affected: Hosted Mender US, Hosted Mender EU
Timeline · 4 updates
  1. investigating Apr 08, 2026, 01:40 PM UTC

    We are currently investigating the issue

  2. identified Apr 08, 2026, 01:44 PM UTC

    The issue has been identified and a fix is being implemented

  3. monitoring Apr 08, 2026, 01:58 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Apr 08, 2026, 02:13 PM UTC

    The Mender UI was unavailable between 2026-04-08T13:16:15Z and 2026-04-08T13:42:09Z (26m) due to a breaking change in the Google Analytics dependency that led to a misconfiguration. The issue has been mitigated by temporarily disabling Google Analytics until a fix is deployed.


Major February 20, 2026

Degraded performance on hosted Mender

Detected by Pingoru: Feb 20, 2026, 09:28 AM UTC
Resolved: Feb 20, 2026, 12:50 PM UTC
Duration: 3h 21m
Affected: Hosted Mender US, Hosted Mender EU
Timeline · 5 updates
  1. investigating Feb 20, 2026, 09:28 AM UTC

    We are currently investigating a performance issue reported by our monitoring system

  2. investigating Feb 20, 2026, 09:51 AM UTC

    It's not currently possible to accept new devices. We're continuing to investigate the issue.

  3. monitoring Feb 20, 2026, 11:33 AM UTC

    We identified an issue and applied a fix. We're monitoring the results. Now it's possible to accept new devices and the performance issue should be gone.

  4. resolved Feb 20, 2026, 12:50 PM UTC

    This incident has been resolved.

  5. postmortem Feb 26, 2026, 04:10 PM UTC

    # **Database Overload from Device Limit Migration Bug**

    **Date:** 2026-02-20
    **Duration:** ~3 hours 25 minutes (08:15 - 11:40 UTC)
    **Severity:** High

    ## **Executive Summary**

    On February 20, 2026, the Hosted Mender platform experienced a critical service outage affecting device authentication and inventory operations. A change deployed as part of Mender v4.2.0-saas.2 failed to uniformly handle data inconsistencies in older tenant configurations. A specific call order of two independent backend endpoints, in combination with scheduled cache invalidation, uncovered a bug that caused a heavy increase in load on the database. The unexpected increase in load was beyond what the system is designed to handle, which resulted in cascading errors and platform-wide degradation.

    ## **Impact**

    * **Duration:** Approximately 3 hours 25 minutes
    * **Scope:** Multi-tenant platform-wide degradation
    * **Affected Services:** Device authentication, device inventory
    * **User Experience:** Unable to accept new devices
    * **Business Impact:** Complete halt to device provisioning across the platform (hosted Mender US only) during incident window

    ## **Root Cause**

    In Mender v4.2.0-saas.2 we changed the definition of an “unlimited” device limit from 0 to -1 so the system would be able to represent limits that allow zero devices. This was done by introducing a database migration that changed existing limits with the value 0 to the value -1 and cleared the limits cache to ensure data would be read fresh from the database after the migration. We were also aware of a known edge case where certain tenants would not have a limit defined in the database, and took steps to ensure consistent handling of this scenario after the migration.

    This new version of Mender also included an internal endpoint that incorrectly set the cached device limit of a tenant to 0 in the case where a) there was no limit in the cache from before and b) there was also no limit in the database. This endpoint was overlooked in the steps mentioned above.

    When the internal endpoint was called after the cache was invalidated, but before any external endpoints that used device limits, the limit 0 was incorrectly cached for some tenants with a large number of devices and no device limit in the database. When the device authorization reprocessing logic was executed for these devices, the incorrectly cached limit caused a large number of database queries to be executed in order to check whether the limit had been exceeded (a check that is not necessary when the device limit is “unlimited”). Regardless of the result of the check, a limit of 0 always results in the device not being allowed to authorize with the system. Devices retry continuously in that case, which amplified the issue manyfold until MongoDB resources were completely exhausted.

    ## **Timeline (All times UTC)**

    **2026-02-19**

    * **13:16** - Deployed Mender v4.2.0-saas.2 - _Root cause introduced_

    **2026-02-20**

    * **~08:00** - Devices of the affected tenants started the authorization reprocessing process
    * **08:15** - A synthetic test failure alerted the On-call team
    * **08:20** - On-call investigated tenant configuration; Admin Panel queries failing with 499/504 due to DB exhaustion
    * **08:20** - Identified ongoing device authorization reprocessing consuming all database resources
    * **09:25** - Attempted to stop problematic queries
    * **10:30** - Discovered blocked queries still holding locks; initiated emergency database scaling
    * **10:40** - Database scaled; locks cleared; device acceptance partially restored
    * **11:55** - Cache for device-auth disabled
    * **11:00** - Added missing limits with value -1 (unlimited) in the database for affected tenants
    * **11:40** - Service fully restored

    ## **What went wrong**

    1. **Inadequate test coverage:** The test coverage of the internal endpoint was inadequate, as it didn’t verify that the correct value was used and cached in this scenario.
    2. **Inadequate manual testing:** Manual testing was performed, but not with a cache that was explicitly invalidated for this purpose.
    3. **Uncontrolled cascade:** The device authorization reprocessing logic had a snowball effect on the platform.

    ## **Action Items**

    * Resolve the issue where limits that are intended to be “unlimited” can be incorrectly cached as 0 by this internal endpoint.
    * Update the device authorization reprocessing logic to not execute unnecessary database queries if the limit is 0.
    * Review and improve test coverage of the affected endpoints.

    ## **Conclusions**

    We want to sincerely apologize for the service disruption you experienced on February 20, 2026. For over three hours, our platform was unable to process device authentication and inventory operations, preventing you from onboarding new devices and managing your fleet. We are committed to preventing this kind of disruption in the future.
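
The root cause described in the postmortem above can be illustrated with a small sketch. This is not Mender's code: `LimitStore`, `buggy_internal_lookup`, `fixed_lookup`, and `may_authorize` are hypothetical names, the database and cache are plain dictionaries, and a counter stands in for the device-count queries. It only shows how defaulting a missing limit to 0 (the old "unlimited" value) instead of -1 both blocks authorization and triggers a database query on every attempt.

```python
UNLIMITED = -1  # meaning of "unlimited" after the migration (previously 0)


class LimitStore:
    """Hypothetical stand-in for the limits database plus its cache."""

    def __init__(self, db_limits):
        self.db = dict(db_limits)  # tenant -> limit; a tenant may be missing
        self.cache = {}            # starts empty, as after cache invalidation
        self.db_queries = 0        # counts simulated device-count queries

    def buggy_internal_lookup(self, tenant):
        # Assumed shape of the bug: no cached limit and no database limit
        # leads to 0 being cached, which now means "zero devices allowed".
        if tenant not in self.cache:
            self.cache[tenant] = self.db.get(tenant, 0)
        return self.cache[tenant]

    def fixed_lookup(self, tenant):
        # A missing limit is treated as unlimited, matching the new sentinel.
        if tenant not in self.cache:
            self.cache[tenant] = self.db.get(tenant, UNLIMITED)
        return self.cache[tenant]


def may_authorize(store, lookup, tenant, accepted_devices):
    """Return True if one more device may be authorized for the tenant."""
    limit = lookup(tenant)
    if limit == UNLIMITED:
        return True  # unlimited: no device-count query is needed
    store.db_queries += 1  # stands in for counting accepted devices in the DB
    return accepted_devices < limit


# Tenant "t1" has many devices and no limit stored in the database.
buggy = LimitStore({})
buggy_result = may_authorize(buggy, buggy.buggy_internal_lookup, "t1", 1000)

fixed = LimitStore({})
fixed_result = may_authorize(fixed, fixed.fixed_lookup, "t1", 1000)
```

In the buggy path the device is rejected and a query is issued each time it retries; in the fixed path the device is accepted without touching the database. With many devices retrying at once, the first behaviour multiplies into the kind of load the postmortem describes.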

