Hosted Mender incident
Hosted Mender EU - Connectivity issue 2026-06-18
Hosted Mender experienced a critical incident on June 18, 2026; the resolution update was posted on June 19, 2026. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Jun 19, 2026, 11:56 AM UTC
Incident Summary - 2026-06-18

What happened

On June 18, between 18:46 UTC and approximately 19:00 UTC, some users may have experienced intermittent connectivity issues or elevated error rates when accessing Mender services from the EU cluster.

The incident was triggered by a scheduled maintenance job running on the infrastructure that hosts our services. This job, which cleans up unused container images across cluster nodes, ran simultaneously on all nodes and consumed a significant amount of CPU. This created resource contention with the ingress controller pods that route incoming traffic, causing them to become temporarily unresponsive to health checks and restart repeatedly. The repeated restarts reduced the number of healthy ingress endpoints below the threshold required to serve traffic across all availability zones, leading to degraded routing and elevated error rates for most requests.

Resolution

The ingress controller stabilized once the image cleanup job completed. The job is a new managed feature introduced by the recent Kubernetes upgrades; because it is not a critical feature, we suspended it. Additional safeguards have been put in place to prevent ingress controller pods from being affected by resource pressure from unrelated workloads.

What we are doing

- The image cleanup job has been permanently suspended.
- CPU resource limits have been tightened on the ingress layer to isolate it from competing workloads.
- Node pool capacity is being expanded with larger instances to provide additional headroom.

We apologize for the disruption.
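
For readers who want to reason about the failure mode described in the summary, the following is a minimal Python sketch of why a job that runs on every node at once can push healthy ingress endpoints below a per-zone threshold, while the same job run on one node at a time would not. It is a toy model, not a description of the actual cluster: the zone and node names, the pod layout, the `min_healthy` threshold, and the simplification that any ingress pod on a node under cleanup load fails its health checks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressPod:
    zone: str  # hypothetical availability zone name
    node: str  # hypothetical node name


def healthy_per_zone(pods, busy_nodes):
    """Count, per zone, the ingress pods whose node is NOT running the cleanup job.

    Simplifying assumption: a pod on a busy node fails its health checks.
    """
    counts = {pod.zone: 0 for pod in pods}
    for pod in pods:
        if pod.node not in busy_nodes:
            counts[pod.zone] += 1
    return counts


def zones_below_threshold(pods, busy_nodes, min_healthy):
    """Return the sorted list of zones with fewer than min_healthy healthy pods."""
    counts = healthy_per_zone(pods, busy_nodes)
    return sorted(zone for zone, n in counts.items() if n < min_healthy)


# Hypothetical layout: two ingress pods per zone, one pod per node.
PODS = [
    IngressPod("zone-a", "node-1"), IngressPod("zone-a", "node-2"),
    IngressPod("zone-b", "node-3"), IngressPod("zone-b", "node-4"),
    IngressPod("zone-c", "node-5"), IngressPod("zone-c", "node-6"),
]
ALL_NODES = {pod.node for pod in PODS}

# Cleanup running on every node at the same time: no zone keeps a healthy endpoint.
simultaneous = zones_below_threshold(PODS, ALL_NODES, min_healthy=1)

# Cleanup running on one node at a time: every zone keeps at least one healthy endpoint.
staggered = [
    zones_below_threshold(PODS, {node}, min_healthy=1)
    for node in sorted(ALL_NODES)
]

print("simultaneous:", simultaneous)
print("staggered:", staggered)
```

In this toy model the simultaneous run leaves all three zones below the threshold, while no single-node run does. A real cluster depends on actual CPU requests and limits, probe timeouts, and pod placement, none of which are modeled here.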