Hosted Mender experienced a minor incident on January 13, 2025 affecting Hosted Mender EU, lasting 1h 35m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jan 13, 2025, 03:01 PM UTC
We are experiencing a scalability issue: new Kubernetes worker nodes are being rolled out very slowly. We're checking with the cloud provider.
- monitoring Jan 13, 2025, 03:13 PM UTC
The number of Kubernetes worker nodes now matches the required load. We're still in contact with the cloud provider's support to determine the root cause. The incident is still open.
- resolved Jan 13, 2025, 04:37 PM UTC
The cloud provider's support is still investigating the issue. In the meantime, we have increased the minimum number of Kubernetes worker nodes to prevent further autoscaling issues.
- postmortem Jan 29, 2025, 09:15 AM UTC
We discussed the incident with Azure support and decided to replace a problematic component (an AKS node pool). The new component is working fine and has no scalability issues, so we promoted it to production. No further action is needed.