Hosted Mender incident

Azure Kubernetes Service API Server error

Notice Resolved

Hosted Mender experienced a notice-level incident on January 16, 2024, affecting Hosted Mender EU and lasting 5h 13m. The incident has been resolved; the full update timeline is below.

Started
Jan 16, 2024, 06:24 AM UTC
Resolved
Jan 16, 2024, 11:38 AM UTC
Duration
5h 13m
Detected by Pingoru
Jan 16, 2024, 06:24 AM UTC

Affected components

Hosted Mender EU

Update timeline

  1. investigating Jan 16, 2024, 06:24 AM UTC

    Our monitoring system is reporting an increased rate of 5xx errors on Kubernetes API Server requests. The Kubernetes control plane is managed by Azure, so we have opened a support ticket, and we are troubleshooting the instance in the meantime. Overall, the Hosted Mender cluster is fully operational.

  2. monitoring Jan 16, 2024, 09:29 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. identified Jan 16, 2024, 09:41 AM UTC

    The issue has been identified: one of the worker nodes was in a degraded state, so we started a full node pool refresh.

  4. monitoring Jan 16, 2024, 10:19 AM UTC

    All worker nodes have been refreshed and are back online, and the monitoring system is no longer reporting the 5xx errors. We are monitoring the results.

  5. resolved Jan 16, 2024, 11:38 AM UTC

    This incident has been resolved.