Hosted Mender incident

Kubernetes API Server error increased

Hosted Mender reported a notice-level incident on February 7, 2024, affecting Hosted Mender EU and lasting 14h 40m. The incident has been resolved; the full update timeline is below.

Started
Feb 07, 2024, 07:23 AM UTC
Resolved
Feb 07, 2024, 10:03 PM UTC
Duration
14h 40m
Detected by Pingoru
Feb 07, 2024, 07:23 AM UTC

Affected components

Hosted Mender EU

Update timeline

  1. investigating Feb 07, 2024, 07:23 AM UTC

    Our metrics show an increase in Kubernetes API Server read errors. We are investigating the issue. The Hosted Mender EU service is fully operational.

  2. investigating Feb 07, 2024, 11:37 AM UTC

    Azure support has been engaged, but no clear cause has been identified. All services, including the system pods, are running normally. The only symptom is that about 6% of read requests appear as errors in the Kubernetes API server metrics. We suspect a problem with a node and plan to open an online maintenance window this evening to restart the nodes of the system pool. We can confirm that Hosted Mender EU is running without any issue. No new updates are expected until this evening.

  3. monitoring Feb 07, 2024, 06:18 PM UTC

    At about 14:30 UTC the metrics stopped reporting errors, without any action on our side. We are in contact with Azure support to check what activity took place on their side.

  4. resolved Feb 07, 2024, 10:03 PM UTC

    The monitoring system reports no further issues, so we are closing the incident.
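
For readers who want to see how a figure like the ~6% read error rate in the timeline can be derived, here is a minimal sketch. The timeline does not say which metric or query was used, so everything here is an assumption: the function name `read_error_ratio`, the choice to treat GET, LIST and WATCH as read verbs, the choice to count only HTTP 5xx responses as errors, and the sample counts, which are made up for illustration. The Kubernetes API server does expose a request counter (`apiserver_request_total`) labeled by verb and response code, and the tuples below mimic that shape.

```python
# Hypothetical sketch: share of failed read requests, computed from
# per-(verb, code) request counts. Not the vendor's actual query.

READ_VERBS = {"GET", "LIST", "WATCH"}  # assumption: what counts as a "read"


def read_error_ratio(samples):
    """Return errors / total for read requests.

    samples: iterable of (verb, http_code, count) tuples.
    Assumption: only 5xx responses are counted as errors.
    """
    total = 0
    errors = 0
    for verb, code, count in samples:
        if verb.upper() not in READ_VERBS:
            continue  # skip writes such as POST, PUT, DELETE
        total += count
        if code >= 500:
            errors += count
    return errors / total if total else 0.0


# Illustrative numbers only, not taken from the incident.
samples = [
    ("GET", 200, 900),
    ("LIST", 200, 40),
    ("GET", 500, 50),
    ("LIST", 504, 10),
    ("POST", 500, 25),  # write request, ignored by the ratio
]

ratio = read_error_ratio(samples)
print(f"read error ratio: {ratio:.1%}")
```

Restricting the ratio to read verbs matches the symptom described in the timeline, where only read requests were reported as failing while the service itself stayed operational.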