Hosted Mender incident

Temporary service disruption following a MongoDB primary node failure

Critical Resolved

Hosted Mender experienced a critical incident on January 8, 2025, affecting Hosted Mender US and lasting 4h. The incident has been resolved; the full update timeline is below.

Started
Jan 08, 2025, 10:06 AM UTC
Resolved
Jan 08, 2025, 02:07 PM UTC
Duration
4h
Detected by Pingoru
Jan 08, 2025, 10:06 AM UTC

Affected components

Hosted Mender US

Update timeline

  1. investigating Jan 08, 2025, 10:06 AM UTC

    Today between 09:26 UTC and 09:28 UTC we received notifications about a MongoDB node failure on the primary. We are investigating the issue, which appears to have already resolved itself.

  2. identified Jan 08, 2025, 10:07 AM UTC

    The provider's log shows two attempts to roll back the failed node, after which the cluster gave up and elected a new primary. The cluster is self-healing and Hosted Mender is operational again.

  3. monitoring Jan 08, 2025, 10:07 AM UTC

    We're monitoring the incident and the metrics to check for possible issues.

  4. resolved Jan 08, 2025, 02:07 PM UTC

    This incident has been resolved: the MongoDB cluster appears stable and no other issues have been reported.