MageMojo incident

Node under emergency maintenance in US East area zone

MageMojo experienced a major incident on March 19, 2021 affecting Webscale STRATUS - Northern Virginia, lasting 1h 34m. The incident has been resolved; the full update timeline is below.

Started
Mar 19, 2021, 02:39 PM UTC
Resolved
Mar 19, 2021, 04:14 PM UTC
Duration
1h 34m
Detected by Pingoru
Mar 19, 2021, 02:39 PM UTC

Affected components

Webscale STRATUS - Northern Virginia

Update timeline

  1. investigating Mar 19, 2021, 02:39 PM UTC

    We are currently investigating this issue.

  2. identified Mar 19, 2021, 03:22 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 19, 2021, 03:45 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Mar 19, 2021, 04:14 PM UTC

    This incident has been resolved.

  5. postmortem Mar 24, 2021, 05:03 PM UTC

    An investigation concluded that a kernel bug in the ZFS filesystem caused the issue on one of the nodes in our fleet. The problem appears similar to the previously reported OpenZFS bug https://github.com/openzfs/zfs/issues/10642. We captured kernel stack traces during the event and are investigating a solution to prevent it from recurring.