MageMojo incident
Node under emergency maintenance in US East area zone
MageMojo experienced a major incident on March 19, 2021 affecting Webscale STRATUS - Northern Virginia, lasting 1h 34m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Mar 19, 2021, 02:39 PM UTC
We are currently investigating this issue.
- identified Mar 19, 2021, 03:22 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Mar 19, 2021, 03:45 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 19, 2021, 04:14 PM UTC
This incident has been resolved.
- postmortem Mar 24, 2021, 05:03 PM UTC
Our investigation concluded that a kernel bug affecting the ZFS filesystem caused the failure on one of the nodes in our fleet. The problem appears similar to the previously reported bug [https://github.com/openzfs/zfs/issues/10642](https://github.com/openzfs/zfs/issues/10642). We captured kernel stack traces during the event, and a preventive solution is under investigation.