Wasabi reported a notice-level incident on November 30, 2024, affecting US-Central-1 (Texas), US-East-1 (N. Virginia), and 1 more component, lasting 4h 26m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Nov 30, 2024, 01:23 AM UTC
We are currently investigating reported network errors across all regions. Access to both Console and S3 services may return errors. We will update this page as we have more information.
- investigating Nov 30, 2024, 03:08 AM UTC
We continue to investigate the issue. Some traffic continues to receive errors.
- monitoring Nov 30, 2024, 03:43 AM UTC
We have isolated the issue and resolution is underway. We expect requests to succeed in all regions except us-east-1. Some error responses are still being seen in Ashburn (us-east-1), and we continue to work on that. We will keep monitoring all regions and will update their status, including that of us-east-1, as we make progress.
- monitoring Nov 30, 2024, 04:45 AM UTC
The system has been restored to full operation in all regions. We will populate the Postmortem section of this incident with more complete details as soon as possible.
- resolved Nov 30, 2024, 04:46 AM UTC
This incident has been resolved.
- postmortem Dec 04, 2024, 02:08 PM UTC
On 30 November 2024, from 00:17 UTC to approximately 03:00 UTC, Wasabi experienced an issue in which client connection attempts to Wasabi Cloud Storage and the Web Console were impacted across all storage regions, causing all API calls to return HTTP 5XX errors to clients. The service degradation was caused by an error in the internal messaging queue service responsible for taking client requests and routing them to our global database cluster: the service failed to route these requests appropriately across all nodes. Wasabi’s Engineering and Operations teams were able to mitigate the issue by manually configuring internal servers to route requests across multiple database instances, allowing the system to recover and respond to requests appropriately. Once this action had restored service, the teams worked to correct the root cause by recovering the internal messaging queue and resuming automated client request handling.
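
The mitigation described in the postmortem, routing requests directly across several database instances while the queue-based routing was unavailable, can be illustrated with a small sketch. This is not Wasabi's implementation: the class `FallbackRouter`, the exception `QueueUnavailable`, the function `broken_queue`, and the instance names `db-a` and `db-b` are all hypothetical, and the round-robin fallback is an assumption chosen for simplicity.

```python
from itertools import cycle
from typing import Callable, Iterator, Sequence


class QueueUnavailable(Exception):
    """Hypothetical error raised when queue-based routing cannot place a request."""


class FallbackRouter:
    """Route via the queue when possible; otherwise rotate over a static instance list."""

    def __init__(
        self,
        queue_route: Callable[[str], str],
        static_instances: Sequence[str],
    ) -> None:
        if not static_instances:
            raise ValueError("at least one static instance is required")
        self._queue_route = queue_route
        # Assumption: plain round-robin over manually configured instances.
        self._static: Iterator[str] = cycle(static_instances)

    def route(self, request_id: str) -> str:
        """Return the name of the database instance that should handle the request."""
        try:
            return self._queue_route(request_id)
        except QueueUnavailable:
            # Queue routing failed, so fall back to the next static instance.
            return next(self._static)


def broken_queue(request_id: str) -> str:
    """Stand-in for a failed messaging queue: it never routes anything."""
    raise QueueUnavailable(request_id)


router = FallbackRouter(broken_queue, ["db-a", "db-b"])
targets = [router.route(f"req-{i}") for i in range(4)]
print(targets)
```

With the queue stand-in always failing, requests alternate between the two static instances, which mirrors the idea of spreading traffic over multiple database instances until the automated routing is recovered.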