Wasabi incident

System Errors in US-EAST-1 Region

Minor Resolved

Wasabi experienced a minor incident on September 16, 2024 affecting US-East-1 (N. Virginia), lasting 3h 11m. The incident has been resolved; the full update timeline is below.

Started
Sep 16, 2024, 11:53 AM UTC
Resolved
Sep 16, 2024, 03:05 PM UTC
Duration
3h 11m
Detected by Pingoru
Sep 16, 2024, 11:53 AM UTC

Affected components

US-East-1 (N. Virginia)

Update timeline

  1. investigating Sep 16, 2024, 11:53 AM UTC

    We are currently investigating an increase in 500-level HTTP responses on customer traffic to the us-east-1 region.

  2. monitoring Sep 16, 2024, 12:48 PM UTC

    We have identified and resolved the issue. We are continuing to monitor services.

  3. resolved Sep 16, 2024, 03:05 PM UTC

    This issue is resolved. Customers may have seen elevated levels of 500-level HTTP responses between 04:25 and 12:40 UTC on 16 September 2024.

  4. postmortem Sep 20, 2024, 02:28 PM UTC

    From 2024-09-16 04:30 UTC to 2024-09-16 12:30 UTC, we experienced an issue within our US-EAST-1 region that caused customers to receive 5XX errors and reduced their ability to ingest data into buckets within the region. The system's user-servers reached capacity with logs, resulting in a failure of our streaming service. Because the user-servers were busy writing logs, they had reduced capacity to handle requests. Additionally, messages that could not be published were written to disk, further increasing I/O operations on the system. By 12:30 UTC, our Operations team had taken corrective action by emptying the streaming queue and restarting the user-server services. After these actions were performed, ingest to our US-EAST-1 region was fully restored.
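
The postmortem describes unpublished messages being written to disk when the streaming service failed, adding I/O load to servers that were already busy. The following is a minimal sketch of that general pattern, not Wasabi's implementation: `SpillingPublisher`, `failing_stream`, `run_demo`, and the `spill.jsonl` file name are all hypothetical names chosen for illustration. It shows only that when every publish attempt fails, each message turns into one extra disk write.

```python
import json
import tempfile
from pathlib import Path


class SpillingPublisher:
    """Hypothetical publisher: tries a stream first, spills to disk on failure."""

    def __init__(self, publish, spill_path):
        self.publish = publish            # callable that raises ConnectionError on failure
        self.spill_path = Path(spill_path)
        self.published = 0
        self.spilled = 0

    def send(self, message):
        try:
            self.publish(message)
            self.published += 1
        except ConnectionError:
            # Fallback path: one additional disk write per failed message.
            with self.spill_path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(message) + "\n")
            self.spilled += 1


def failing_stream(message):
    """Stand-in for an unavailable streaming service (assumption for the demo)."""
    raise ConnectionError("stream unavailable")


def run_demo(n=100):
    """Send n messages to a failing stream; return (published, spilled, lines on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        pub = SpillingPublisher(failing_stream, Path(tmp) / "spill.jsonl")
        for i in range(n):
            pub.send({"request_id": i})
        if pub.spill_path.exists():
            lines = pub.spill_path.read_text(encoding="utf-8").splitlines()
        else:
            lines = []
        return pub.published, pub.spilled, len(lines)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the fallback preserves messages but moves the cost onto local disk, which is consistent with the increased I/O the postmortem reports. A real system would typically also bound the spill volume and drain it once the stream recovers; those parts are omitted here.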