Stream incident

Feed Realtime - SQS high error rate

Critical · Resolved

Stream experienced a critical incident on January 4, 2021 affecting Personalization, lasting 6h 27m. The incident has been resolved; the full update timeline is below.

Started
Jan 04, 2021, 04:57 PM UTC
Resolved
Jan 04, 2021, 11:25 PM UTC
Duration
6h 27m
Detected by Pingoru
Jan 04, 2021, 04:57 PM UTC

Affected components

Personalization

Update timeline

  1. investigating Jan 04, 2021, 04:57 PM UTC

    We are currently investigating an issue with AWS SQS: we are receiving a 100% error rate from the SQS APIs. Our feeds realtime endpoint is currently unable to push notifications to SQS.

  2. monitoring Jan 04, 2021, 05:38 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Jan 04, 2021, 07:45 PM UTC

    We are continuing to monitor for any further issues.

  4. resolved Jan 04, 2021, 11:25 PM UTC

    Millions of requests to the handshake endpoint of our feed realtime system caused the API to fail. This issue has been resolved, and a full post mortem will follow.