Livepeer Studio incident

Outage - Livestreams are down in some regions

Major Resolved

Livepeer Studio experienced a major incident on May 20, 2024 affecting Livepeer Streaming API, lasting 50m. The incident has been resolved; the full update timeline is below.

Started
May 20, 2024, 03:28 PM UTC
Resolved
May 20, 2024, 04:18 PM UTC
Duration
50m
Detected by Pingoru
May 20, 2024, 03:28 PM UTC

Affected components

Livepeer Streaming API

Update timeline

  1. investigating May 20, 2024, 03:28 PM UTC

    We are currently investigating this issue.

  2. resolved May 20, 2024, 04:18 PM UTC

    This incident has been resolved.

  3. postmortem May 24, 2024, 05:43 PM UTC

    # **Summary**

    This is a post-mortem of the incident investigated on 05/20/24: [https://status.livepeer.studio/incidents/tdrw49vj8y87](https://status.livepeer.studio/incidents/tdrw49vj8y87)

    # Incident

    ## Description

    Users reported frequent rebuffering and the inability to start or view streams. Upon investigation, the Livepeer Studio team discovered that this issue affected all regions. Our primary cloud storage provider reported an outage on May 20 from 15:11 UTC to 15:59 UTC. During this outage, a bug in the retry mechanism for uploading recordings caused several servers to lock up and become unresponsive.

    ## Impact

    * Livestreams:
      * Current livestreams in some regions (London, Frankfurt, Stockholm) experienced rebuffering
      * New livestreams in some regions may not have been able to stream
    * Viewers:
      * Current and new viewers in some regions experienced a high rebuffer rate
    * VOD:
      * Uploaded assets and live recordings took a long time to process

    ## Current status

    The service has been fully restored: [https://status.livepeer.studio/](https://status.livepeer.studio/)

    ## Timeline

    * 11:00 AM EDT (15:00 UTC) - An internal alert triggered an investigation by the Livepeer Studio team to find the cause of the alert
    * 11:11 AM EDT (15:11 UTC) - A status alert from our storage provider notified us of an outage in one of the US regions
    * 12:05 PM EDT (16:05 UTC) - The investigation identified the storage provider outage that began at 11:11 AM EDT as one of the causes of this incident
    * 12:18 PM EDT (16:18 UTC) - After monitoring the fix, the Livepeer Studio team concluded that the issue was resolved

    # Prevention

    * We enhanced our retry mechanism and implemented additional failover solutions. These solutions switch to our secondary (backup) storage provider if the primary storage provider experiences an outage.
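The prevention item above describes a retry mechanism that fails over to a secondary storage provider. The post-mortem does not publish the implementation, so the following is only a minimal sketch of that general pattern: a bounded number of attempts per provider, exponential backoff between attempts, and a fall-through to the next provider once the attempts are exhausted. The names `upload_with_failover`, `UploadError`, `max_attempts`, and `base_delay` are hypothetical and are not taken from Livepeer Studio's code.

```python
import time
from typing import Callable, List, Sequence


class UploadError(Exception):
    """Raised when every provider has exhausted its attempts."""


def upload_with_failover(
    data: bytes,
    providers: Sequence[Callable[[bytes], str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order and return the first successful result.

    Hypothetical sketch: each provider is a callable that uploads `data`
    and returns an identifier, or raises an exception on failure.
    """
    errors: List[Exception] = []
    for upload in providers:
        for attempt in range(max_attempts):
            try:
                return upload(data)
            except Exception as exc:  # broad on purpose for this sketch
                errors.append(exc)
                # Back off before retrying the same provider, but do not
                # wait after its final attempt: move on to the next one.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    # The retry budget is bounded, so the caller always gets an answer
    # instead of a worker that retries forever.
    raise UploadError(f"all providers failed after {len(errors)} attempts")
```

Passing the primary provider first and the backup second, for example `upload_with_failover(recording, [primary_upload, secondary_upload])`, gives the switch-over behavior described above. The bounded attempt count is the design choice that matters here: a retry loop with no upper limit is one way a storage outage can tie up a server indefinitely.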