Livepeer Studio experienced a major incident on April 8, 2024 affecting Livepeer Streaming API, lasting 13m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- identified Apr 08, 2024, 02:26 PM UTC
We have identified the cause of the livestreaming issue and are implementing a fix.
- resolved Apr 08, 2024, 02:40 PM UTC
This incident has been resolved.
- postmortem Apr 12, 2024, 04:12 PM UTC
# **Summary**

This is a post-mortem of the incident investigated on 04/08/24: [https://status.livepeer.studio/incidents/ss8n6px77ny5](https://status.livepeer.studio/incidents/ss8n6px77ny5)

# Incident

## Description

After a fix was deployed to production, the Livepeer team received an internal alert about a spike in 500 errors. Shortly after, a user reported that their livestream playback wasn't functioning and that, when they attempted to restart the stream, it could not be ingested. The Livepeer Studio team verified the problem and began an investigation.

## Impact

* Livestreams:
  * New livestreams could not be ingested in any region
* Viewers:
  * Viewers could not play back streams in any region
* Regions:
  * All Regions

## Current status

The service has been fully restored: [https://status.livepeer.studio/](https://status.livepeer.studio/)

## Timeline

* 10:12 AM EDT - The Livepeer Studio team was alerted to an incident involving an increased number of 500 errors
* 10:13 AM EDT - A user reported that livestreams were having issues: existing broadcasts were not working and playback was stopping
* 10:14 AM EDT - The Livepeer Studio team acknowledged the incident and started an investigation
* 10:29 AM EDT - The investigation traced the issue to a deployment made at 9:37 AM EDT. Once those changes were in production, an alert went off, and the deployment was quickly reverted, which resolved the issue
* 10:38 AM EDT - After monitoring the fix, the Livepeer Studio team concluded that the issue was resolved

# Prevention

* Although the fix had already been tested in our Staging environment, rolling it out to Production caused a non-graceful restart of our media server, which temporarily disrupted ongoing streams and prevented new streams from being created.
* We are putting a fix in place to ensure this doesn't affect future deployments and are reviewing our deployment procedures to try to catch these kinds of issues before they reach Production.