Livepeer Studio incident

API not responding

Critical Resolved

Livepeer Studio experienced a critical incident on April 29, 2024. The incident has been resolved; the full update timeline is below.

Started
Apr 29, 2024, 11:30 AM UTC
Resolved
Apr 29, 2024, 11:30 AM UTC
Duration
Detected by Pingoru
Apr 29, 2024, 11:30 AM UTC

Update timeline

  1. resolved Apr 29, 2024, 01:55 PM UTC

    This incident has been resolved.

  2. postmortem May 01, 2024, 04:25 PM UTC

    # API Outage April 29, 2024

    ## **Summary**

    This is a post-mortem of the incident investigated on 04/29/24: [https://status.livepeer.studio/incidents/2mklfnf2hqbf](https://status.livepeer.studio/incidents/2mklfnf2hqbf)

    # Incident

    ## Description

    Internal alerts notified the Livepeer Studio team of high memory and CPU utilization within the queuing system. A required update to the queuing system had previously been tested successfully in the staging environment. However, when it was deployed to production, the upgrade became stuck, which led to the issue.

    ## Impact

    * Livestreams:
      * New streams could not be started
    * Viewers:
      * Only existing streams could be viewed

    Regions:

    * Europe (Sweden/Russia), North America (Los Angeles/New York), South America (Brazil)

    ## Current status

    The service has been fully restored: [https://status.livepeer.studio/](https://status.livepeer.studio/)

    ## Timeline

    * 7:52 AM EST - The Livepeer Studio team was alerted to an incident in which APIs were not responding
    * 7:57 AM EST - The Livepeer Studio team's investigation found that tasks in the AMQP queuing system were disconnected and backed up. This caused high CPU and memory consumption, which led to tasks timing out
    * 9:10 AM EST - The Livepeer Studio team automatically upgraded the queuing system, which became stuck during the upgrade and caused this issue
    * 8:23 AM EST - The Livepeer Studio team had a fix in place and monitored the systems
    * 9:55 AM EST - After monitoring the fix, the Livepeer Studio team concluded that the issue was resolved

    ### Prevention

    We are conducting broader audits and revamping our queue utilization practices.
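
The postmortem timeline uses local clock times, while the status page timestamps are in UTC. The following is a minimal sketch of how the two can be lined up. It assumes that the "EST" labels correspond to UTC-4, because 9:55 AM then matches the 01:55 PM UTC "resolved" update. The `LOCAL` offset and the `to_utc` helper are illustrative names, not part of any Livepeer tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the postmortem's "EST" labels are treated as UTC-4, since
# 9:55 AM then lines up with the 01:55 PM UTC "resolved" update.
LOCAL = timezone(timedelta(hours=-4))


def to_utc(clock: str) -> datetime:
    """Convert a postmortem clock time on April 29, 2024 to UTC."""
    local = datetime.strptime(f"2024-04-29 {clock}", "%Y-%m-%d %I:%M %p")
    return local.replace(tzinfo=LOCAL).astimezone(timezone.utc)


first_alert = to_utc("7:52 AM")  # team alerted that APIs were not responding
resolved = to_utc("9:55 AM")     # team concluded the issue was resolved

print(first_alert.strftime("%I:%M %p UTC"))  # 11:52 AM UTC
print(resolved.strftime("%I:%M %p UTC"))     # 01:55 PM UTC
print(resolved - first_alert)                # 2:03:00
```

Under that assumption, about two hours elapsed between the first alert and the resolution as recorded in the postmortem. This is derived only from the timeline entries and is not an official duration figure.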