Livepeer Studio incident

Live streaming Degradation

Major Resolved

Livepeer Studio experienced a major incident on December 19, 2024, affecting ingest and playback in Chicago (MDW), New York (NYC), and Miami (MIA). It lasted 1h 27m. The incident has been resolved; the full update timeline is below.

Started
Dec 19, 2024, 03:51 PM UTC
Resolved
Dec 19, 2024, 05:18 PM UTC
Duration
1h 27m
Detected by Pingoru
Dec 19, 2024, 03:51 PM UTC

Affected components

Chicago (MDW) ingest and playback
New York (NYC) ingest and playback
Miami (MIA) ingest and playback

Update timeline

  1. investigating Dec 19, 2024, 03:51 PM UTC

    Some servers are not responding; we are currently investigating the root cause.

  2. resolved Dec 19, 2024, 05:18 PM UTC

    We have isolated the issue and implemented a fix.

  3. postmortem Dec 19, 2024, 06:30 PM UTC

    # Incident on 12/19/2024

    **Overview**

    On Thursday, December 19th, 2024, an issue with live streams and video-on-demand (VOD) arose on our infrastructure, causing disruptions across the New York, Chicago, Miami, and Madrid regions. The problem affected streaming services and user experience within these regions.

    **Incident Details**

    An issue within the Livepeer infrastructure triggered the disruption during ongoing maintenance and configuration updates. This caused interruptions in services, including:

    * **Streaming Availability**: Users in the New York, Chicago, and Miami regions experienced interruptions or were unable to access streams.
    * **VOD Availability**: Users in the Madrid region experienced interruptions or were unable to upload or access videos.
    * **Playback Availability**:
        * Some livestream playback sessions failed to start, disconnected unexpectedly, or experienced buffering.
        * Some assets could not be played back.

    ### **Resolution**

    After identifying the root cause, our team implemented a fix. The fix resolved the service disruptions and restored normal operations across all affected regions.

    ### **Mitigation Steps**

    To mitigate the impact and ensure a smooth recovery, we:

    * **Isolated Affected Services**: Redirected workloads to unaffected regions to minimize user impact.
    * **Applied Fixes**: Implemented configuration updates and restarted the affected service in the affected regions.
    * **Monitored Service Restoration**: Closely monitored infrastructure recovery to ensure stability.

    **Root Cause**

    * **Primary Cause**: Configuration changes in Livepeer infrastructure triggered service disruptions.

    ### **Impact Assessment**

    * **Users Affected**: Users in the New York and Chicago regions experienced streaming interruptions.
    * **Service Downtime**: Approximately 3 hours before all services were fully restored.
    * **Impact Scope**: Regional degradation of streaming services with no data loss reported.

    **Next Steps**

    * To prevent configuration changes from impacting Livepeer’s service, we will prioritize implementing processes and tools to monitor, validate, and maintain stability both during and after these changes.