LiveKit experienced a minor incident on August 15, 2025 affecting US East - Real Time Communication, lasting 5h 55m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 15, 2025, 12:45 PM UTC
We detected an issue that impacted Core RTC in US East. Automatic mitigation shifted some traffic to US Central, but not all of the impact could be mitigated. The impact lasted from 11:57 to 12:14 UTC on 2025-08-15.
- investigating Aug 15, 2025, 06:40 PM UTC
The issue was resolved at 12:14 UTC. We will follow up with a post-mortem.
- resolved Aug 15, 2025, 06:41 PM UTC
This incident has been resolved.
- postmortem Aug 21, 2025, 09:41 AM UTC
Downtime in our US East region was caused by our inability to update the Backend Sets on our cloud provider's Network Load Balancers (NLBs) after a configuration update to our Kubernetes Ingress Controllers. While attempting to increase resiliency by spreading our Ingress Controller Pods across multiple Kubernetes Nodes, we prematurely hit the limit on the maximum number of "Backends" that can be added to an NLB. All new Ingress Controller Pods were scheduled on new Kubernetes Nodes, and those Nodes could not be added as NLB Backends because the limit had already been reached. We have identified the cause as a bug in the integration between our cloud provider's managed Kubernetes and their NLBs: the Kubernetes Service adds all Nodes as Backends on the NLB, instead of adding only the new Nodes, which prevents new Nodes from being added. We are working with our cloud provider to resolve the bug. In the meantime, we have mitigated the issue by significantly reducing the number of Nodes added as NLB Backends, and we have added alerts to detect the issue in the future.
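The failure mode and the alert described above can be illustrated with a minimal Python sketch. It is a simplified model, not the cloud provider's actual logic: the function names, the node names, and the limit of 5 are all hypothetical, and the accounting (submitted additions counted on top of existing Backends) is only one possible reading of how the limit was hit prematurely.

```python
# Simplified model of the behaviour described in the postmortem.
# All names and the limit value are hypothetical; the real NLB limit and
# the provider's accounting are not stated in the postmortem.

def full_sync_additions(existing: set[str], desired: set[str]) -> set[str]:
    """Buggy behaviour: every desired Node is submitted as a Backend addition."""
    return set(desired)

def delta_additions(existing: set[str], desired: set[str]) -> set[str]:
    """Expected behaviour: only Nodes not yet registered are submitted."""
    return desired - existing

def fits_limit(existing: set[str], additions: set[str], limit: int) -> bool:
    """Assumed accounting: additions are counted on top of existing Backends."""
    return len(existing) + len(additions) <= limit

def missing_backends(ingress_nodes: set[str], nlb_backends: set[str]) -> set[str]:
    """Alert condition: Nodes running Ingress Controller Pods that are not NLB Backends."""
    return ingress_nodes - nlb_backends

existing = {"node-1", "node-2", "node-3"}   # Nodes already registered as Backends
desired = existing | {"node-4"}             # one new Node hosting an Ingress Controller Pod
limit = 5                                   # hypothetical Backend limit

full_ok = fits_limit(existing, full_sync_additions(existing, desired), limit)
delta_ok = fits_limit(existing, delta_additions(existing, desired), limit)
unreachable = missing_backends({"node-4"}, existing)

print(full_ok, delta_ok, sorted(unreachable))
```

Under this model the full sync is rejected even though only four distinct Nodes are involved, while the delta update fits. The `missing_backends` check shows the kind of condition an alert could watch for: any Node that runs an Ingress Controller Pod but is absent from the NLB's Backend Set.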