Update timeline
- resolved Feb 20, 2026, 09:12 AM UTC
We received customer reports indicating that some agent connections to LiveKit Cloud were timing out. The symptoms typically appeared as a networking error, with calls hanging indefinitely on `room.connect()`. Investigation confirmed the root cause: degradation on two Network Load Balancers (NLBs) in the US-East region. This issue affected approximately 8% of inbound agent connections to US-East during the impacted period.
- postmortem Feb 20, 2026, 09:16 AM UTC
## Summary

This incident was reported directly by customers and was not detected by our internal monitoring systems. Customer-reported issues of this nature are particularly concerning because they indicate gaps in our observability. Our monitoring failed to identify the failure and therefore did not trigger automated alerting or incident response.

We sincerely apologize to affected customers and take this detection failure very seriously. We are committed to doing better.

## Root cause

Connections hung because of degradation in two of our Network Load Balancers (NLBs) in US-East, which caused a percentage of incoming HTTPS connections to hang before reaching our backend servers.

Most of our client SDKs and applications (Web, mobile, Go SDK, etc.) have built-in timeouts and retries in order to survive failure modes like this. For these clients, a hanging initial connection would typically time out quickly and then succeed on a subsequent retry, effectively masking the underlying problem for the majority of users.

The Rust SDK (and other SDKs built on the Rust core) was impacted much more severely. While it implements retries, it did not enforce a connection timeout on the initial attempt. This allowed connections to hang for much longer in affected cases, leading to noticeable stalls and a degraded user experience for Rust-based clients.

The primary reason this incident evaded detection was that our end-to-end monitoring probes used the JavaScript and Go SDKs, both of which gracefully handled the hanging connections via timeouts and retries. This created a blind spot for this specific failure mode.
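To make the difference in client behavior concrete, here is a minimal Python sketch of a per-attempt connection timeout combined with retries. It is illustrative only and is not the Rust SDK's implementation: `connect_with_timeout`, `make_flaky_connect`, and all parameter values are hypothetical names chosen for this example, and `connect` stands in for any coroutine that opens a connection.

```python
import asyncio


async def connect_with_timeout(connect, attempts=3, timeout_s=5.0, backoff_s=0.5):
    """Run `connect()` with a deadline on every attempt, retrying on failure.

    Returns (result, attempts_used). All names and defaults are hypothetical.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Bound every attempt, including the first one, so a hung
            # connection cannot stall the caller indefinitely.
            result = await asyncio.wait_for(connect(), timeout=timeout_s)
            return result, attempt
        except (asyncio.TimeoutError, OSError) as exc:
            last_error = exc
            if attempt < attempts:
                # Simple linear backoff before the next attempt.
                await asyncio.sleep(backoff_s * attempt)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


def make_flaky_connect(hang_first_n):
    """Build a fake connect coroutine that hangs on its first N calls."""
    calls = {"n": 0}

    async def connect():
        calls["n"] += 1
        if calls["n"] <= hang_first_n:
            await asyncio.sleep(3600)  # simulate a connection that never completes
        return "connected"

    return connect
```

With `make_flaky_connect(1)`, the first attempt is cancelled at the deadline and the second succeeds, so the caller sees a successful connection. A monitoring probe that records only the final outcome would report this as healthy; also recording `attempts_used` (or first-attempt latency) is one way such a probe could surface the hang.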
## Incident timeline

(all times in UTC, 2026-02-14)

* 17:00 - Received a report from a customer that a percentage of connections were unsuccessful
* 17:10 - Started investigating the reports
* 17:20 - Confirmed that our connection tests were passing and that error and warning rates did not look elevated
* 17:25 - Team concluded (prematurely) that no widespread issue existed
* 20:45 - Direct testing by IP confirmed that two NLBs in US-East were hanging on ~8% of requests
* 20:50 - On-call engineer paged to troubleshoot the load balancers
* 22:00 - Traffic fully diverted from the degraded load balancers; service recovered
* 24:00 - Created new load balancers to replace the faulty ones

## Corrective Actions & Prevention

The following improvements have been implemented or initiated to reduce the likelihood and impact of similar incidents:

* Added proper connection timeouts to initial requests: [https://github.com/livekit/rust-sdks/pull/895](https://github.com/livekit/rust-sdks/pull/895)
* Added dedicated, continuous synthetic monitoring for every individual load balancer (health checks that are independent of SDK retry behavior)
* Opened a detailed root-cause investigation with our cloud provider regarding the NLB degradation. We are working with them to improve upstream detection, telemetry, and handling of similar failure modes.

We will continue to expand failure-mode-specific monitoring (beyond SDK-based probes) and periodically validate that our alerting covers realistic client behaviors across all major SDKs.

Thank you for your patience and understanding. We appreciate any additional feedback from customers who were affected.
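As a rough illustration of a health check that is independent of SDK retry behavior, the Python sketch below probes each load balancer address directly, makes exactly one attempt per target, and treats a missing response within the deadline as a failure. It is not LiveKit's monitoring code: `probe_once`, `probe_all`, the request bytes, and the timeout values are assumptions for this example. It uses plain TCP for brevity, whereas a check against an HTTPS endpoint would also need a TLS handshake (for example via Python's `ssl` module).

```python
import socket
import time


def probe_once(ip, port, timeout_s=3.0, request=b"GET / HTTP/1.0\r\n\r\n"):
    """Make a single attempt against one address, with no retries.

    Returns (ok, elapsed_seconds, error). Names and defaults are hypothetical.
    """
    start = time.monotonic()
    ok, err = False, None
    try:
        # The timeout bounds the TCP connect itself.
        with socket.create_connection((ip, port), timeout=timeout_s) as sock:
            # The same timeout bounds the wait for the first response byte.
            sock.settimeout(timeout_s)
            sock.sendall(request)
            data = sock.recv(1)
            ok = len(data) > 0
            if not ok:
                err = "connection closed without a response"
    except OSError as exc:  # includes timeouts and refused connections
        err = f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - start, err


def probe_all(targets, timeout_s=3.0):
    """Probe every (ip, port) target individually and return results per target."""
    return {t: probe_once(t[0], t[1], timeout_s=timeout_s) for t in targets}
```

Because a partially degraded load balancer may hang on only a fraction of requests, a check like this would need to run repeatedly against each target and alert on the failure rate rather than on a single result.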