LiveKit Outage History

LiveKit is up right now

There were 16 LiveKit outages since February 20, 2026, totaling 143h 28m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.livekit.io

Notice April 24, 2026

Investigating SIP participant timeouts and signalling connection errors

Detected by Pingoru
Apr 24, 2026, 06:12 PM UTC
Resolved
Apr 29, 2026, 02:41 PM UTC
Duration
4d 20h
Affected: India - Real Time Communication, Australia - Real Time Communication, Singapore - Real Time Communication, US Central - Real Time Communication, South Africa - Real Time Communication, US East - Real Time Communication
Timeline · 3 updates
  1. investigating Apr 24, 2026, 06:12 PM UTC

    We received a single report of CreateSIPParticipant timeouts in Singapore. While investigating this, we discovered an increased rate of signalling connection errors on a very small minority of requests, mainly in India. We haven’t received any other reports, but are proactively creating this incident in case users encounter increased connection latency. We are continuing to investigate and will update here once we know more.

  2. identified Apr 24, 2026, 11:20 PM UTC

    A fix is being implemented. We still have not received any further reports and believe the impact to be minor, but we're continuing to monitor for further issues.

  3. resolved Apr 29, 2026, 02:41 PM UTC

    We are closing this incident as we have not received further reports and the signalling error rate has dropped back to baseline. We made some fine-tuning adjustments to our stack to improve performance and will continue to seek out opportunities to keep latency low.
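
For callers hitting timeouts like the CreateSIPParticipant reports in the timeline above, one client-side mitigation is to bound the request with an explicit deadline so a stuck call fails fast and can be retried. A minimal Go sketch of that pattern; `createSIPParticipant` below is a hypothetical stand-in, not an actual LiveKit SDK call:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// createSIPParticipant is a hypothetical stand-in for a server-side API call
// such as CreateSIPParticipant; it is not a real LiveKit SDK signature.
func createSIPParticipant(ctx context.Context, trunkID, callTo, room string) error {
	select {
	case <-time.After(2 * time.Second): // simulate a slow upstream request
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// Bound the request so a stuck call surfaces as a timeout the caller can retry.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := createSIPParticipant(ctx, "trunk-id", "+15550100", "support-room"); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			fmt.Println("CreateSIPParticipant timed out; safe to retry or alert")
			return
		}
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("SIP participant created")
}
```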

Read the full incident report →

Notice April 22, 2026

Rejected INVITEs in US East

Detected by Pingoru
Apr 22, 2026, 01:00 PM UTC
Resolved
Apr 22, 2026, 01:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 23, 2026, 01:38 AM UTC

    Between 13:15 and 14:25 UTC on April 22, approximately 5% of inbound SIP calls routed through our US East region were rejected by our system due to a failing internal trunk lookup. Upstream carriers surfaced these rejections to end users as 503 "Service Unavailable" responses.

    A recent change to our internal service responsible for SIP trunk authorization lookups caused trunk queries to return empty results under certain conditions. When our SIP service received an empty trunk lookup, it rejected the inbound INVITE. The regression was deployed to one US region as part of a staged rollout. Our routine checks during the release identified the issue.

    Once the offending change was rolled back, inbound call rejections returned to baseline within minutes and full service was restored. Other regions and all outbound calls were unaffected. We are introducing a dedicated monitor for this specific failure mode so that any recurrence pages our on-call engineers immediately, rather than relying on broader error-rate signals.
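
The failure mode described above (an internal lookup returning empty results, surfaced to carriers as 503s) and the planned dedicated monitor suggest a simple defensive pattern: treat an unexpectedly empty authorization lookup as its own counted condition rather than a generic error. A rough Go sketch under that assumption; the types, lookup function, and counter are hypothetical, not LiveKit internals:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Trunk and lookupTrunks are hypothetical stand-ins for the internal
// SIP trunk authorization lookup described in the incident report.
type Trunk struct{ ID string }

func lookupTrunks(callerDomain string) ([]Trunk, error) {
	return nil, nil // a regression like the one described: empty result, no error
}

// emptyTrunkLookups is the kind of failure-mode-specific signal a dedicated
// monitor can page on, instead of relying on broad error-rate signals.
var emptyTrunkLookups atomic.Int64

var errNoTrunk = errors.New("no authorized trunk for caller")

func authorizeInvite(callerDomain string) error {
	trunks, err := lookupTrunks(callerDomain)
	if err != nil {
		return fmt.Errorf("trunk lookup failed: %w", err)
	}
	if len(trunks) == 0 {
		// Count the specific condition that produced the 503s so an alert
		// can target it directly.
		emptyTrunkLookups.Add(1)
		return errNoTrunk
	}
	return nil
}

func main() {
	if err := authorizeInvite("carrier.example.com"); err != nil {
		fmt.Println("rejecting INVITE:", err, "| empty lookups so far:", emptyTrunkLookups.Load())
	}
}
```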

Read the full incident report →

Major April 15, 2026

Increased Latency in RoomService APIs, brief period of higher error rate

Detected by Pingoru
Apr 15, 2026, 11:23 PM UTC
Resolved
Apr 16, 2026, 04:09 AM UTC
Duration
4h 45m
Affected: Global Real Time Communication
Timeline · 8 updates
  1. investigating Apr 15, 2026, 11:23 PM UTC

    We are investigating reports of increased latencies in RoomService APIs in the US West region, specifically on CreateRoom, DeleteRoom, and UpdateRoomMetadata APIs.

  2. investigating Apr 16, 2026, 12:13 AM UTC

    We are continuing to investigate this issue.

  3. investigating Apr 16, 2026, 01:07 AM UTC

    We believe these elevated latencies began around 22:00 UTC. We have confirmed that only API requests in US-West should be impacted. The current list of impacted APIs appears to be CreateRoom, DeleteRoom, and UpdateRoomMetadata. We are working on mitigating the issue to return latencies back to normal.

  4. identified Apr 16, 2026, 02:10 AM UTC

    We continue to see the long Room API latencies which are now also impacting other regions. The latency increases appear to originate from a specific table in our distributed database. The issue has been escalated with the database vendor and we are working on a workaround for decreasing the API latencies. Other services are not impacted.

  5. investigating Apr 16, 2026, 03:45 AM UTC

    While applying a fix for the API latencies, we are temporarily seeing increased failure rates in RoomServices APIs, including CreateRoom, UpdateRoomMetadata, and DeleteRoom. We are actively working on mitigating this. Impact has been upgraded to major.

  6. monitoring Apr 16, 2026, 03:51 AM UTC

    Our fix is fully implemented and we are not seeing any more failures or high latencies of the RoomService APIs. We are continuing to monitor the issue. We did observe a period of 15 minutes with high API failures while mitigation steps were being applied.

  7. resolved Apr 16, 2026, 04:09 AM UTC

    This issue is now fully resolved. We will be posting a detailed RCA.

  8. postmortem Apr 17, 2026, 04:37 PM UTC

    ## Summary

    LiveKit's core realtime and agent services are designed to tolerate database failures. WebRTC media, SIP calls, and hosted agent sessions continue to operate even when our database backend is slow or unavailable. A subset of Room APIs, specifically `CreateRoom`, `DeleteRoom`, and `UpdateRoomMetadata`, do depend on a database for consistency and disaster recovery. That database is highly available and globally distributed, with no single point of failure. When it is under significant contention, these Room APIs can return errors or time out, while realtime traffic continues to flow normally.

    On 2026-04-15, database contention caused a percentage of Room API calls to fail in our US-West region. Remediation work later produced a 26-minute global outage of the Room APIs. Realtime sessions, SIP calls, and agent processes were unaffected throughout. We sincerely apologize to customers whose applications were disrupted during this incident.

    ## Impact

    The incident had two distinct phases of customer impact.

    ### Phase 1: Elevated Room API timeouts in US-West (2026-04-15 22:10 UTC to 2026-04-16 03:14 UTC)

    A percentage of `DeleteRoom`, `UpdateRoomMetadata`, and `ListRooms` calls timed out, primarily in our US-West region. Other regions saw limited impact during this phase. Customers with high Room API volume in US-West observed elevated error rates on their integrations; the majority of customers were not affected.

    ### Phase 2: Global Room API outage (2026-04-16 03:14 to 03:40 UTC, ~26 minutes)

    While we were swapping in a rebuilt `rooms` table, the table was briefly missing from the database, and the majority of Room API calls globally returned HTTP 500 with `ERROR: relation "rooms" does not exist`. WebRTC sessions, SIP calls, and agent processes continued to function, and realtime connection counts remained stable. Applications that depend on Room APIs to start or manage sessions saw visible failures during this window.

    ## Root Cause

    The sweeper is a background process that removes rows from the `rooms` table as sessions end. Earlier on 2026-04-15, its throughput dropped significantly, and over roughly 8 hours stale rows accumulated to the point where the table was many times larger than its intended steady-state size.

    At approximately 21:00 UTC, a routine schema migration was applied to a different, unrelated table. The migration itself did not touch `rooms`, but it raised overall database disk utilization and background load. Combined with the oversized `rooms` table, this produced enough contention to slow down reads and writes against it. The effect first appeared in US-West, where the regional mix of Room API traffic was most sensitive to the contention.

    Once we identified the oversized table as the underlying cause, we needed to restore it to a healthy size. Because the table was already contended, deleting rows directly would have taken additional locks and worsened the contention. We instead chose to rebuild the table: create a new table with the same schema, copy over the active rows, then atomically swap the new table into place via a pair of renames.

    The copy phase completed quickly. The first rename (moving the old `rooms` table aside) completed in about 2.5 minutes. The second rename, moving the new table into the `rooms` name, stalled on our globally distributed database for significantly longer than we anticipated. During the stall, the `rooms` table did not exist from the perspective of any region, and all Room API calls globally returned errors.

    After roughly 10 minutes, we aborted the stalled rename, created a fresh `rooms` table from scratch, and inserted the active rows into it. Room API traffic recovered globally shortly after.

    ## Corrective Actions & Prevention

    The following improvements have been implemented or initiated to reduce the likelihood and impact of similar incidents:

    * **Enhance monitoring for sweeper throughput and active room count.** We are adding and hardening alerts on sweeper throughput and active room count, so that any future divergence pages on-call well before it threatens production.
    * **Improve sweeper resilience and throughput.** We are investigating the cause of the sweeper's throughput drop and adding capacity headroom so a transient slowdown cannot translate into multi-hour backlog growth.
    * **Remove database as a dependency for Room APIs.** This incident reaffirmed our long-held design principle that realtime services should not depend on databases. We believe this is the only way to build a system that approaches 100% uptime, and we will continue the work to ensure Room APIs do not depend on a database either.

    The Phase 2 outage was caused by our own remediation, and we recognize how disruptive it was for applications that depend on the Room APIs. We are committed to the work above to reduce both the likelihood and the blast radius of a similar failure in the future. Thank you for your patience, and we welcome any additional feedback from customers who were affected.
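
The remediation described in the root cause (copy active rows into a new table, then rename it into place) is a common way to shrink a bloated table without taking long delete locks; the failure here was the second rename stalling. Below is a rough Go `database/sql` sketch of that rebuild-and-swap pattern with a deadline on the swap so a stalled rename is aborted quickly. It assumes a Postgres-compatible driver and transactional renames, and the schema is illustrative, not LiveKit's actual `rooms` table:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"

	_ "github.com/lib/pq" // assumed Postgres-compatible driver; the incident's database is a distributed SQL store
)

// rebuildRoomsTable shrinks a bloated table by copying live rows into a fresh
// table and swapping it into place, rather than deleting rows under contention.
// Table and column names here are illustrative only.
func rebuildRoomsTable(ctx context.Context, db *sql.DB) error {
	// 1. Build the replacement table and copy only the active rows.
	prep := []string{
		`CREATE TABLE rooms_new (LIKE rooms INCLUDING ALL)`,
		`INSERT INTO rooms_new SELECT * FROM rooms WHERE active`, // "active" is a hypothetical column
	}
	for _, stmt := range prep {
		if _, err := db.ExecContext(ctx, stmt); err != nil {
			return fmt.Errorf("prepare replacement table: %w", err)
		}
	}

	// 2. Swap the tables. Bound the swap tightly so a stalled rename (the
	// failure mode in this incident) is aborted instead of leaving the
	// "rooms" name missing for minutes.
	swapCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	tx, err := db.BeginTx(swapCtx, nil)
	if err != nil {
		return fmt.Errorf("begin swap: %w", err)
	}
	if _, err := tx.ExecContext(swapCtx, `ALTER TABLE rooms RENAME TO rooms_old`); err != nil {
		tx.Rollback()
		return fmt.Errorf("first rename: %w", err)
	}
	if _, err := tx.ExecContext(swapCtx, `ALTER TABLE rooms_new RENAME TO rooms`); err != nil {
		tx.Rollback()
		return fmt.Errorf("second rename stalled or failed: %w", err)
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	fmt.Println(rebuildRoomsTable(context.Background(), db))
}
```

In practice the safest fallback, and what the team ultimately did, is to abandon the stalled swap and recreate the table under its original name from the copied rows.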

Read the full incident report →

Notice April 8, 2026

LiveKit Cloud Dashboard Missing Observability Sessions

Detected by Pingoru
Apr 08, 2026, 04:18 PM UTC
Resolved
Apr 08, 2026, 04:18 PM UTC
Duration
Affected: Cloud Dashboard (cloud.livekit.io)
Timeline · 1 update
  1. resolved Apr 08, 2026, 04:18 PM UTC

    We identified a bug where LiveKit Cloud projects created after April 1 at 20:41 UTC with Agent Insights enabled did not save their observability data (recordings, traces, and agent logs) correctly. We resolved the issue on April 8 at 03:19 UTC, and Agent Insights should be fully operational for all new and existing projects. No action is required from impacted users. We will follow up with a postmortem as soon as possible.

Read the full incident report →

Critical April 3, 2026

LiveKit Cloud Dashboard Down

Detected by Pingoru
Apr 03, 2026, 05:13 PM UTC
Resolved
Apr 03, 2026, 05:23 PM UTC
Duration
9m
Affected: Cloud Dashboard (cloud.livekit.io)
Timeline · 3 updates
  1. investigating Apr 03, 2026, 05:13 PM UTC

    We are currently investigating this issue and will update as soon as we know more.

  2. resolved Apr 03, 2026, 05:23 PM UTC

    A fix has been implemented and we are monitoring the results. All users should now be able to access the Cloud Dashboard.

  3. postmortem Apr 03, 2026, 11:00 PM UTC

    **Root Cause**

    A gap in our DNS update process resulted in a misconfiguration of the DNS for [livekit.io](http://livekit.io), causing several A records to be overwritten. This resulted in an incorrect IP address being returned for [livekit.io](http://livekit.io). All real-time services - calls, RTC, and hosted agents - were unaffected as they operate on separate domains. We do have DNS monitoring in place, but it was not configured to page on-call. As a result, the issue was identified by LiveKit engineering after a short delay.

    **Timeline**

    * **2026-04-03 17:46 UTC**: DNS configuration change applied; resolution begins failing for [livekit.io](http://livekit.io).
    * **2026-04-03 17:52 UTC**: Issue identified by LiveKit.
    * **2026-04-03 18:15 UTC**: Root cause identified and fix applied.
    * **2026-04-03 18:18 UTC**: Fix fully propagated; services restored.

    Total duration: ~32 minutes.

    **Mitigations**

    * Enforce tighter automated guardrails around DNS updates.
    * Enable paging on existing DNS health checks so any future issues are caught even sooner.
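
One of the mitigations above is paging on DNS health checks. A minimal Go sketch of such a check: resolve the domain and compare its A records against an expected set. The IP addresses below are documentation placeholders, not livekit.io's real records:

```go
package main

import (
	"fmt"
	"net"
)

// expectedIPs is a placeholder allow-list; a real check would be driven by the
// authoritative DNS configuration, not hard-coded addresses.
var expectedIPs = map[string]bool{
	"203.0.113.10": true,
	"203.0.113.11": true,
}

// checkDNS resolves the domain and reports any IPv4 (A record) answer outside
// the expected set, which is the failure mode described above.
func checkDNS(domain string) error {
	ips, err := net.LookupIP(domain)
	if err != nil {
		return fmt.Errorf("resolution failed for %s: %w", domain, err)
	}
	for _, ip := range ips {
		if ip.To4() == nil {
			continue // only compare A (IPv4) records in this sketch
		}
		if !expectedIPs[ip.String()] {
			return fmt.Errorf("unexpected A record %s for %s", ip, domain)
		}
	}
	return nil
}

func main() {
	if err := checkDNS("livekit.io"); err != nil {
		// A monitoring system would page on-call here rather than just print.
		fmt.Println("DNS check failed:", err)
		return
	}
	fmt.Println("DNS records match the expected set")
}
```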

Read the full incident report →

Minor April 1, 2026

Degraded Connectivity Issues – EU (Frankfurt)

Detected by Pingoru
Apr 01, 2026, 10:37 AM UTC
Resolved
Apr 01, 2026, 01:01 PM UTC
Duration
2h 23m
Affected: Europe Central - TURN, Europe Central - Real Time Communication
Timeline · 2 updates
  1. monitoring Apr 01, 2026, 10:37 AM UTC

    We identified degraded connection errors affecting a small percentage of requests to Real Time Communication and TURN services in our EU (Frankfurt) region. Automatic retries prevented applications from experiencing failures. A fix has been implemented and we are actively monitoring to confirm stability.

  2. resolved Apr 01, 2026, 01:01 PM UTC

    This incident has been resolved.

Read the full incident report →

Minor March 30, 2026

Analytics updates on Cloud Dashboard are delayed

Detected by Pingoru
Mar 30, 2026, 11:26 AM UTC
Resolved
Mar 30, 2026, 09:01 PM UTC
Duration
9h 34m
Affected: Cloud Dashboard (cloud.livekit.io)
Timeline · 6 updates
  1. investigating Mar 30, 2026, 11:26 AM UTC

    We are currently investigating the issue. Only dashboard updates are affected (real time communication is not affected).

  2. identified Mar 30, 2026, 03:01 PM UTC

    The issue has been identified. We are working on a fix.

  3. identified Mar 30, 2026, 06:46 PM UTC

    A fix has been deployed. We expect the service to be restored shortly and will provide another update once it is back online.

  4. monitoring Mar 30, 2026, 08:12 PM UTC

    The fix has been applied and real-time processing for the Cloud Dashboard is back online globally. We are monitoring the pipeline.

  5. monitoring Mar 30, 2026, 08:55 PM UTC

    We have initiated the process of backfilling delayed analytics data.

  6. resolved Mar 30, 2026, 09:01 PM UTC

    Real-time analytics updates on the Cloud Dashboard are now fully functional, and delayed data is in the process of being backfilled.

Read the full incident report →

Minor March 30, 2026

Degraded Performance LiveKit Cloud dashboard

Detected by Pingoru
Mar 30, 2026, 09:09 AM UTC
Resolved
Mar 30, 2026, 09:45 AM UTC
Duration
36m
Affected: Cloud Dashboard (cloud.livekit.io)
Timeline · 4 updates
  1. investigating Mar 30, 2026, 09:09 AM UTC

    Users may be unable to access the dashboard. Our team is actively investigating.

  2. investigating Mar 30, 2026, 09:09 AM UTC

    We are continuing to investigate this issue.

  3. monitoring Mar 30, 2026, 09:20 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Mar 30, 2026, 09:45 AM UTC

    This incident has been resolved.

Read the full incident report →

Notice March 23, 2026

Subset of SIP Outbound Call Failures in US East

Detected by Pingoru
Mar 23, 2026, 06:00 PM UTC
Resolved
Mar 23, 2026, 06:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Mar 24, 2026, 01:17 AM UTC

    Between 15:00–17:25 UTC, a subset of SIP outbound calls in the US East region experienced failures due to an internal DNS resolution issue on a single host. Approximately 1.22% of calls within the region were affected. The affected host was taken out of rotation and workloads were redistributed. Call failure rates returned to baseline following remediation.

Read the full incident report →

Notice March 18, 2026

MCP server returning errors

Detected by Pingoru
Mar 18, 2026, 06:30 PM UTC
Resolved
Mar 18, 2026, 06:30 PM UTC
Duration
Timeline · 1 update
  1. resolved Mar 18, 2026, 09:04 PM UTC

    A dependency update in our latest docs deployment caused all requests to the MCP server at docs.livekit.io/mcp/ to return 500 errors. The issue began at 18:50 UTC and ended at 19:33 UTC. During this roughly 45 minute window, MCP clients - including IDE integrations and the LiveKit CLI - were unable to connect to the docs MCP server. The issue was caused by a package upgrade that introduced dependencies incompatible with our serverless hosting environment. We rolled back the change and confirmed the endpoint was fully operational at 19:33 UTC. No other services were affected.

Read the full incident report →

Notice March 18, 2026

Real time communication service degradation in Mumbai

Detected by Pingoru
Mar 18, 2026, 11:25 AM UTC
Resolved
Mar 18, 2026, 11:25 AM UTC
Duration
Timeline · 1 update
  1. resolved Mar 18, 2026, 11:25 AM UTC

    We experienced degraded networking affecting core real time communication services in the Mumbai region. The issue began at 06:06 UTC and was resolved by 06:22 UTC on March 18, 2026. All services are now restored and we are continuing to monitor.

Read the full incident report →

Minor March 14, 2026

Increased participant subscription failures

Detected by Pingoru
Mar 14, 2026, 12:56 AM UTC
Resolved
Mar 13, 2026, 08:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Mar 14, 2026, 12:56 AM UTC

    Between 21:00 and 21:15 UTC, a limited number of participant subscriptions failed across regions, along with increased subscription latency for subscribers in the Chicago region.

Read the full incident report →

Notice March 2, 2026

Inbound SIP (Twilio/UDP) Degradation | US East

Detected by Pingoru
Mar 02, 2026, 10:30 AM UTC
Resolved
Mar 02, 2026, 10:30 AM UTC
Duration
Timeline · 2 updates
  1. resolved Mar 02, 2026, 12:22 PM UTC

    SIP calling over UDP in the US East region experienced failures from ~10:30–11:35 UTC, affecting inbound calls from Twilio. TCP and TLS traffic was not impacted. Service has recovered and we are monitoring.

  2. postmortem Mar 06, 2026, 09:15 AM UTC

    **Root Cause**

    During a failover event on the SIP load balancer in US East, UDP packets were incorrectly treated as part of an existing "connection" by the underlying VNIC stack's connection tracking (conntrack) mechanism. As a result, these packets continued to be forwarded to the previous (now non-existent) node, causing SIP INVITEs over UDP to be dropped for the duration of the incident.

    **Technical Details**

    The SIP load balancers use Virtual Network Interface Cards (VNICs) to enable high availability. VNICs allow the public IP to remain unchanged while traffic is redirected to different physical instances (e.g., during pod failures, software updates, or maintenance). This supports seamless failover to standby instances and instance cycling.

    On March 2, 2026, an instance cycling process was initiated to apply security patches. Standard procedure involves adding additional IPs to DNS before rotating instances to maintain capacity. However, an operator error led to direct patching without first scaling up capacity. This operation should have been safe, as the VNIC is expected to redirect traffic to the new instance.

    * For TCP traffic, failover worked as intended.
    * For UDP, the VNIC's conntrack behavior differed: UDP flows are tracked using a four-tuple (source IP:port - destination IP:port). New incoming SIP INVITEs from the same client IP/port combination were treated as belonging to the same pre-failover "connection", causing them to be forwarded to the old node—even after it was terminated.
    * This issue specifically affected traffic from providers like Twilio, whose clients reused the same source IP/port for sequential INVITEs, resulting in dropped inbound calls.

    **Timeline**

    * 2026-03-02 10:30:00 UTC: Operator initiated instance cycling.
    * 2026-03-02 10:55:00 UTC: Monitoring systems detected the issue and paged the on-call engineer.
    * 2026-03-02 11:27:00 UTC: Traffic was redirected away from the affected US East region, mitigating the impact.

    **Monitoring & Detection**

    LiveKit employs two layers of end-to-end monitoring for the SIP infrastructure:

    1. Simulated SIP pings: These send periodic OPTIONS packets to verify reachability of the SIP load balancers. They failed to detect the issue because each ping used a different source IP/port, avoiding the problematic conntrack entries.
    2. End-to-end SIP calls: These use external providers (e.g., Twilio) to place real calls, verifying successful establishment and bidirectional audio flow. They run at a lower frequency. They successfully detected the drops (as Twilio traffic matched the faulty conntrack behavior), but detection was delayed due to the lower check interval.

    **Mitigations**

    To prevent recurrence and improve resilience, we will:

    * Increase redundancy at the SIP load balancer layer to tolerate up to 2 out of 3 nodes failing without service impact.
    * Enforce stricter, standardized operating procedures for instance cycling (e.g., mandatory capacity addition via DNS updates before rotation; additional peer review or automation safeguards).
    * Increase the frequency of end-to-end SIP call monitoring to enable faster detection of UDP-specific issues.

    This incident highlights subtle differences in how stateful conntrack handles UDP vs TCP during VNIC failover for protocols like SIP. These changes will help ensure more robust handling of similar maintenance operations in the future.
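
The detection gap above came from OPTIONS probes that used a fresh source port for every ping, so they never exercised the stale conntrack entries that carriers reusing a fixed source IP/port were hitting. A small Go sketch of a probe that pins its source port to mimic that carrier behavior; the addresses and the bare-bones OPTIONS payload are placeholders:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Pin the local source port so every probe reuses the same UDP four-tuple,
	// mirroring a carrier that sends sequential INVITEs from a fixed IP:port.
	// A fresh ephemeral port per probe (as the OPTIONS pings used) never hits
	// a stale conntrack entry like the one in this incident.
	laddr := &net.UDPAddr{Port: 56060}                                  // placeholder source port
	raddr := &net.UDPAddr{IP: net.ParseIP("198.51.100.20"), Port: 5060} // placeholder SIP load balancer address

	conn, err := net.DialUDP("udp", laddr, raddr)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()

	// A production probe would send a well-formed SIP OPTIONS request and parse
	// the response; this sketch only checks that the pinned flow gets an answer.
	if _, err := conn.Write([]byte("OPTIONS sip:probe@example.invalid SIP/2.0\r\n\r\n")); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	_ = conn.SetReadDeadline(time.Now().Add(3 * time.Second))
	buf := make([]byte, 1500)
	n, err := conn.Read(buf)
	if err != nil {
		fmt.Println("no response on pinned flow (a monitor would alert here):", err)
		return
	}
	fmt.Printf("received %d bytes on the pinned flow\n", n)
}
```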

Read the full incident report →

Minor February 27, 2026

Analytics dashboard updates delayed

Detected by Pingoru
Feb 27, 2026, 12:18 AM UTC
Resolved
Feb 27, 2026, 02:40 AM UTC
Duration
2h 22m
Affected: Cloud Dashboard (cloud.livekit.io)
Timeline · 2 updates
  1. identified Feb 27, 2026, 12:18 AM UTC

    Updates to the LiveKit Cloud analytics dashboards are currently delayed due to an issue with our processing pipeline. Ingestion of new data is not affected, and we are in the process of recovering all data. We'll share another update once the restoration is complete and dashboards are updating normally.

  2. resolved Feb 27, 2026, 02:40 AM UTC

    Analytics dashboards are updating normally again.

Read the full incident report →

Notice February 20, 2026

Reports of SIP INVITES not getting responses in Chicago region

Detected by Pingoru
Feb 20, 2026, 09:47 PM UTC
Resolved
Feb 21, 2026, 04:54 AM UTC
Duration
7h 7m
Affected: US Central - SIP
Timeline · 4 updates
  1. investigating Feb 20, 2026, 09:47 PM UTC

    We are currently investigating user reports of SIP invites not getting responses in our Chicago region. We have not yet determined if the issue is due to the users' trunking providers, but we have begun routing traffic away from Chicago to our other US clusters to ensure continued service.

  2. monitoring Feb 20, 2026, 10:41 PM UTC

    The users who made the original reports are no longer seeing issues. We will continue investigating and post the results of our investigation as soon as they are available, but there are currently no known issues in LiveKit's SIP infrastructure.

  3. resolved Feb 21, 2026, 04:54 AM UTC

    We are going to keep traffic routing to our other US clusters, but we will consider this incident closed while we continue to investigate its origin as there is currently no impact to users.

  4. postmortem Mar 06, 2026, 09:20 AM UTC

    **Root Cause**

    This issue had the same root cause as the [incident](https://status.livekit.io/incidents/xkcpnmycy5m3) that occurred after it in the US East region. Please view that link for a better understanding of the root cause, technical details, monitoring, and mitigations.

    **Timeline**

    * 2026-02-20 19:00 UTC – Isolated customer reports came in of no SIP responses to INVITEs.
    * 2026-02-20 20:34 UTC – Looking at various customer examples, we determined that the common factor in problematic calls was the Chicago-based SIP load balancer.
    * 2026-02-20 21:26 UTC – SIP in Chicago was drained after ensuring there was enough capacity in other regions.

Read the full incident report →

Major February 20, 2026

8% of RTC connections hanging in US-East

Detected by Pingoru
Feb 20, 2026, 09:12 AM UTC
Resolved
Feb 14, 2026, 04:00 PM UTC
Duration
Timeline · 2 updates
  1. resolved Feb 20, 2026, 09:12 AM UTC

    We received customer reports indicating that some agent connections to LiveKit Cloud were timing out. The symptoms typically appeared as a networking error, with calls hanging indefinitely on `room.connect()`. Investigation confirmed the root cause: degradation on two Network Load Balancers (NLBs) in the US-East region. This issue affected approximately 8% of inbound agent connections to US-East during the impacted period.

  2. postmortem Feb 20, 2026, 09:16 AM UTC

    ## Summary

    This incident was reported directly by customers and was not detected by our internal monitoring systems. Customer-reported issues of this nature are particularly concerning because they indicate gaps in our observability. Our monitoring failed to identify the failure and therefore did not trigger automated alerting or incident response. We sincerely apologize to affected customers and take this detection failure very seriously. We are committed to doing better.

    ## Root cause

    The root cause of the connection hangs was degradation in two of our Network Load Balancers (NLBs) in US East, which resulted in a percentage of incoming HTTPS connections hanging before reaching our backend servers.

    Most of our client SDKs and applications (Web, mobile, Go SDK, etc.) have built-in timeouts and retries in order to survive failure modes like this. For these clients, a hanging initial connection would typically time out quickly, followed by successful retries on subsequent attempts, effectively masking the underlying problem for the majority of users.

    The Rust SDK (and other SDKs built on the Rust core) was impacted much more severely. While it implements retries, it did not enforce a connection timeout on the initial attempt. This allowed connections to hang for much longer in affected cases, leading to noticeable stalls and degraded user experience for Rust-based clients.

    The primary reason this incident evaded detection was that our end-to-end monitoring included probes using the JavaScript and Go SDKs, both of which gracefully handled the hanging connections via timeouts and retries. This created a blind spot for the specific failure mode.

    ## Incident timeline (all times in UTC, 2026-02-14)

    * 17:00 - Received a report from a customer that a percentage of connections were unsuccessful
    * 17:10 - We started to investigate the reports
    * 17:20 - We confirmed that our connection tests were passing and error/warning rates did not look elevated
    * 17:25 - Team concluded (prematurely) that no widespread issue existed
    * 20:45 - Direct testing by IP confirmed that two NLBs in US East were hanging on ~8% of requests
    * 20:50 - On-call engineer paged to troubleshoot load balancers
    * 22:00 - Traffic fully diverted from the degraded load balancers; service recovered
    * 24:00 - Created new load balancers to replace the faulty ones

    ## Corrective Actions & Prevention

    The following improvements have been implemented or initiated to reduce the likelihood and impact of similar incidents:

    * Added proper connection timeouts to initial requests: [https://github.com/livekit/rust-sdks/pull/895](https://github.com/livekit/rust-sdks/pull/895)
    * Added dedicated, synthetic continuous monitoring for every individual load balancer (health checks that are independent of SDK retry behavior)
    * Opened a detailed root-cause investigation with our cloud provider regarding the NLB degradation. We are working with them to improve upstream detection, telemetry, and handling of similar failure modes.

    We will continue to expand failure-mode-specific monitoring (beyond SDK-based probes) and periodically validate that our alerting covers realistic client behaviors across all major SDKs. Thank you for your patience and understanding. We appreciate any additional feedback from customers who were affected.
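
The first corrective action (rust-sdks PR 895) adds a timeout to the initial connection attempt so a hanging load balancer fails fast and the existing retry logic can take over. The same pattern, sketched in Go rather than the Rust SDK's actual code, with a bounded dial per attempt and simple backoff; the endpoint is a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// connectWithRetry bounds each connection attempt so a hanging load balancer
// surfaces as a quick timeout, then retries, instead of stalling indefinitely.
func connectWithRetry(ctx context.Context, addr string, attempts int) (net.Conn, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second} // per-attempt connect timeout
	var lastErr error
	for i := 0; i < attempts; i++ {
		conn, err := dialer.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn, nil
		}
		lastErr = err
		time.Sleep(time.Duration(i+1) * time.Second) // simple backoff between attempts
	}
	return nil, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	// Placeholder endpoint; a real client would connect to its own project URL.
	conn, err := connectWithRetry(context.Background(), "example.invalid:443", 3)
	if err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```

Bounding the first attempt is what lets the retry loop mask a partially degraded load balancer, which is how the JavaScript and Go SDKs avoided visible impact during this incident.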

Read the full incident report →

Looking to track LiveKit downtime and outages?

Pingoru polls LiveKit's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when LiveKit reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track LiveKit alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring LiveKit for free

5 free monitors · No credit card required