LiveKit Outage History

LiveKit had 66 outages in the last 2 years totaling 31h 54m of downtime — averaging 2.7 incidents per month.

There were 66 LiveKit outages since July 30, 2025 totaling 31h 54m of downtime. Each is summarised below — incident details, duration, and resolution information.
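
The headline figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes "the last 2 years" means exactly 24 months; the variable names are illustrative and not part of any Pingoru or LiveKit API.

```python
import math

outages = 66
months = 24                        # assumption: "the last 2 years" = 24 months
downtime_minutes = 31 * 60 + 54    # 31h 54m expressed in minutes

per_month = outages / months
# The summary shows 2.7, which matches truncating (not rounding) to one decimal.
per_month_truncated = math.floor(per_month * 10) / 10
avg_minutes_per_outage = downtime_minutes / outages

print(per_month, per_month_truncated, avg_minutes_per_outage)
```

Under that assumption the rate works out to 2.75 incidents per month, and the average recorded downtime is 29 minutes per incident.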

Source: https://status.livekit.io

Minor October 6, 2025

Issues with building new releases on cloud agents

Detected by Pingoru: Oct 06, 2025, 10:33 PM UTC
Resolved: Oct 06, 2025, 11:20 PM UTC
Duration: 46m

Affected: Global Cloud Agents

Timeline · 4 updates

investigating Oct 06, 2025, 10:33 PM UTC

We are currently investigating the issue. Running cloud agents are not affected, only new builds.
monitoring Oct 06, 2025, 10:46 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Oct 06, 2025, 11:20 PM UTC

This incident has been resolved.
postmortem Oct 06, 2025, 11:20 PM UTC

During a routine cloud-agents control plane release, a configuration error caused the service to crash. We quickly identified and corrected the problematic config value, restoring normal operation. We are investigating ways to make our configs more robust and will look to add additional testing around this process to ensure the issue doesn’t happen again.

Major October 2, 2025

Networking issues in Tokyo region

Detected by Pingoru: Oct 02, 2025, 10:41 AM UTC
Resolved: Oct 02, 2025, 11:05 AM UTC
Duration: 24m

Affected: Global Real Time Communication

Timeline · 4 updates

investigating Oct 02, 2025, 10:41 AM UTC

We've discovered networking issues with our cloud provider in Tokyo. We're routing around them.
investigating Oct 02, 2025, 10:44 AM UTC

We are continuing to investigate this issue.
monitoring Oct 02, 2025, 10:57 AM UTC

We have mitigated by routing traffic around the affected region and continue to monitor.
resolved Oct 02, 2025, 11:05 AM UTC

This incident has been resolved.

Minor September 25, 2025

Analytics data processing pipeline is currently offline

Detected by Pingoru: Sep 25, 2025, 11:12 AM UTC
Resolved: Sep 25, 2025, 07:40 PM UTC
Duration: 8h 27m

Affected: Cloud Dashboard (cloud.livekit.io)

Timeline · 3 updates

identified Sep 25, 2025, 11:12 AM UTC

The analytics data processing pipeline is currently down. We are working on recovering it. The Cloud dashboard and analytics API will be missing data during this time.
monitoring Sep 25, 2025, 03:00 PM UTC

A fix has been deployed and we are monitoring.
resolved Sep 25, 2025, 07:40 PM UTC

This incident has been resolved. We will work on recovering the data for the period during which processing was down.

Minor September 25, 2025

Dashboard data is delayed

Detected by Pingoru: Sep 25, 2025, 07:00 AM UTC
Resolved: Sep 24, 2025, 07:00 AM UTC
Duration: —

Timeline · 1 update

resolved Sep 25, 2025, 04:40 AM UTC

We've discovered a data pipeline issue causing Cloud dashboard data to be delayed. We are investigating the issue and will be backfilling the data.

Minor September 25, 2025

Limited availability of egress service due to docker hub outage

Detected by Pingoru: Sep 25, 2025, 12:41 AM UTC
Resolved: Sep 25, 2025, 01:02 AM UTC
Duration: 20m

Affected: Global Egress

Timeline · 3 updates

investigating Sep 25, 2025, 12:41 AM UTC

An outage at our Docker image registry provider (Docker Hub) is preventing us from scaling up the egress service, causing limited availability. We are working on potential mitigations and monitoring the status of Docker Hub.
identified Sep 25, 2025, 12:45 AM UTC

An outage at our Docker image registry provider (Docker Hub) is preventing us from scaling up the egress service, causing limited availability. We are working on potential mitigations and monitoring the status of Docker Hub.
resolved Sep 25, 2025, 01:02 AM UTC

The Docker Hub outage is resolved, allowing us to scale up as expected again.

Minor September 22, 2025

SIP call outage in EU

Detected by Pingoru: Sep 22, 2025, 08:13 AM UTC
Resolved: Sep 22, 2025, 11:30 AM UTC
Duration: 3h 17m

Affected: Global SIP

Timeline · 6 updates

investigating Sep 22, 2025, 08:13 AM UTC

We are currently investigating reports of SIP calls into the EU region not working.
identified Sep 22, 2025, 09:53 AM UTC

The issue has been identified and a fix is being implemented.
identified Sep 22, 2025, 10:30 AM UTC

The fix has been deployed to all regions.
monitoring Sep 22, 2025, 11:07 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Sep 22, 2025, 11:30 AM UTC

The incident is resolved.
postmortem Sep 23, 2025, 04:35 AM UTC

On Friday, Sep 19, 2025 at 11:21 PM UTC, we rolled out a change to our SIP gateway servers that reduced the allowed MTU size for incoming packets to 1500 bytes. While this did not affect all calls, 5% of calls where INVITE messages exceeded the SIP-mandated MTU size of 1500 bytes were dropped. To affected users, these failures may have looked like network issues. The fix was to increase the MTU size limit on the SIP gateway, which resolved the issue. To prevent similar incidents in the future, we have already extended both our monitoring and our release verification suite to cover this case.
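
A release verification check for this class of failure could look roughly like the sketch below. It is an illustration only, not LiveKit's actual test: the addresses, header values, and function names are hypothetical, the 1500-byte limit is simply the figure quoted in the postmortem, and the check measures the SIP message alone, ignoring IP and UDP header overhead.

```python
MAX_MESSAGE_BYTES = 1500  # figure quoted in the postmortem; treated as an assumption


def build_invite(sdp_body: str) -> bytes:
    """Build a minimal, illustrative SIP INVITE (all addresses are hypothetical)."""
    body = sdp_body.encode("utf-8")
    headers = [
        "INVITE sip:bob@example.com SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        "To: <sip:bob@example.com>",
        "From: <sip:alice@example.com>;tag=1928301774",
        "Call-ID: a84b4c76e66710@client.example.com",
        "CSeq: 314159 INVITE",
        "Contact: <sip:alice@client.example.com>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",
    ]
    # SIP separates headers with CRLF and ends the header section with a blank line.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body


def exceeds_limit(message: bytes, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """Return True when the encoded message is larger than the size limit."""
    return len(message) > limit


small_invite = build_invite("v=0\r\n")
# Many SDP attribute lines (for example, a long candidate list) inflate the INVITE.
large_invite = build_invite("a=candidate:x\r\n" * 200)

print(len(small_invite), exceeds_limit(small_invite))
print(len(large_invite), exceeds_limit(large_invite))
```

Sending both a small and an oversized INVITE through a gateway before release is one way to confirm that large messages are still accepted.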

Minor September 12, 2025

Cloud agent storage and deploy issue

Detected by Pingoru: Sep 12, 2025, 04:54 AM UTC
Resolved: Sep 12, 2025, 07:26 AM UTC
Duration: 2h 32m

Affected: Global Cloud Agents

Timeline · 3 updates

identified Sep 12, 2025, 04:54 AM UTC

We've identified an issue with cloud agent builds and deployments. A fix is in progress. Note that the issue only impacts new builds. Existing agents continue to work as expected.
monitoring Sep 12, 2025, 06:43 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Sep 12, 2025, 07:26 AM UTC

This incident has been resolved.

Notice September 10, 2025

2.5% of Inbound UDP Calls Dropping

Detected by Pingoru: Sep 10, 2025, 05:54 PM UTC
Resolved: Aug 26, 2025, 05:00 AM UTC
Duration: —

Timeline · 1 update

resolved Sep 10, 2025, 05:54 PM UTC

Our SIP load balancer runs inside Kubernetes, which internally routes packets from the virtual pod IP to the host IP. We believe a Linux kernel bug related to conntrack is causing certain packets (including our 200 OK responses) to be lost in that process. This bug impacts about 2.5% of inbound calls over UDP. Because we send the 200 OK but it never reaches the trunking provider, our system believes the call is connected when it isn't. Because TCP retries lost packets, it is not impacted by this bug. We will be rolling out a permanent fix in the next few weeks.

Minor September 10, 2025

Cloud Agents deploy issues

Detected by Pingoru: Sep 10, 2025, 05:09 PM UTC
Resolved: Sep 10, 2025, 07:27 PM UTC
Duration: 2h 18m

Affected: Global Cloud Agents

Timeline · 5 updates

investigating Sep 10, 2025, 05:09 PM UTC

Some cloud agent builds are having problems getting scheduled; we're investigating this issue.
identified Sep 10, 2025, 05:47 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Sep 10, 2025, 06:37 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Sep 10, 2025, 07:27 PM UTC

This incident has been resolved.
postmortem Sep 10, 2025, 07:28 PM UTC

This issue was caused by lock contention in the cloud agents deployment code path, which prevented some builds from being deployed in a timely manner. The scope of the offending lock has been decreased significantly, which should ensure this issue doesn’t happen again. We’ve also added monitoring around the queue involved to ensure we are notified earlier of any similar issues.
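
The general idea of reducing a lock's scope can be sketched as follows. This is a generic illustration, not LiveKit's deployment code: `DeployQueue`, `_deploy`, and both worker methods are hypothetical names, and the `sleep` call stands in for whatever slow work a real deploy performs.

```python
import threading
import time
from collections import deque


class DeployQueue:
    """Hypothetical build queue used only to illustrate lock scope."""

    def __init__(self, builds):
        self._lock = threading.Lock()
        self._pending = deque(builds)
        self.deployed = []

    def _deploy(self, build):
        # Stand-in for slow work such as pushing an image or calling a scheduler.
        time.sleep(0.01)
        return f"deployed:{build}"

    def worker_wide_lock(self):
        # Wide scope: the lock is held during the slow deploy,
        # so workers effectively run one at a time.
        while True:
            with self._lock:
                if not self._pending:
                    return
                build = self._pending.popleft()
                self.deployed.append(self._deploy(build))

    def worker_narrow_lock(self):
        # Narrow scope: the lock only guards the shared queue and result list.
        while True:
            with self._lock:
                if not self._pending:
                    return
                build = self._pending.popleft()
            result = self._deploy(build)  # slow work runs without the lock
            with self._lock:
                self.deployed.append(result)


def run(worker_name, n_builds=8, n_workers=4):
    """Drain the queue with several threads; return results and elapsed seconds."""
    queue = DeployQueue(range(n_builds))
    threads = [
        threading.Thread(target=getattr(queue, worker_name))
        for _ in range(n_workers)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(queue.deployed), time.perf_counter() - start


wide_results, wide_seconds = run("worker_wide_lock")
narrow_results, narrow_seconds = run("worker_narrow_lock")
print(f"wide lock:   {wide_seconds:.3f}s")
print(f"narrow lock: {narrow_seconds:.3f}s")
```

Both variants deploy the same builds. With the narrow lock the slow step can overlap across workers, so queued builds typically wait less.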

Notice August 26, 2025

Degradation in Egress, Ingress and SIP APIs

Detected by Pingoru: Aug 26, 2025, 04:59 PM UTC
Resolved: Aug 26, 2025, 08:09 PM UTC
Duration: 3h 10m

Affected: Global SIP

Timeline · 2 updates

monitoring Aug 26, 2025, 04:59 PM UTC

We experienced spikes of slow database queries around 16:00 and 17:00 UTC on Aug 26, 2025 that affected the performance of the Egress, Ingress and SIP APIs in the US West region. We rolled out a fix at 17:34 UTC on Aug 26, 2025 and are now monitoring the situation.
resolved Aug 26, 2025, 08:09 PM UTC

This incident has been resolved.

Major August 26, 2025

Transient disruption in US West and Brazil regions

Detected by Pingoru: Aug 26, 2025, 05:23 AM UTC
Resolved: Aug 26, 2025, 04:00 AM UTC
Duration: —

Timeline · 1 update

resolved Aug 26, 2025, 05:23 AM UTC

The load balancers in the US West and Brazil regions became overloaded and rejected a large number of connections from 4:01 UTC to 4:04 UTC, a period of around 3 minutes. During this time, some API requests and user connections in those regions failed.

Notice August 22, 2025

Temporary Egress Availability Issues

Detected by Pingoru: Aug 22, 2025, 04:49 PM UTC
Resolved: Aug 22, 2025, 08:00 PM UTC
Duration: 3h 10m

Timeline · 1 update

resolved Aug 22, 2025, 04:49 PM UTC

From 15:58 to 16:18 UTC, around 2% of StartEgress requests failed with 503 Service Unavailable. The root cause was an issue with our autoscaling metrics, which caused our canary clusters to stop scaling. The change has been reverted and service is now back to normal.

Minor August 15, 2025

Core RTC issue in US East

Detected by Pingoru: Aug 15, 2025, 12:45 PM UTC
Resolved: Aug 15, 2025, 06:41 PM UTC
Duration: 5h 55m

Affected: US East - Real Time Communication

Timeline · 4 updates

Minor August 5, 2025

Global analytics processing issue detected

Detected by Pingoru: Aug 05, 2025, 09:35 AM UTC
Resolved: Aug 05, 2025, 10:28 AM UTC
Duration: 53m

Timeline · 2 updates

investigating Aug 05, 2025, 09:35 AM UTC

We have detected an issue with global analytics and are currently investigating.
resolved Aug 05, 2025, 10:28 AM UTC

The issue has been resolved. Our processing pipeline is working through the backlog and will catch up shortly.

Notice August 2, 2025

Temporary networking disruption in US East 1

Detected by Pingoru: Aug 02, 2025, 09:41 AM UTC
Resolved: Aug 01, 2025, 04:00 PM UTC
Duration: —

Timeline · 1 update

resolved Aug 02, 2025, 09:41 AM UTC

Our alarms picked up higher-than-normal error rates with various RTC API calls. The disruption lasted from 16:05 to 16:09 UTC, impacting a portion of API requests in US East 1. We have traced the root cause of this incident to internal networking within the data center. It recovered within a few minutes without intervention.

Minor July 30, 2025

Analytics processing is delayed

Detected by Pingoru: Jul 30, 2025, 06:57 AM UTC
Resolved: Jul 30, 2025, 07:33 AM UTC
Duration: 36m

Affected: Cloud Dashboard (cloud.livekit.io)

Timeline · 3 updates

identified Jul 30, 2025, 06:57 AM UTC

The issue has been identified, and we are rolling out a fix.
monitoring Jul 30, 2025, 07:10 AM UTC

A fix has been deployed and we are monitoring the situation.
resolved Jul 30, 2025, 07:33 AM UTC

This incident has been resolved.
