Cartesia Outage History

Cartesia is up right now

There have been 11 Cartesia outages since February 9, 2026, totaling 15h 29m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.cartesia.ai

Major April 28, 2026

Voice agents unable to connect

Detected by Pingoru
Apr 28, 2026, 01:41 AM UTC
Resolved
Apr 28, 2026, 02:20 AM UTC
Duration
38m
Affected: Voice Agents
Timeline · 3 updates
  1. investigating Apr 28, 2026, 01:41 AM UTC

    We are investigating an issue causing some voice agents to fail to reach their runtime deployments, returning a "trouble connecting to an agent" error. We will share an update as soon as we have more information.

  2. identified Apr 28, 2026, 02:13 AM UTC

    We've identified an issue with an upstream provider, and we're actively working on rolling it back.

  3. resolved Apr 28, 2026, 02:20 AM UTC

    This incident has been resolved.

Read the full incident report →

Major April 2, 2026

Elevated failure rates on Line (Voice Agent) calls

Detected by Pingoru
Apr 02, 2026, 08:07 PM UTC
Resolved
Apr 02, 2026, 09:28 PM UTC
Duration
1h 20m
Affected: Voice Agents
Timeline · 3 updates
  1. investigating Apr 02, 2026, 08:07 PM UTC

    We are currently investigating this issue.

  2. monitoring Apr 02, 2026, 09:22 PM UTC

    An upstream infrastructure provider identified a DNS resolution issue affecting WebSocket connections to voice agent deployments. A mitigation has been put in place and failure rates have recovered fully. We are continuing to monitor. (A client-side reconnect sketch follows this timeline.)

  3. resolved Apr 02, 2026, 09:28 PM UTC

    This incident has been resolved.
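
For context on the client side: a DNS blip like the one described in update 2 typically surfaces as a failed WebSocket dial. A common defense is to wrap the dial in a reconnect loop with exponential backoff, so the blip degrades into extra latency rather than a dropped session. A minimal sketch using the Python `websockets` library; the URL is a placeholder, not an actual Cartesia endpoint:

```python
import asyncio
import websockets  # pip install websockets

async def dial_with_retry(url: str, max_attempts: int = 5):
    """Dial a WebSocket, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await websockets.connect(url, open_timeout=10)
        except (OSError, TimeoutError):
            # DNS failures (socket.gaierror) subclass OSError; open_timeout
            # raises TimeoutError. Back off 1s, 2s, 4s, ... capped at 30s.
            await asyncio.sleep(min(2 ** attempt, 30))
    raise ConnectionError(f"could not reach {url} after {max_attempts} attempts")

# Hypothetical usage (placeholder URL):
# ws = asyncio.run(dial_with_retry("wss://agents.example.invalid/session"))
```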

Read the full incident report →

Notice April 1, 2026

Degradation in TTS API

Detected by Pingoru
Apr 01, 2026, 05:04 PM UTC
Resolved
Apr 01, 2026, 04:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 01, 2026, 05:04 PM UTC

    Our application was deployed on a node with a misconfigured networking setup, which resulted in degradation of our TTS service in the US. Customers may have seen elevated connection errors and spurious rate-limiting errors. The incident happened between 9:04 AM and 9:08 AM PT.

Read the full incident report →

Major March 13, 2026

Voice Agents — Degraded Performance

Detected by Pingoru
Mar 13, 2026, 02:02 PM UTC
Resolved
Mar 13, 2026, 08:45 PM UTC
Duration
6h 42m
Affected: Voice Agents
Timeline · 3 updates
  1. investigating Mar 13, 2026, 11:41 PM UTC

    We are experiencing intermittent connectivity issues affecting voice agent sessions due to degradation at an upstream infrastructure provider. Some calls may fail to connect or drop unexpectedly.

  2. identified Mar 13, 2026, 11:41 PM UTC

    A mitigation and additional observability have been deployed. We are actively monitoring for recovery and are seeing connections recover.

  3. resolved Mar 13, 2026, 11:42 PM UTC

    This incident has been resolved.

Read the full incident report →

Minor March 10, 2026

Authentication Services Degraded on play.cartesia.ai

Detected by Pingoru
Mar 10, 2026, 04:45 PM UTC
Resolved
Mar 10, 2026, 05:49 PM UTC
Duration
1h 3m
Affected: Playground
Timeline · 2 updates
  1. monitoring Mar 10, 2026, 04:45 PM UTC

    We're currently experiencing degraded login functionality on play.cartesia.ai due to an issue with our authentication provider, Clerk. We're monitoring the situation and you can also track their status at: https://status.clerk.com/

  2. resolved Mar 10, 2026, 05:49 PM UTC

    This incident has been resolved by Clerk.

Read the full incident report →

Notice March 10, 2026

Log ingestion disruption for Line agent calls

Detected by Pingoru
Mar 10, 2026, 04:00 PM UTC
Resolved
Mar 10, 2026, 04:00 PM UTC
Duration
Timeline · 1 update
  1. resolved Mar 11, 2026, 05:26 AM UTC

    Duration: March 10, 2026, 9:00 AM PST - 9:14 PM PST
    Impact: Runtime logs for Line agent calls made during this time period were not available via the dashboard or API. Call processing was not affected.
    Resolution: Our engineering team identified the root cause as an upstream breaking change from an infrastructure provider that disrupted our log ingestion pipeline. A hotfix was deployed and log collection has been fully restored. All logs from the affected window have also now been recovered and backfilled.
    Next Steps: We are conducting a postmortem to improve reliability and detection speed moving forward.

Read the full incident report →

Minor March 5, 2026

TTS API Elevated TTFA

Detected by Pingoru
Mar 05, 2026, 05:40 AM UTC
Resolved
Mar 05, 2026, 06:01 AM UTC
Duration
21m
Affected: Text to Speech (APAC)
Timeline · 4 updates
  1. investigating Mar 05, 2026, 04:55 AM UTC

    We are currently investigating elevated Time to First Audio (TTFA) for api.cartesia.ai in the APAC region. Customers may be experiencing longer-than-expected delays before audio begins streaming.

  2. identified Mar 05, 2026, 05:40 AM UTC

    We have identified the issue and patched a possible fix. We are continuing to monitor the issue.

  3. resolved Mar 05, 2026, 06:01 AM UTC

    The issue has been resolved.

  4. postmortem Mar 10, 2026, 11:56 PM UTC

    ## Overview

    On March 4, 2026, our platform experienced a period of degraded performance and a brief service interruption, primarily impacting users in the APAC region. The incident was triggered by an update to our Load Balancing layer that inadvertently extended to the cache synchronization workers across several global regions. This synchronization initiated a high-volume data retrieval process from our primary database. The resulting resource contention coincided with peak organic traffic, leading to increased latency and, eventually, a necessary database restart to restore system stability.

    ## Incident Timeline (PST) - March 4th, 2026

    * **15:50 – 16:50:** A new version deployment triggered a phased restart of cache synchronization workers across global regions.
    * **18:30:** Coinciding with peak regional demand, the primary database experienced elevated CPU utilization. Users began noticing increased Time to First Audio (TTFA).
    * **19:45:** Database resources reached a critical threshold, leading to broader performance degradation.
    * **20:18:** Engineering teams initiated an investigation into database health.
    * **21:00:** To clear queued transactions and stabilize the environment, a manual reset of the primary database was performed, resulting in a brief total service interruption.
    * **21:04:** Database services were successfully restored.
    * **21:08:** Latency metrics returned to normal levels.
    * **21:12:** All background cache synchronization tasks successfully completed.

    ## Root Cause Analysis

    ### Resource Contention and Buffer Saturation

    The primary driver of this incident was the simultaneous "cold start" of cache workers across multiple regions. By design, these workers attempt to populate the rapid-access cache by performing a comprehensive data fetch from the primary database. Because several regions initiated this process concurrently, the database was subjected to multiple long-running read operations. This created a high demand for the database’s buffer pool. When peak organic traffic arrived, the database struggled to balance these synchronization read requests with the high-frequency write operations required by active user sessions.

    ### Latent Dependency

    A recent configuration update introduced an unexpected synchronous database look-up during the request resolution path. While typically negligible, this added DB query time became significant when the DB was under heavy load. This dependency meant that as database performance slowed, user-facing latency increased proportionally, eventually leading to a backlog of requests.

    ## Lessons Learned and Resolution

    ### Architectural Optimization

    * **Decoupled Synchronization:** We are re-evaluating our cache synchronization process to make it future-proof as our load grows.
    * **Improving our global database architecture:** We are implementing a global database architectural change to make our system more resilient.

    ### Enhanced Monitoring

    We are deploying a unified health dashboard to provide real-time visibility into:

    * Granular resolution request hot-path latency.
    * More aggressive alerting on DB / Cache health.
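
The root cause described above (many cache workers cold-starting at once and each issuing a full read against one primary) is a classic thundering-herd pattern. The postmortem doesn't specify Cartesia's exact fix, but one generic mitigation is to stagger worker warm-up with random jitter so the expensive reads never land simultaneously. A minimal illustrative sketch:

```python
import random
import time

def warm_cache(region: str) -> None:
    """Stand-in for the expensive full read from the primary database."""
    print(f"[{region}] populating cache from primary...")

def staggered_warmup(regions: list[str], max_jitter_s: float = 300.0) -> None:
    """Cold-start cache workers with random jitter between them, so the
    primary never absorbs every region's full read at the same moment."""
    for region in regions:
        time.sleep(random.uniform(0.0, max_jitter_s))  # jitter before each start
        warm_cache(region)

staggered_warmup(["us", "eu", "apac"], max_jitter_s=5.0)
```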

Read the full incident report →

Minor February 26, 2026

Degraded generation quality on public voices

Detected by Pingoru
Feb 26, 2026, 08:45 PM UTC
Resolved
Feb 26, 2026, 11:20 PM UTC
Duration
2h 34m
Affected: Text to Speech (US), Text to Speech (EU), Text to Speech (APAC)
Timeline · 7 updates
  1. identified Feb 26, 2026, 09:06 PM UTC

    We have identified a degradation in speech generation quality for requests using public voices since 6:30 PM PST on February 25.

  2. identified Feb 26, 2026, 09:08 PM UTC

    We are continuing to work on a fix for this issue.

  3. identified Feb 26, 2026, 10:14 PM UTC

    We are continuing to work on a fix for this issue. A resolution is expected in 1 hour.

  4. monitoring Feb 26, 2026, 10:44 PM UTC

    We have rolled out a fix globally that should gradually take effect over the next 10-15 minutes.

  5. monitoring Feb 26, 2026, 11:11 PM UTC

    We have confirmed that the fix has been rolled out and TTS generation quality should return to normal. We are continuing to monitor the situation to ensure that the incident has been resolved.

  6. resolved Feb 26, 2026, 11:20 PM UTC

    The incident has been resolved.

  7. postmortem Feb 28, 2026, 02:18 AM UTC

    # Overview

    * During a maintenance upgrade to Cartesia’s voice metadata, existing metadata for a narrow subset of default Cartesia voices was unintentionally overwritten, causing voice changes.
    * Some voices were affected more than others. A narrow subsection of voices saw increased hallucinations, while 3 voices were significantly impacted.
    * Degradation occurred gradually as caches began expiring and was mitigated after a database restore and global cache purge.
    * In addition to fixing the root cause, we are updating our change management and monitoring process to prevent issues like this in the future by enforcing data upgrade safety via tooling, improving automatic detection, updating our triage process, and investing in a lower recovery time objective (RTO) for critical data like voices.

    # Detailed Analysis

    ## Timeline (UTC)

    * **2026-02-25 04:57** – Upgrade completes; overwritten voice metadata starts being served as caches expire. We regularly test our model for regressions, but since only a small subsection of voices was affected, the changes were not caught by our automated testing in this case.
    * **2026-02-25 23:00 - 2026-02-26 17:00** – We receive reports of voice changes and begin an internal investigation to assess scope and severity. The on-call investigated but made an incorrect determination of the source and severity of the issue, which delayed remediation.
    * **2026-02-26 18:40** – Impact triaged to high severity.
    * **2026-02-26 19:27** – Root cause identified; mitigation plan begins.
    * **2026-02-26 22:13** – Data fully restored; global cache purge begins.
    * **2026-02-26 23:06** – Global cache purge completes. Customers begin confirming resolution.
    * **2026-02-26 23:20** – Status page marked resolved.

    ## Root cause

    A bug in the code executing a maintenance upgrade to Cartesia’s voice metadata caused the metadata for some existing voices to be regenerated and overwritten. This metadata is fundamental to how we represent voices, and changes to this metadata can lead to changes in voice output. The code path containing the bug was executed because some default voices did not fulfill an invariant of the metadata upgrade process and were incorrectly identified as requiring an upgrade. The issue was not caught in manual or automated testing because it only reproduces in a specific state that exists in production and affects a narrow subsection of voices.

    ## Learnings and Next Steps

    We sincerely apologize for this incident and the disruption it caused to your business. We understand that reliable voice quality is fundamental to your trust in Cartesia. We are making the following corrections to our change management and monitoring processes to prevent issues like this in the future, targeting every step of the release and error recovery lifecycle:

    1. **Safer processes for routine data changes, enforced via tooling:** We are investing in automated tooling to make our existing change management process even more thorough. We will incorporate additional automated tooling to triage the risk of changes, do dry runs on realistic data, and ensure multiple sign-offs beyond code review, with automatic rollbacks in case of errors.
    2. **Improved automatic detection:** We are expanding our automatic voice regression testing to cover a much larger set of voices and transcripts.
    3. **Update triage process:** The issue was incorrectly triaged. We will update our triage playbook to ensure that any reported voice issues are comprehensively checked against recent code and data changes, to speed up an RCA and determine scope. This will ensure future issues are correctly escalated faster.
    4. **Invest in a lower RTO:** Our RTO for our voice infrastructure is currently 4 hours. However, for critical data like voices, we will invest in bringing our RTO down to 15 minutes. We will implement automation around finer-grained data snapshots and data restoration steps that are currently manual or slow.

    If you have any questions or concerns about this incident, please don't hesitate to reach out to our support team.
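
Item 1 of the next steps above (dry runs on realistic data, invariant enforcement, sign-offs before any write) is a widely used pattern for risky data migrations. A minimal sketch of that pattern; the types and field names here are hypothetical, not Cartesia's actual tooling:

```python
from dataclasses import dataclass

@dataclass
class Voice:
    voice_id: str
    metadata_version: int  # hypothetical invariant field

def needs_upgrade(voice: Voice, target: int) -> bool:
    # Enforce the invariant explicitly: records already at or beyond the
    # target version must never be touched, let alone regenerated.
    return voice.metadata_version < target

def migrate(voices: list[Voice], target: int, dry_run: bool = True) -> list[str]:
    to_change = [v for v in voices if needs_upgrade(v, target)]
    if dry_run:
        # Report what *would* change so a human can sign off before any write.
        return [f"WOULD UPGRADE {v.voice_id}: v{v.metadata_version} -> v{target}"
                for v in to_change]
    for v in to_change:
        v.metadata_version = target  # the only mutation path
    return [f"UPGRADED {v.voice_id}" for v in to_change]

voices = [Voice("default-a", 2), Voice("default-b", 3)]
for line in migrate(voices, target=3, dry_run=True):
    print(line)  # lists only default-a; default-b already satisfies the invariant
```

Running with `dry_run=True` first produces a reviewable change list; only after sign-off does the same code path perform the writes.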

Read the full incident report →

Minor February 19, 2026

sonic-3-latest TTS generations using pronunciation dictionaries produce incorrect results

Detected by Pingoru
Feb 19, 2026, 06:21 PM UTC
Resolved
Feb 19, 2026, 06:45 PM UTC
Duration
23m
Affected: Text to Speech (US), Text to Speech (EU), Text to Speech (APAC)
Timeline · 3 updates
  1. investigating Feb 19, 2026, 06:21 PM UTC

    TTS generations using pronunciation dictionaries produce incorrect audio (some structured metadata is read out in the speech). Workaround for affected customers: use `sonic-3` instead of `sonic-3-latest`. (A pinned-request sketch follows this timeline.)

  2. investigating Feb 19, 2026, 06:45 PM UTC

    We have rolled back the bad deploy and pronunciation dictionaries are once again working with `sonic-3-latest`.

  3. resolved Feb 19, 2026, 06:45 PM UTC

    This incident has been resolved.
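
The workaround in update 1 amounts to pinning a fixed model release instead of a moving alias, so a bad deploy behind the alias can't change your output. A sketch of what that looks like as a raw HTTP request: the endpoint path, headers, and body fields follow Cartesia's documented TTS API shape but should be checked against the current docs, and the API key, voice ID, and version date are placeholders:

```python
import requests

API_KEY = "your-api-key"  # placeholder

payload = {
    "model_id": "sonic-3",  # pinned release per the workaround, not the
                            # moving `sonic-3-latest` alias
    "transcript": "Hello from a pinned model.",
    "voice": {"mode": "id", "id": "your-voice-id"},  # placeholder voice ID
    "output_format": {"container": "wav", "encoding": "pcm_s16le",
                      "sample_rate": 44100},
}

resp = requests.post(
    "https://api.cartesia.ai/tts/bytes",  # the `/bytes` endpoint mentioned on this page
    headers={"X-API-Key": API_KEY,
             "Cartesia-Version": "2025-04-16"},  # illustrative version date
    json=payload,
    timeout=30,
)
resp.raise_for_status()
with open("out.wav", "wb") as f:
    f.write(resp.content)
```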

Read the full incident report →

Minor February 19, 2026

Playground signin instability

Detected by Pingoru
Feb 19, 2026, 05:12 PM UTC
Resolved
Feb 19, 2026, 06:41 PM UTC
Duration
1h 28m
Affected: Playground
Timeline · 3 updates
  1. investigating Feb 19, 2026, 05:12 PM UTC

    Some users are having trouble signing in; others are experiencing latency when signing in. Our provider Clerk has an incident page here: https://status.clerk.com/. The API and API authorization are unaffected; this only affects play.cartesia.ai and docs.cartesia.ai.

  2. monitoring Feb 19, 2026, 06:00 PM UTC

    Our provider has rolled out a fix and we are monitoring the recovery.

  3. resolved Feb 19, 2026, 06:41 PM UTC

    This incident has been resolved.

Read the full incident report →

Major February 9, 2026

TTS API degraded in APAC for api.cartesia.ai and api-india.cartesia.ai

Detected by Pingoru
Feb 09, 2026, 05:05 PM UTC
Resolved
Feb 09, 2026, 06:00 PM UTC
Duration
55m
Affected: Text to Speech (APAC)
Timeline · 3 updates
  1. investigating Feb 09, 2026, 05:16 PM UTC

    We've noticed errors when generating audio on `/bytes` and `/sse` endpoints in India. We are currently investigating this issue.

  2. monitoring Feb 09, 2026, 05:55 PM UTC

    We've identified and mitigated a DDoS attack on our APAC infrastructure. Traffic protection measures are now in place and routing has been restored. We're continuing to monitor the situation to ensure stability. Error rates have returned to normal. (A client retry sketch follows this timeline.)

  3. resolved Feb 09, 2026, 06:00 PM UTC

    This incident has been resolved.
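
During a window like this, requests to the affected endpoints tend to fail intermittently rather than deterministically, so clients that retry transient errors with exponential backoff often ride out the incident with nothing worse than added latency. A generic sketch, not Cartesia-specific:

```python
import time
import requests

def post_with_backoff(url: str, headers: dict, body: dict,
                      max_attempts: int = 5) -> requests.Response:
    """Retry 5xx responses and connection errors with exponential backoff,
    so a brief upstream incident shows up as latency, not hard failures."""
    for attempt in range(max_attempts):
        try:
            resp = requests.post(url, headers=headers, json=body, timeout=30)
            if resp.status_code < 500:  # success, or a non-retryable 4xx
                return resp
        except requests.RequestException:
            pass  # connection error or timeout: fall through and retry
        time.sleep(min(2 ** attempt, 30))  # 1s, 2s, 4s, ... capped at 30s
    raise RuntimeError(f"{url} still failing after {max_attempts} attempts")
```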

Read the full incident report →

Looking to track Cartesia downtime and outages?

Pingoru polls Cartesia's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.
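
If you'd rather see the mechanics, the core loop is easy to sketch. A minimal poller, assuming status.cartesia.ai is a Statuspage-style page that exposes a machine-readable summary at `/api/v2/status.json` (an assumption worth verifying against the live page):

```python
import time
import requests

STATUS_URL = "https://status.cartesia.ai/api/v2/status.json"  # assumed path

def poll(interval_s: int = 300) -> None:
    """Fetch the status indicator every five minutes and report changes."""
    last = None
    while True:
        try:
            indicator = requests.get(STATUS_URL, timeout=10).json()["status"]["indicator"]
        except (requests.RequestException, KeyError, ValueError):
            indicator = "unreachable"  # treat fetch/parse failures as a state
        if indicator != last:
            print(f"status changed: {last} -> {indicator}")
            last = indicator
        time.sleep(interval_s)
```

Pingoru layers alerting, routing, and history on top of a loop like this: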

  • Real-time alerts when Cartesia reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Cartesia alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Cartesia for free

5 free monitors · No credit card required