Potential downtime
Timeline · 1 update
- investigating May 15, 2026, 11:59 AM UTC
Pingdom says we're down and the team is taking a closer look.
Redocly had 9 outages in the last 2 years totaling 277h 12m of downtime — averaging 0.4 incidents per month.
There were 9 Redocly outages since October 20, 2025 totaling 277h 12m of downtime. Each is summarised below — incident details, duration, and resolution information.
Pingdom says we're down and the team is taking a closer look.
We have identified an issue affecting project deployments and are actively working toward a resolution.
We've fixed the core issue, and are waiting for things to recover.
We've now resolved the incident. New production and preview deployments did not apply the recent changes. Already-running deployments continued to serve traffic normally, and no data was lost. After resolving the underlying issue, we automatically re-deployed all affected projects, and they now include the latest changes. No action is required on your side. We apologize for the inconvenience. Thanks for your patience.
Some people are experiencing problems with project deploys and previews. We have already identified the issue and are working on a fix. Please stand by for further updates.
We've fixed the core issue, and are waiting for things to recover.
We've now resolved the incident. Thanks for your patience.
We've now resolved the incident. Thanks for your patience.
Root Cause
The outage was caused by a loss of quorum within our primary cluster management layer. As new instances were being rotated into the cluster, the management nodes experienced a sharp increase in resource utilization. This spike in load prevented the nodes from communicating effectively, leading to the loss of a cluster leader and a subsequent breakdown in task scheduling and service discovery.
We've now resolved the incident. Thanks for your patience.
We've now resolved the incident. Thanks for your patience.
A database migration triggered a deadlock, causing temporary API latency and timeouts. The migration was aborted, and service health was restored immediately. We have updated our migration procedures to prevent recurrence.
The incident has been resolved. Thanks for your patience.
We've identified the root cause as an issue with a release procedure.