Orb incident

Elevated errors across API endpoints

Orb experienced a major incident on September 3, 2025, affecting the Reads, Ingest, Writes, and Analytics components and lasting 1h 7m. The incident has been resolved; the full update timeline is below.

Started: Sep 03, 2025, 01:28 PM UTC
Resolved: Sep 03, 2025, 02:36 PM UTC
Duration: 1h 7m
Detected by Pingoru: Sep 03, 2025, 01:28 PM UTC

Affected components

Reads, Ingest, Writes, Analytics

Update timeline

investigating Sep 03, 2025, 10:46 AM UTC

We are currently investigating database issues causing API errors.
investigating Sep 03, 2025, 10:51 AM UTC

We are continuing to investigate this issue.
investigating Sep 03, 2025, 11:14 AM UTC

We're continuing to investigate, and are actively pursuing mitigation strategies. We apologize for the disruption and will provide status updates diligently here as we learn more.
investigating Sep 03, 2025, 11:31 AM UTC

We are continuing to see elevated errors on write endpoints. We believe we understand the root cause, and are pursuing multiple parallel mitigation strategies to resolve the incident as quickly as possible.
investigating Sep 03, 2025, 11:45 AM UTC

Although this incident is still active, we're seeing partial recovery for specific customers. We're continuing to treat this as a top priority and are working to mitigate the impact by running maintenance operations at our database layer.
investigating Sep 03, 2025, 11:57 AM UTC

We're seeing persistent partial recovery across writes, but a lower rate of failures persists. We believe the mitigations we've put in place are helping, but we are continuing to pursue faster and more encompassing mitigation strategies. Note that the ingestion API is not failing and has not failed at any point during the incident. Once the incident is resolved, we do not expect any data gaps in event ingestion, so no retries should be necessary.
investigating Sep 03, 2025, 12:15 PM UTC

We're continuing to focus on mitigating impact. Once again, we apologize for the disruption and will publish an RCA after the incident is resolved.
investigating Sep 03, 2025, 12:32 PM UTC

We are continuing to investigate the issue.
monitoring Sep 03, 2025, 12:45 PM UTC

We're seeing broader recovery and have seen no API errors since 12:38 UTC. We are now working to bring back async services.
monitoring Sep 03, 2025, 12:46 PM UTC

We are continuing to see broader recovery, and API error rates have recovered.
monitoring Sep 03, 2025, 01:03 PM UTC

We're continuing to see broad recovery, and there have been no API errors since 12:38 UTC. We are continuing to work on bringing back async services.
monitoring Sep 03, 2025, 01:17 PM UTC

Our services are continuing to recover, including issuing any outstanding invoices and webhooks. We have not seen any elevated rate of API errors since initial recovery.
monitoring Sep 03, 2025, 01:28 PM UTC

We are continuing to work through our asynchronous work queue.
monitoring Sep 03, 2025, 01:47 PM UTC

We are continuing to work through our asynchronous work queue.
monitoring Sep 03, 2025, 02:00 PM UTC

We are nearly recovered, but will keep the incident in monitoring state until all asynchronous work is fully stable.
monitoring Sep 03, 2025, 02:18 PM UTC

We are continuing to work through our asynchronous work queue.
monitoring Sep 03, 2025, 02:32 PM UTC

We are nearing full resolution, and will continue to keep this incident updated.
resolved Sep 03, 2025, 02:36 PM UTC

This incident has been resolved. We'll continue to monitor for any disruptions, and follow up with a detailed RCA.
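
The elapsed times implied by the update timeline can be worked out from the timestamps in this report. The following is a minimal Python sketch, not part of the original status page; the helper names `span` and `fmt` are hypothetical, and it assumes the 12:38 UTC mark quoted in the updates is the point after which no further API errors were seen.

```python
from datetime import datetime, timedelta

# Timestamp layout used in this report, e.g. "Sep 03, 2025, 01:28 PM" (all times UTC).
FMT = "%b %d, %Y, %I:%M %p"


def span(start: str, end: str) -> timedelta:
    """Return the elapsed time between two timestamps in the report's format."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


def fmt(delta: timedelta) -> str:
    """Format a timedelta as 'Xh Ym'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60}m"


# Values copied from the timeline above.
first_update = "Sep 03, 2025, 10:46 AM"    # first "investigating" update
api_errors_end = "Sep 03, 2025, 12:38 PM"  # assumed: no API errors reported after this
resolved = "Sep 03, 2025, 02:36 PM"        # "resolved" update

print(fmt(span(first_update, api_errors_end)))  # first update until API errors stopped
print(fmt(span(first_update, resolved)))        # first update until resolution
```

The same two helpers can be applied to any other pair of timestamps in the timeline, for example to measure how long the incident stayed in the monitoring state.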