Transient errors across API endpoints due to database failover
Timeline · 1 update
- resolved Jun 18, 2026, 01:09 AM UTC
We observed a brief (< 1 minute) period of API errors due to a database failover. This incident has been resolved.
Orb had 19 outages in the last 2 years totaling 57h 41m of downtime — averaging 0.8 incidents per month.
There were 19 Orb outages since July 3, 2025 totaling 57h 41m of downtime. Each is summarised below — incident details, duration, and resolution information.
We observed a brief (< 1 minute) period of API errors due to a database failover. This incident has been resolved.
We observed processing delays and queueing between 06/07/26 00:00 UTC and 01:00 UTC, with full recovery to established baselines at 01:30 UTC. We believe this will not recur, but will continue to monitor and increase capacity to prevent this in the future. Customers with dedicated webhook SLAs and provisioning were not impacted.
We have identified and are actively fixing an issue with usage processing for a subset of data, affecting alerting and usage-based invoice issuance. APIs and data ingestion remain operational.
The issue has been identified and a fix is being implemented.
A fix has been implemented, and we are monitoring as latency recovers in async services.
This incident has been resolved.
We're investigating elevated error rates across API endpoints.
We're seeing recovery as of 12:33 AM UTC, and are continuing to monitor for impact.
Errors have continued to stay mitigated as of 00:33 UTC (approximately 40 mins ago).
We're working to mitigate an infrastructure issue, which may lead to intermittent latency spikes (each of which should last a few seconds), resulting in a higher rate of client-side retries. We apologize in advance for the disruption, and we're working to resolve the situation.
This incident has been resolved.
We identified elevated latencies for fetching usage (and potentially some associated actions that required invoicing via manual action). The vast majority of the impact was from April 3 21:22 to 21:26 UTC. Impact was fully mitigated for the remaining small share (<1%) of errors by 21:39.
This is now resolved.
Following a deployment at 5:40 UTC, the Orb API started experiencing elevated timeouts for applying and cancelling subscription pending changes. We have rolled back and errors have subsided fully as of 12:55 UTC, and we are continuing to monitor.
We identified the root cause to be a new query that was introduced. API traffic to the affected endpoints has been healthy since the rollback at 12:55 UTC.
We're seeing some async delays on invoicing, webhooks, and alerts. APIs are not impacted at this time.
We've confirmed the source of the issue and are working on a fix.
We are continuing to work on a fix for this issue.
We have applied a fix and are monitoring recovery.
The vast majority of async workloads have caught up, and we will continue to monitor our services over the next few hours.
From 02/26/2026 02:16 to 04:16 UTC, event ingestion was delayed as a result of our scheduled maintenance. This disruption may have delayed alerting and the issuance of threshold invoices and top-up blocks. No data loss occurred.
We’re currently experiencing some lag on the web application and are actively investigating the cause.
This incident has been resolved.
We observed an increase in page load failures in the Orb dashboard and a subset of APIs (5%) starting at 02:25 UTC for approximately 10 minutes. Services are now recovered, and we are continuing to monitor. We will provide more detailed updates to affected customers. Data ingestion was not impacted.
We have not seen any new related errors, and this incident is now resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to monitor for any further issues.
This incident has been resolved.
We've identified and are investigating an issue with invoice issuance delays. No data or API impact at this time.
We have applied a fix and will continue to monitor invoice issuance.
This incident has been resolved.
We are currently investigating intermittent webapp load failures for some customers. API, ingestion, and invoicing are unaffected.
We've implemented a fix and are working on rolling it out.
This incident has been resolved.
We're looking into elevated API response times on a subset of ingestion API requests. This does not affect any ingestion via S3.
We've identified the issue, and API latency is recovering - we're continuing to monitor.
Continuing to see quick recovery on API latencies. We'll provide any further updates in 10 minutes, or close out this incident.
API latencies have returned to normal, and services are fully recovered.
We've identified some revenue reporting delays in our pipeline; billing and data ingestion are not affected. Values from the API are also not affected. We expect to catch up within 24 hours, and will continue providing updates here.
As of 22:00 UTC, our revenue reporting service has caught up and is now processing data as expected.
A recent code deploy caused a minor increase in analytics request errors. Our automated canary systems rolled this back as part of our deploy process without any operator intervention. We're continuing to monitor to ensure errors do not recur, and the offending logic is not deployed. This did not impact writes or data ingestion.
We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.
Orb is affected by a cloud provider incident that is impacting our asynchronous workloads. APIs and data ingestion are currently not affected, and we're continuing to monitor in partnership with our service provider.
We're seeing recovery and are continuing to monitor.
We're not seeing any continued impact on our services.