Chargebee incident
Intermittent API Request Timeouts - US Region
Chargebee experienced a minor incident on May 8, 2026 affecting API (US), lasting 7h 6m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring May 08, 2026, 12:29 AM UTC
Between May 7, 23:35 UTC and May 8, 00:55 UTC, we experienced intermittent API request timeouts in the US region due to an AWS platform issue. Overall impact was minimal. The service has fully recovered, and we have observed no errors for the past 5 minutes. We are monitoring the systems to ensure continued stability.
- monitoring May 08, 2026, 12:57 AM UTC
Our team is working with AWS to assess the impact. We will share an update as we progress.
- monitoring May 08, 2026, 04:08 AM UTC
We are continuing to monitor the services. The incident remains open as we verify full system recovery. We will share an update as we progress.
- resolved May 08, 2026, 07:36 AM UTC
We have verified that the intermittent API latency has been fully resolved. Our monitoring confirms all systems have fully recovered.
- postmortem May 15, 2026, 05:08 PM UTC
**Post-Mortem: Intermittent API Errors and Background Processing Delays** **Incident Overview:** On May 7, 2026, an infrastructure event in [AWS US-EAST-1 ](https://health.aws.amazon.com/health/status?eventID=arn:aws:health:us-east-1::event/EC2/AWS_EC2_OPERATIONAL_ISSUE/AWS_EC2_OPERATIONAL_ISSUE_FFA94_A4DC515D30F)\(availability zone use1-az4\) caused degraded performance across multiple AWS services, including Amazon DynamoDB. Between 11:33 PM UTC on May 7 and 04:50 AM UTC on May 8, our platform experienced elevated API error rates and increased latency as a result. Approximately 1.5% of active sites were affected, experiencing intermittent API failures and delays during this window. Total incident duration was 5 hours and 17 minutes. **Incident Timelines:** * **11:20 PM UTC, May 7 — AWS infrastructure event begins.** AWS identified an infrastructure impairment in use1-az4 caused by a thermal event affecting EC2 and EBS. Multiple dependent AWS services were impacted. * **11:30 PM UTC, May 7 — DynamoDB degradation begins.** DynamoDB began experiencing elevated API latency and read/write errors in US-EAST-1 as a subset of storage partitions became impaired. * **11:33 PM UTC, May 7 — Detection and incident response activated.** Our anomaly alerting fired and on-call engineers were paged within two minutes. We began observing elevated 5xx error rates and increased request latency across affected API surfaces. Background job execution was delayed for a subset of sites. Our team opened direct communication with AWS and began implementing mitigations to stabilise platform performance alongside our automated recovery. * **~2:00 AM UTC, May 8 — Premature AWS stabilisation signal:** AWS reported initial DynamoDB service stabilisation. Our monitoring continued to show intermittent latency spikes and request failures, indicating recovery was incomplete. We remained on active incident watch and continued coordination with AWS. * **04:50 AM UTC, May 8 — Platform stabilisation confirmed** API error rates and request latency returned to normal operating levels across our platform. All background jobs that were delayed or failed during the incident were successfully rescheduled and completed. Core functionality was fully restored, and the incident was declared resolved for our workloads. * **02:00 AM UTC, May 9 — Full AWS recovery confirmed.** AWS confirmed complete DynamoDB partition recovery across all affected infrastructure in US-EAST-1, followed by full restoration of the underlying EC2/EBS layer. We ensured all the failed jobs were rescheduled successfully. We recognise that even a short disruption has real consequences for your operations. We remain committed to continuously improving our system’s resilience to reduce the impact on the Chargebee platform from external events.