DoControl incident
Investigating an increased error rate in our API
DoControl experienced a major incident on October 14, 2021, affecting the GraphQL API and lasting 4h 12m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 14, 2021, 09:17 AM UTC
Some API calls are returning 500 errors, and the UI is displaying an error message.
- monitoring Oct 14, 2021, 11:16 AM UTC
We are in touch with AWS about this incident. They have confirmed that it originates in the US-EAST-1 region and have pushed a fix. We are still monitoring to see whether the issue is resolved.
- monitoring Oct 14, 2021, 11:21 AM UTC
To share more information: some workloads failed to process. In most cases we were able to reprocess them, but some incoming webhooks failed to be ingested into our platform.
- resolved Oct 14, 2021, 01:29 PM UTC
AWS has confirmed that the issue is now resolved on their side. To summarize this event: from 3:00 AM UTC until 11:00 AM UTC we saw errors that affected our platform. Around 25% of the incoming traffic was not processed. The rest of the errors were backfilled by us.