DoControl incident

Investigating error rate increasing in our API

Major Resolved

DoControl experienced a major incident on October 14, 2021 affecting Graphql Api, lasting 4h 12m. The incident has been resolved; the full update timeline is below.

Started
Oct 14, 2021, 09:17 AM UTC
Resolved
Oct 14, 2021, 01:29 PM UTC
Duration
4h 12m
Detected by Pingoru
Oct 14, 2021, 09:17 AM UTC

Affected components

Graphql Api

Update timeline

  1. investigating Oct 14, 2021, 09:17 AM UTC

    Some API calls are returning 500 errors, and the UI is displaying an error message.

  2. monitoring Oct 14, 2021, 11:16 AM UTC

    We are in touch with AWS about this incident. They have confirmed that it originates in the US-EAST-1 region and have pushed a fix. We are still monitoring to see whether the issue is resolved.

  3. monitoring Oct 14, 2021, 11:21 AM UTC

    To share more information: some workloads failed to process. In most cases we were able to reprocess them, but some incoming webhooks failed to be ingested into our platform.

  4. resolved Oct 14, 2021, 01:29 PM UTC

    AWS has confirmed that the issue is now resolved on their side. To summarize this event: from 3:00 AM UTC until 11:00 AM UTC we saw errors that affected our platform. Around 25% of incoming traffic was not processed; the rest of the errors were backfilled by us.