DriveWealth incident

API Error Rate

Critical Resolved

DriveWealth experienced a critical incident on June 13, 2023, lasting 2d. The incident has been resolved; the full update timeline is below.

Started
Jun 13, 2023, 08:12 PM UTC
Resolved
Jun 15, 2023, 08:13 PM UTC
Duration
2d
Detected by Pingoru
Jun 13, 2023, 08:12 PM UTC
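
The "2d" duration above is rounded. As a minimal sketch, the exact elapsed time can be recomputed from the Started and Resolved timestamps. The helper name `parse_utc` and the format string are illustrative assumptions, not part of any DriveWealth or Pingoru tooling, and the month abbreviation parsing assumes an English locale.

```python
from datetime import datetime, timezone

# Assumed format matching the timestamps shown on this page, e.g. "Jun 13, 2023, 08:12 PM UTC".
FMT = "%b %d, %Y, %I:%M %p"

def parse_utc(stamp: str) -> datetime:
    """Parse a status-page timestamp (hypothetical helper) into an aware UTC datetime."""
    # Drop the trailing " UTC" label, then attach the UTC timezone explicitly.
    return datetime.strptime(stamp.removesuffix(" UTC"), FMT).replace(tzinfo=timezone.utc)

started = parse_utc("Jun 13, 2023, 08:12 PM UTC")
resolved = parse_utc("Jun 15, 2023, 08:13 PM UTC")

# timedelta stores whole days separately from the remaining seconds.
duration = resolved - started
print(f"{duration.days}d {duration.seconds // 60}m")
```

The same subtraction works for any pair of entries in the update timeline, for example to measure the time between the first "investigating" update and the "monitoring" update.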

Update timeline

  1. investigating Jun 13, 2023, 07:21 PM UTC

    AWS is experiencing outages in certain services causing impacts in our API and Drivehub. We are currently working with AWS to investigate the impact.

  2. investigating Jun 13, 2023, 07:32 PM UTC

    We are continuing to investigate this issue.

  3. identified Jun 13, 2023, 07:40 PM UTC

    Update from AWS: AWS has identified the root cause of the elevated errors invoking AWS Lambda functions, and they are actively working to resolve this issue.

  4. identified Jun 13, 2023, 08:12 PM UTC

    Update from AWS: We are continuing to experience increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. We have identified the root cause as an issue with AWS Lambda, and are actively working toward resolution. We are actively working on full mitigation and will continue to provide regular updates. Source: https://health.aws.amazon.com/health/status

  5. identified Jun 13, 2023, 08:31 PM UTC

    While the AWS outage is ongoing and AWS technicians are still investigating, DW has failed over selected services as a workaround. Orders and most APIs should be functional at this time.

  6. identified Jun 13, 2023, 08:57 PM UTC

    Update from AWS - Beginning at 11:49 AM PDT, customers began experiencing errors and latencies with multiple AWS services in the US-EAST-1 Region. Our engineering teams were immediately engaged and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use by other AWS services. We have associated other services that are impacted by this issue to this post on the Health Dashboard. Additionally, customers may experience authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS. Customers may also experience intermittent issues when attempting to call or initiate a chat to AWS Support. We are now observing sustained recovery of the Lambda invoke error rates, and recovery of other affected AWS services. We are continuing to monitor closely as we work towards full recovery across all services.

  7. identified Jun 13, 2023, 09:38 PM UTC

    While AWS services are not fully restored, we are back to full operations using our workaround solutions. As soon as AWS confirms that all services are up and running, we will seamlessly revert to our standard configuration.

  8. identified Jun 13, 2023, 10:12 PM UTC

    Update: AWS has communicated an ETA of around one hour for full restoration of services. Once AWS has fully restored its services, we will seamlessly revert our services to standard mode.

  9. monitoring Jun 13, 2023, 10:31 PM UTC

    A fix has been implemented and we are monitoring the results.

  10. resolved Jun 15, 2023, 08:13 PM UTC

    This incident has been resolved.