SMRTs incident

Amazon AWS Outage

Notice Resolved

SMRTs experienced a notice-level incident on November 25, 2020, affecting POS and lasting 18h 45m. The incident has been resolved; the full update timeline is below.

Started
Nov 25, 2020, 03:37 PM UTC
Resolved
Nov 26, 2020, 10:22 AM UTC
Duration
18h 45m
Detected by Pingoru
Nov 25, 2020, 03:37 PM UTC

Affected components

POS

Update timeline

  1. identified Nov 25, 2020, 03:37 PM UTC

    We're investigating elevated reports of problems loading SMRT. It appears our datacenter provider, AWS, is having intermittent issues.

  2. identified Nov 25, 2020, 03:55 PM UTC

    We are continuing to work with Amazon Web Services to resolve this issue. It appears to be a global outage with their infrastructure: https://downdetector.com/status/amazon/

  3. identified Nov 25, 2020, 05:05 PM UTC

    We're still working on resolving these issues. Unfortunately, all 3 of our AWS datacenter locations in us-east-1 (Virginia) are down.

  4. identified Nov 25, 2020, 06:08 PM UTC

    AWS is still experiencing a severe outage that's affecting many companies, including big names like Roku, Adobe, and Ring. Amazon has not given a clear answer as to what caused the issue. Here's an article explaining the outage: https://techcrunch.com/2020/11/25/amazon-web-services-outage-takes-a-portion-of-the-internet-down-with-it/

  5. identified Nov 25, 2020, 07:45 PM UTC

    Latest from Amazon: "We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. For Kinesis Data Streams, the issue is affecting the subsystem that is responsible for handling incoming requests. The team has identified the root cause and is working on resolving the issue affecting this subsystem. The issue also affects other services, or parts of these services, that utilize Kinesis Data Streams within their workflows. While features of multiple services are impacted, some services have seen broader impact and service-specific impact details are below."

  6. monitoring Nov 25, 2020, 08:31 PM UTC

    The system is back up but with degraded performance. We're working with AWS to get us back to normal.

  7. monitoring Nov 25, 2020, 11:44 PM UTC

    AWS Outage Update: While the system is back online, Amazon Web Services is still experiencing issues but is on the mend. Expect slower performance than usual for the rest of the day, and expect reports to take longer than normal to update. Here's the latest from Amazon: "We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. We also continue to see an improvement in error rates for Kinesis and several affected services, but expect full recovery to still take up to a few hours. For Amazon Cognito, the issues affecting APIs and authentication for user and identity pools has now recovered. For AutoScaling, delays in launching new instances has now recovered, however some scaling operations are still delayed due to delayed CloudWatch metrics. For EventBridge, we have seen partial recovery for the issue affecting delivery of Events. We are actively working toward full recovery for all affected services, and will continue to provide updates regularly as we have new information to share."

  8. monitoring Nov 26, 2020, 07:03 AM UTC

    AWS Outage Update: The latest from Amazon: "We have restored all traffic to Kinesis Data Streams from Internet-facing endpoints, and we are continuing to incrementally restore all requests to Kinesis Data Streams using VPC Endpoints. We are also beginning to observe the incremental recovery of CloudWatch metrics functionality for new incoming metrics, and working towards full recovery. The backlog of metrics will take additional time to populate. We will continue to keep you updated on our progress." SMRT knows how hard today was, and we thank you for your patience! This was the worst AWS issue since at least 2017. Interestingly enough, we had one customer demoing an offline inventory tool, and they were able to handle pickups throughout the outage. This update will be available to all SMRT customers before the new year.

  9. resolved Nov 26, 2020, 10:22 AM UTC

    This incident has been resolved.