Amazon Marketplace Web Service incident

Amazon Elastic Compute Cloud (EC2) — us-east-1: Service impact: Increased Error Rate and Latency

Amazon Marketplace Web Service is currently experiencing a major incident affecting Amazon Elastic Compute Cloud (EC2) — us-east-1, which began 1d ago. The vendor's full update timeline is below.

Started
May 08, 2026, 12:25 AM UTC
Resolved
Ongoing
Duration
1d 2h
Detected by Pingoru
May 08, 2026, 12:25 AM UTC

Affected components

Amazon Elastic Compute Cloud (EC2) — us-east-1

Update timeline

  1. monitoring May 08, 2026, 12:25 AM UTC

    We are investigating instance impairments in a single Availability Zone (use1-az4) in the US-EAST-1 Region. Other Availability Zones are not affected by the event and we are working to resolve the issue.

  2. monitoring May 08, 2026, 12:53 AM UTC

    We continue to investigate instance impairments in a single Availability Zone (use1-az4) in the US-EAST-1 Region. We have experienced an increase in temperatures within a single data center, which in some cases has caused impairments for instances in the Availability Zone. EC2 instances and EBS volumes hosted on hardware that lost power during the thermal event are impaired. Other AWS services that depend on the affected EC2 instances and EBS volumes in this Availability Zone may also experience impairments. We will continue to provide updates as recovery continues.

  3. monitoring May 08, 2026, 01:47 AM UTC

    We continue to work towards returning temperatures to normal levels in the affected Availability Zone (use1-az4) in the US-EAST-1 Region. Other AWS services that depend on the affected EC2 instances and EBS volumes in this Availability Zone may also experience impairments. We have shifted traffic away from the affected Availability Zone for most services at this time. We recommend customers utilize one of the other Availability Zones in the US-EAST-1 Region, as existing instances in other AZs remain unaffected by this issue (see the sketch after this update). Customers may experience longer than usual provisioning times. We will provide an update by 7:45 PM PDT, or sooner if we have additional information to share.
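    For customers following this guidance, a minimal boto3 sketch of launching a replacement instance pinned to a zone other than use1-az4 might look like the following. The AMI ID and instance type are placeholders, and note that zone names map to zone IDs differently per account:

    ```python
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # AZ names (us-east-1a, ...) map to AZ IDs (use1-az4, ...) differently
    # in each account, so resolve the impaired zone by its ID.
    zones = ec2.describe_availability_zones()["AvailabilityZones"]
    healthy = [z["ZoneName"] for z in zones if z["ZoneId"] != "use1-az4"]

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t3.micro",          # placeholder instance type
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": healthy[0]},
    )
    print("Launched", resp["Instances"][0]["InstanceId"], "in", healthy[0])
    ```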

  4. monitoring May 08, 2026, 03:06 AM UTC

    We are actively working to restore temperatures to normal levels in the affected Availability Zone (use1-az4) in the US-EAST-1 Region, though progress is slower than originally anticipated. Since our last update we have made incremental progress restoring cooling systems within the affected AZ; this progress is not visible to external customers but is required for the restoration of affected services. In the impacted Availability Zone, EC2 Instances, EBS Volumes, and other AWS Services are also experiencing elevated error rates and latencies for some workflows. As part of our recovery effort, we have shifted traffic away from the impacted Availability Zone for most services. We recommend customers utilize one of the other Availability Zones in the US-EAST-1 Region, as existing instances in other AZs remain unaffected by this issue. If immediate recovery is required, we recommend customers restore from EBS Snapshots and/or replace affected resources by launching new replacement resources in one of the unaffected zones (a snapshot-restore sketch follows this update). We will provide an update by 10:00 PM PDT, or sooner if we have additional information to share.
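    One hedged reading of the snapshot guidance, as a boto3 sketch: find the most recent completed snapshot of an impaired volume and materialize a new volume in an unaffected zone. The volume ID and zone name below are placeholders:

    ```python
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Most recent completed snapshot of the impaired volume (placeholder ID).
    snaps = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "volume-id", "Values": ["vol-0123456789abcdef0"]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )["Snapshots"]
    latest = max(snaps, key=lambda s: s["StartTime"])

    # Materialize the snapshot as a fresh volume in an unaffected zone.
    vol = ec2.create_volume(
        SnapshotId=latest["SnapshotId"],
        AvailabilityZone="us-east-1b",  # placeholder: any zone other than use1-az4
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    print("Restored", latest["SnapshotId"], "as", vol["VolumeId"])
    ```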

  5. monitoring May 08, 2026, 05:11 AM UTC

    We are observing early signs of recovery. We continue to work towards restoring temperatures to normal levels and bringing impacted racks back online in the affected Availability Zone (use1-az4) in the US-EAST-1 Region. We have been able to get additional cooling system capacity online, which has allowed us to recover some affected racks, and we are actively working to recover additional racks in a controlled and safe manner. In the impacted Availability Zone, EC2 Instances, EBS Volumes, and other AWS Services may continue to experience elevated error rates and latencies for some workflows until full recovery is achieved. We will provide an update by 11:30 PM PDT, or sooner if we have additional information to share.

  6. monitoring May 08, 2026, 06:38 AM UTC

    We continue to make progress in resolving the impaired EC2 instances in the affected Availability Zone (use1-az4) in the US-EAST-1 Region, and are working towards full recovery. We are actively working to bring additional cooling system capacity online, which will enable us to recover the remaining affected racks in a controlled and safe manner. In the impacted Availability Zone, EC2 Instances, EBS Volumes, and other AWS Services may continue to experience elevated error rates and latencies for some workflows. Customers will continue to see some of their affected EC2 instances and EBS volumes as impaired until we achieve full recovery. We will provide an update by May 8, 1:30 AM PDT, or sooner if we have additional information to share.
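    To enumerate which instances are currently reported as impaired, one possible boto3 check against the EC2 status API (nothing here is specific to this incident):

    ```python
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # IncludeAllInstances=True reports stopped instances too; the default
    # covers only running ones.
    pages = ec2.get_paginator("describe_instance_status").paginate(
        IncludeAllInstances=True
    )
    for page in pages:
        for st in page["InstanceStatuses"]:
            sys_s = st["SystemStatus"]["Status"]
            inst_s = st["InstanceStatus"]["Status"]
            if "impaired" in (sys_s, inst_s):
                print(st["InstanceId"], st["AvailabilityZone"], sys_s, inst_s)
    ```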

  7. monitoring May 08, 2026, 08:32 AM UTC

    Mitigation efforts remain underway to resolve the impaired EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region. These EC2 instances and EBS volumes were impacted due to a loss of power during the thermal event. The work to bring additional cooling system capacity online, which will enable us to recover the remaining affected infrastructure in a controlled and safe manner, is taking longer than we had initially anticipated. Some services, such as IoT Core, ELB, NAT Gateway, and Redshift, have seen significant improvements in the recovery of their workflows. However, some customers will continue to see their affected EC2 instances and EBS volumes as impaired until we achieve full recovery. While we do not currently have an ETA for full recovery, we are prioritizing this issue and will provide another update by 3:30 AM PDT or sooner if additional information becomes available.

  8. monitoring May 08, 2026, 10:54 AM UTC

    We continue to make progress towards resolving the impaired EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region. At this time, we wanted to provide some more details on the issue. Beginning on May 7 at 4:20 PM PDT, we began experiencing an increase in instance impairments within the affected zone due to the loss of power during a thermal event. Engineers were automatically engaged within minutes and immediately began investigating multiple mitigations. By 9:12 PM PDT, we restored power to a subset of the affected infrastructure and observed some signs of recovery, which have remained stable. We continue working to bring additional cooling system capacity online, which will enable us to recover the remaining affected hardware in the impacted zone in a controlled and safe manner. Some AWS services, such as IoT Core, ELB, NAT Gateway, and Redshift, continue to see significant improvements in the recovery of their workflows. However, some customers will continue to see their affected EC2 instances and EBS volumes as impaired until we achieve full recovery. If immediate recovery is required, we recommend customers restore from EBS snapshots and/or replace affected resources by launching new replacement resources in one of the unaffected zones. Based on our current mitigation efforts, we expect full recovery to take several hours. We are prioritizing this issue and will provide another update by 6:30 AM PDT or sooner if additional information becomes available.
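    Degraded EBS volumes can be enumerated the same way through the volume-status API; a minimal sketch:

    ```python
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Volumes whose status checks are failing; adding "warning" to the
    # filter would also catch degraded-but-functional volumes.
    pages = ec2.get_paginator("describe_volume_status").paginate(
        Filters=[{"Name": "volume-status.status", "Values": ["impaired"]}]
    )
    for page in pages:
        for v in page["VolumeStatuses"]:
            print(v["VolumeId"], v["AvailabilityZone"], v["VolumeStatus"]["Status"])
    ```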

  9. monitoring May 08, 2026, 01:51 PM UTC

    We continue working to resolve the impaired EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region caused by a thermal event. During the event, servers automatically shut down when temperatures exceeded operating thresholds in order to protect the hardware. We are actively working to bring additional cooling system capacity online, which will enable us to recover the remaining affected hardware in the impacted zone. Some customers will continue to see their affected EC2 instances and EBS volumes as impaired until we achieve full recovery. If immediate recovery is required, we recommend customers restore from EBS snapshots and/or replace affected resources by launching new replacement resources in one of the unaffected zones. In parallel, we are investigating increased error rates and query failures for Redshift clusters in the US-EAST-1 Region. During this time, affected customers may see errors for resume and restart workflows, as well as failover operations and availability issues (a retry sketch follows this update). Our engineers are actively working to resolve this issue. Full recovery is still expected to take several hours. We are prioritizing this issue and will provide another update by 9:00 AM PDT or sooner if additional information becomes available.
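    Since resume and restart workflows may fail transiently while the upstream impairment persists, one defensive pattern on the customer side is to retry the resume call with backoff. A sketch, with a hypothetical cluster identifier:

    ```python
    import time

    import boto3
    from botocore.exceptions import ClientError

    redshift = boto3.client("redshift", region_name="us-east-1")
    CLUSTER_ID = "analytics-prod"  # hypothetical cluster identifier

    # Retry resume with exponential backoff; transient failures during
    # the incident should eventually clear.
    for attempt in range(5):
        try:
            redshift.resume_cluster(ClusterIdentifier=CLUSTER_ID)
            break
        except ClientError as err:
            print("resume failed:", err.response["Error"]["Code"])
            time.sleep(30 * 2 ** attempt)  # 30s, 60s, 120s, ...

    cluster = redshift.describe_clusters(ClusterIdentifier=CLUSTER_ID)["Clusters"][0]
    print(CLUSTER_ID, "availability:", cluster.get("ClusterAvailabilityStatus"))
    ```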

  10. monitoring May 08, 2026, 03:58 PM UTC

    We continue working towards the recovery of the impaired EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region. We are making progress towards restoring the cooling system capacity that is required to recover the affected hardware in the impacted zone. Some customers will continue to see their affected EC2 instances and EBS volumes as impaired until the affected racks are recovered. We continue to recommend that customers who require immediate recovery restore from EBS snapshots and/or replace affected resources by launching new replacement resources in one of the unaffected zones. As part of our parallel investigation, we have identified the root cause of the increased error rates and query failures for Redshift clusters in the US-EAST-1 Region; it has been confirmed to be impact from an upstream dependency. Affected customers may continue to see errors for resume and restart workflows, failover operations, and impact to general availability. We are actively working to resolve the issue. Full recovery is still expected to take several hours and will be incremental as we bring racks online in phases. We will provide an additional update by 12:30 PM or sooner if we have new information to provide.

  11. resolved May 08, 2026, 04:30 PM UTC

    We have observed complete recovery from the increased error rates and query failures for Redshift clusters in the US-EAST-1 Region. We were able to resolve the impact independently of the ongoing efforts to recover the affected hardware in the use1-az4 Availability Zone. The issue affecting Redshift has been resolved and the service is operating normally. We will provide an additional update regarding the efforts towards hardware restoration by 12:30 PM or sooner.

  12. monitoring May 08, 2026, 06:12 PM UTC

    We are experiencing an increase in timeouts to Amazon Managed Streaming for Apache Kafka partitions on a subset of clusters as a result of the ongoing issue in a single Availability Zone (use1-az4) in the US-EAST-1 Region. We are working in parallel to determine a path towards mitigation for affected clusters. We will provide an additional update by 12:30 PM or sooner.
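    For clients of the affected MSK clusters, loosening producer timeouts and retry settings can turn transient partition timeouts into retries rather than hard delivery failures; this is a general client-side mitigation, not vendor guidance. A kafka-python sketch with a placeholder broker and topic:

    ```python
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9092"],  # placeholder
        acks="all",                # wait for all in-sync replicas
        retries=10,                # retry transient send failures
        request_timeout_ms=60000,  # default is 30000
        retry_backoff_ms=1000,     # pause between retries
    )

    # Block until the record is acknowledged, or raise after two minutes.
    future = producer.send("orders", b"payload")  # placeholder topic
    metadata = future.get(timeout=120)
    print("wrote to partition", metadata.partition, "at offset", metadata.offset)
    producer.flush()
    ```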

  13. monitoring May 08, 2026, 07:29 PM UTC

    We continue to work towards the recovery of the impaired EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region, though efforts are slower than we had previously anticipated. We are taking measured steps to ensure that cooling capacity is brought online in a safe and controlled manner. As a result, EBS Volumes and EC2 instances affected by the issue will continue to experience impairments. We continue to recommend that customers who require immediate recovery restore from EBS snapshots and/or replace affected resources by launching new replacement resources. Full recovery is still expected to take several hours. We will provide an additional update by 4:00 PM or sooner if we have new information to provide.

  14. monitoring May 08, 2026, 11:00 PM UTC

    We have begun to see improvements in the overall number of affected EC2 instances and degraded EBS volumes in a single Availability Zone (use1-az4) in the US-EAST-1 Region. The steps taken to supply additional cooling capacity are showing steady signs of progress. Some EBS Volumes and EC2 instances affected by the issue will continue to experience impairments while we drive these efforts to completion. We continue to recommend that customers who require immediate recovery restore from EBS snapshots and/or replace affected resources by launching new replacement resources. We have also seen some improvements in Amazon Managed Streaming for Apache Kafka as a result of the parallel mitigation efforts. We are still experiencing timeouts to partitions but are seeing continued progress. We do anticipate that recovery will still take several hours. We will provide an additional update by 7:30 PM or sooner if we have new information to provide.