Red Hat 3scale incident

Cascading failures for ROSA and OSD services that depend on Quay and AWS

Red Hat 3scale experienced a major incident on October 20, 2025 affecting OpenShift Cluster Manager, lasting 13h 43m. The incident has been resolved; the full update timeline is below.

Started: Oct 20, 2025, 08:45 AM UTC
Resolved: Oct 20, 2025, 10:29 PM UTC
Duration: 13h 43m
Detected by Pingoru: Oct 20, 2025, 08:45 AM UTC

Affected components

OpenShift Cluster Manager

Update timeline

monitoring Oct 20, 2025, 08:45 AM UTC

Due to an ongoing Quay and AWS incident, ROSA and OSD clusters may experience degraded on-cluster services as well as issues during installation. Red Hat is actively monitoring the situation and will provide updates as they become available.
monitoring Oct 20, 2025, 09:28 AM UTC

The impact of the incident is currently limited to the AWS us-east-1 region.
monitoring Oct 20, 2025, 11:18 AM UTC

The AWS incident status has been updated from Degraded to Impacting. We are still seeing impact to the ROSA and OSD services in the us-east-1 region, mostly related to EC2 instance launches. We are continuing to monitor the incident. No customer action is currently required. We will provide another update within 1 hour.
monitoring Oct 20, 2025, 12:05 PM UTC

According to the latest AWS update, issues with VM launches are ongoing. We are observing launch errors in the us-east-1 region, which affect ROSA and OSD. The status has not changed, and no customer action is required.
monitoring Oct 20, 2025, 01:03 PM UTC

According to the latest AWS update, mitigations for the EC2 instance launch issue are in progress, and the incident remains ongoing. No customer action is required. Red Hat continues to monitor the incident.
monitoring Oct 20, 2025, 06:59 PM UTC

Per AWS's most recent update at 11:22 AM PDT (6:22 PM UTC), multiple AWS compute and networking services remain down or degraded. This may impact cluster operations, including cluster creation, image pulls, and upgrades. Please see https://health.aws.amazon.com/health/status for detailed, direct updates on these underlying services.
monitoring Oct 20, 2025, 08:36 PM UTC

Per AWS's most recent update at 1:03 PM PDT (8:03 PM UTC), multiple AWS compute and networking services continue to show improvement. However, cluster operations, including cluster creation, image pulls, and upgrades, may still be impacted. Please see https://health.aws.amazon.com/health/status for detailed, direct updates on these underlying services.
resolved Oct 20, 2025, 10:29 PM UTC

Per AWS's most recent update at 2:48 PM PDT (9:48 PM UTC), EC2 instance creation is no longer being throttled and has returned to normal pre-incident levels. We have seen recovery of clusters affected by this outage and are resolving this incident. If you are still experiencing issues, please reach out to Support.