Red Hat Outage History

Red Hat had 60 outages in the last 2 years, totaling 219h 23m of downtime and averaging 2.5 incidents per month.

Each Red Hat outage since August 19, 2025 is summarised below with incident details, duration, and resolution information.
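As a quick sanity check on the summary figures, the sketch below reproduces the quoted monthly rate and derives a mean duration per outage. The 24-month window and the function names are assumptions made for illustration, and the mean duration is a derived value, not a figure reported by the source.

```python
from datetime import timedelta

# Figures quoted in the summary above.
OUTAGES = 60
TOTAL_DOWNTIME = timedelta(hours=219, minutes=23)
# Assumption: "the last 2 years" is treated as 24 calendar months.
WINDOW_MONTHS = 24


def incidents_per_month(outages: int, months: int) -> float:
    """Average number of incidents per month over the window."""
    return outages / months


def mean_outage_minutes(total: timedelta, outages: int) -> float:
    """Mean downtime per outage, in minutes."""
    return total.total_seconds() / 60 / outages


def format_minutes(minutes: float) -> str:
    """Render minutes as 'Xh Ym', dropping any fractional minute."""
    hours, mins = divmod(int(minutes), 60)
    return f"{hours}h {mins}m"


if __name__ == "__main__":
    print(incidents_per_month(OUTAGES, WINDOW_MONTHS))
    print(format_minutes(mean_outage_minutes(TOTAL_DOWNTIME, OUTAGES)))
```

A mean hides a wide spread here: the individual durations listed below range from 1m to 7d 8h, so a median would likely describe a typical incident better.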

Source: https://status.redhat.com

Major October 20, 2025

Cascading failures for ROSA and OSD services that depend on Quay and AWS

Detected by Pingoru: Oct 20, 2025, 08:45 AM UTC
Resolved: Oct 20, 2025, 10:29 PM UTC
Duration: 13h 43m

Affected: OpenShift Cluster Manager

Timeline · 8 updates

monitoring Oct 20, 2025, 08:45 AM UTC

Due to an ongoing Quay and AWS incident, ROSA and OSD clusters may face degradation of on-cluster services as well as issues during installation. Red Hat is actively monitoring the situation and will provide updates as they become available.
monitoring Oct 20, 2025, 09:28 AM UTC

The impact of the incident is currently limited to the AWS us-east-1 region.
monitoring Oct 20, 2025, 11:18 AM UTC

The AWS incident has been updated from Degraded to Impacting. We are still seeing impact to the ROSA and OSD services in the us-east-1 region, mostly related to EC2 instance launches. We are continuing to monitor the incident. No customer action is currently required. We will update the incident within 1 hour.
monitoring Oct 20, 2025, 12:05 PM UTC

According to the newest update from AWS, there are ongoing issues with VM launches. We're observing launch errors in the us-east-1 region, which affect ROSA and OSD products. The status hasn't changed, and no customer action is required.
monitoring Oct 20, 2025, 01:03 PM UTC

According to the newest AWS update, mitigation of the EC2 instance launch issue is in progress. The incident is still ongoing. No customer action is required. Red Hat is monitoring the incident.
monitoring Oct 20, 2025, 06:59 PM UTC

Per AWS's most recent update at 11:22 AM PDT, multiple AWS services affecting compute and networking remain down or degraded. This may impact cluster operations, including cluster creation, image pulls, upgrades, and more. Please see https://health.aws.amazon.com/health/status for detailed and direct updates of these underlying services.
monitoring Oct 20, 2025, 08:36 PM UTC

Per AWS's most recent update at 1:03 PM PDT, multiple AWS services affecting compute and networking are continuing to see an improvement. However, there may still be impact to cluster operations, including cluster creation, image pulls, upgrades, and more. Please see https://health.aws.amazon.com/health/status for detailed and direct updates of these underlying services.
resolved Oct 20, 2025, 10:29 PM UTC

Per AWS's most recent update at 2:48 PM PDT, EC2 instance creation is no longer being throttled, and has returned to normal pre-incident levels. We have seen recovery of clusters affected by this outage, and are resolving this incident. If you are experiencing any issues, please reach out to Support.

Major October 20, 2025

Quay.io reporting errors - Container Registry is currently in read-only mode

Detected by Pingoru: Oct 20, 2025, 08:10 AM UTC
Resolved: Oct 20, 2025, 11:05 PM UTC
Duration: 14h 54m

Affected: registry.redhat.io, Registry, Frontend

Timeline · 4 updates

investigating Oct 20, 2025, 08:10 AM UTC

Quay.io is currently reporting errors. The service is in read-only mode.
investigating Oct 20, 2025, 12:02 PM UTC

The team is currently monitoring the situation to make an informed decision on re-enabling write access to quay.io.
identified Oct 20, 2025, 10:52 PM UTC

Per AWS's most recent update, the majority of services have been restored. We have begun the process of re-enabling Pushes.
resolved Oct 20, 2025, 11:05 PM UTC

All services have been restored.

Major October 2, 2025

Connect.redhat.com - Outage

Detected by Pingoru: Oct 02, 2025, 04:57 PM UTC
Resolved: Oct 02, 2025, 04:59 PM UTC
Duration: 1m

Affected: Partner Container Certification Workflow, Partner Acceleration Desk, Partner Account Management, Partner Portal, Partner Subscriptions

Timeline · 2 updates

identified Oct 02, 2025, 04:57 PM UTC

connect.redhat.com is currently unavailable due to an underlying service issue. Our team is actively working to resolve the problem. We appreciate your patience and will provide updates as they become available.
resolved Oct 02, 2025, 04:59 PM UTC

This incident has been resolved.

Major September 25, 2025

Quay.io UI Reporting Internal Error

Detected by Pingoru: Sep 25, 2025, 01:20 PM UTC
Resolved: Sep 25, 2025, 08:10 PM UTC
Duration: 6h 49m

Affected: Frontend, Security Scanning

Timeline · 4 updates

identified Sep 25, 2025, 01:20 PM UTC

The quay.io web UI is currently reporting errors. Image push/pull and builds continue to work. We have identified the issue and are rolling out a fix.
identified Sep 25, 2025, 01:27 PM UTC

We are continuing to work on a fix for this issue.
monitoring Sep 25, 2025, 01:38 PM UTC

A fix has been implemented; however, we will need to keep Security Scanning disabled for the time being. We will re-enable it as soon as possible.
resolved Sep 25, 2025, 08:10 PM UTC

The incident has been resolved and Security Scanning is now fully enabled.

Minor September 24, 2025

Red Hat 3scale API Management SaaS Admin and Developer Portal UIs are currently experiencing a service disruption

Detected by Pingoru: Sep 24, 2025, 01:36 PM UTC
Resolved: Sep 24, 2025, 02:01 PM UTC
Duration: 25m

Timeline · 2 updates

investigating Sep 24, 2025, 01:36 PM UTC

The Red Hat 3scale Site Reliability Engineering team is conducting a root cause analysis and taking corrective actions. Updates and additional information will be provided as soon as they become available, and efforts will continue until full resolution is achieved.
resolved Sep 24, 2025, 02:01 PM UTC

This incident has been resolved.

Minor September 19, 2025

Red Hat 3scale API Management SaaS APIs are currently experiencing a service disruption

Detected by Pingoru: Sep 19, 2025, 09:06 AM UTC
Resolved: Sep 19, 2025, 10:10 AM UTC
Duration: 1h 3m

Affected: API Manager

Timeline · 3 updates

investigating Sep 19, 2025, 09:06 AM UTC

The Red Hat 3scale Site Reliability Engineering team is conducting a root cause analysis and taking corrective actions. Updates and additional information will be provided as soon as they become available, and efforts will continue until full resolution is achieved.
monitoring Sep 19, 2025, 09:20 AM UTC

Sustained period of increased service latency.
resolved Sep 19, 2025, 10:10 AM UTC

This incident has been resolved.

Major September 9, 2025

Hybrid Cloud Console partial frontend outage

Detected by Pingoru: Sep 09, 2025, 07:40 PM UTC
Resolved: Sep 09, 2025, 09:20 PM UTC
Duration: 1h 39m

Affected: console.redhat.com

Timeline · 2 updates

identified Sep 09, 2025, 07:40 PM UTC

We are aware of a few Hybrid Cloud Console applications that are unexpectedly serving 403 errors. We've identified that they are not correctly requesting the necessary OAuth scopes and are investigating a fix.
resolved Sep 09, 2025, 09:20 PM UTC

We have fixed the issues that caused incorrect OAuth scopes to be requested for some applications. The Hybrid Cloud Console is back to normal operation.

Critical September 2, 2025

Cloud-Connector Outage

Detected by Pingoru: Sep 02, 2025, 01:36 PM UTC
Resolved: Sep 02, 2025, 03:32 PM UTC
Duration: 1h 55m

Affected: Red Hat Lightspeed - Advisor, Red Hat Lightspeed - Remediations

Timeline · 3 updates

investigating Sep 02, 2025, 01:36 PM UTC

Cloud-Connector is experiencing an outage due to DNS issues. This may impact the ability to perform remote Playbook runs.
investigating Sep 02, 2025, 02:51 PM UTC

Remediations & Advisor/Tasks services impacted. New RHC connections may fail, affecting the ability to run playbooks and tasks.
resolved Sep 02, 2025, 03:32 PM UTC

Akamai & HarperDB have addressed the configuration issues that led to this outage. End-to-end tests confirm this has been resolved.

Critical August 29, 2025

OCM API degraded

Detected by Pingoru: Aug 29, 2025, 12:26 PM UTC
Resolved: Aug 29, 2025, 02:32 PM UTC
Duration: 2h 5m

Affected: api.openshift.com

Timeline · 3 updates

investigating Aug 29, 2025, 12:26 PM UTC

OCM upgrade API returning 500 responses.
monitoring Aug 29, 2025, 02:19 PM UTC

Latency and error rate have decreased. Transitioning to monitoring for sustained stability.
resolved Aug 29, 2025, 02:32 PM UTC

Sustained stability observed.

Minor August 19, 2025

Red Hat 3scale API Management SaaS AutoSSL is currently experiencing a service disruption

Detected by Pingoru: Aug 19, 2025, 04:34 AM UTC
Resolved: Aug 26, 2025, 01:17 PM UTC
Duration: 7d 8h
