Harness incident

Prod1/Prod2 pipelines and logins are degraded. Some delegates are disconnected

Major Resolved
Started
Feb 26, 2026, 05:56 PM UTC
Resolved
Feb 26, 2026, 06:29 PM UTC
Duration
33m
Detected by Pingoru
Feb 26, 2026, 05:56 PM UTC

Affected components

Continuous Delivery - Next Generation (CDNG)

Update timeline

  1. investigating Feb 26, 2026, 05:56 PM UTC

    We are currently investigating this issue.

  2. identified Feb 26, 2026, 06:05 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Feb 26, 2026, 06:14 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Feb 26, 2026, 06:29 PM UTC

    This incident has been resolved.

  5. postmortem Mar 02, 2026, 04:26 PM UTC

    ## Summary

    On **February 26, 2026**, multiple customers experienced disruptions accessing Harness on Prod1 and Prod2. A transient network connectivity issue disrupted our backend systems, leading to platform unresponsiveness. Service was restored within approximately one hour.

    ## Impact

    * Customers on Prod2 were unable to log in or access the Harness platform.
    * Prod1 experienced login disruptions due to a cross-environment dependency on Prod2.
    * Delegates disconnected; Kubernetes-based delegates reconnected automatically, while non-Kubernetes delegates required a manual restart.

    ## Root Cause

    A transient network connectivity disruption caused connection timeouts across the platform. The exact infrastructure-side trigger of the initial connectivity disruption is still under investigation.

    ## Remediation

    * **Immediate:** Affected services were manually restarted, clearing stuck connections and restoring platform availability.
    * **Short-term:** Autoscaling limits were adjusted to better handle sudden reconnection load.
    * **Ongoing:** Investigation into timeout configuration and application resilience improvements is in progress.

    ## Action Items

    To prevent such issues from happening again:

    1. Review and update the timeout settings to fail fast and limit thread blocking during connectivity issues.
    2. Improve application resilience: enhance circuit breakers so that connectivity issues and retries do not cascade.
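The two action items above (fail-fast timeouts and circuit breakers) can be illustrated with a minimal Python sketch. This is not Harness's implementation. The `CircuitBreaker` class, its parameters (`failure_threshold`, `reset_after`, `clock`), and the `CircuitOpenError` exception are hypothetical names chosen for this example. The idea is that after a few consecutive failures the breaker rejects calls immediately instead of letting each caller block on a dead connection, then allows a single trial call once a cool-down has passed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock                          # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not tie up a thread on a dependency known to be down.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The wrapped function should carry its own short timeout so that a single attempt cannot block indefinitely; for example, `socket.create_connection(address, timeout=2.0)` from the standard library raises an exception rather than waiting on an unreachable host. The threshold and cool-down values shown are placeholders, not recommendations.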
