Update timeline
- investigating Mar 11, 2026, 03:38 PM UTC
We are currently investigating this issue.
- identified Mar 11, 2026, 03:42 PM UTC
The issue has been identified and a fix is being implemented.
- identified Mar 11, 2026, 03:55 PM UTC
We are continuing to work on a fix for this issue.
- identified Mar 11, 2026, 04:20 PM UTC
Dashboards are recovering. We are continuing to work on a fix for the pipelines issue.
- identified Mar 11, 2026, 05:18 PM UTC
Pipeline executions are proceeding normally; there is a delay in reflecting their status on the UI.
- identified Mar 11, 2026, 06:06 PM UTC
All executions are currently on track and will complete. The UI is showing a delayed status. We are expediting the UI recovery.
- monitoring Mar 11, 2026, 06:15 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Mar 11, 2026, 06:33 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 11, 2026, 06:45 PM UTC
This incident has been resolved.
- postmortem Mar 17, 2026, 11:40 PM UTC
### **Summary**

On March 11, 2026, customers in the Prod2 environment experienced pipeline failures, degraded UI performance (incorrect execution statuses), and inaccessible CCM Dashboards. The issue was caused by a degradation in an internal shared infrastructure component used for coordination across services. The incident began around **7:10 AM PST** and was fully mitigated by approximately **10:12 AM PST**. During this period, pipeline execution throughput was significantly reduced for affected customers.

### **Root Cause**

The issue was caused by resource saturation in a shared infrastructure component used for distributed coordination, which led to increased latency and failures in service-to-service communication. As a result, pipeline execution services were unable to process workloads efficiently, leading to a buildup of queued tasks and reduced system throughput.

### **Impact**

Customers experienced the following:

* Pipeline executions failing or not progressing
* Increased pipeline execution times
* UI delays due to processing backlogs

The impact was limited to specific production environments, and no data loss occurred.
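The failure mode described in the root cause — a service whose effective processing rate drops below its arrival rate — can be illustrated with a minimal sketch. The rates and time window below are hypothetical examples, not measured values from this incident:

```python
def backlog_over_time(arrival_rate, service_rate, minutes):
    """Track queued tasks when work may arrive faster than it is processed.

    arrival_rate and service_rate are tasks per minute; the backlog
    can never go negative. Values are illustrative only.
    """
    backlog = 0
    history = []
    for _ in range(minutes):
        backlog = max(0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history

# Healthy: the service keeps up, so no backlog accumulates.
healthy = backlog_over_time(arrival_rate=100, service_rate=120, minutes=5)
# → [0, 0, 0, 0, 0]

# Saturated: coordination latency halves effective throughput,
# so the backlog grows linearly until capacity is restored.
saturated = backlog_over_time(arrival_rate=100, service_rate=50, minutes=5)
# → [50, 100, 150, 200, 250]
```

This is why clearing the accumulated backlog was a distinct mitigation step: once saturation ends, queued work must still be drained before throughput returns to normal.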
### **Mitigation**

**Immediate**

* Redirected services to a higher-capacity infrastructure instance to restore normal processing
* Cleared accumulated processing backlogs to recover system throughput
* Scaled supporting services to stabilize performance

**Permanent**

* Improved monitoring and alerting for early detection of resource saturation
* Implemented capacity and scaling improvements to handle higher load scenarios
* Initiated architectural improvements to reduce reliance on shared coordination components

### **Action Items**

To prevent such issues from happening again, we are taking several steps:

* Enhance alerting to detect early signs of infrastructure saturation
* Review and optimize system behavior under high concurrency scenarios
* Continue investigation into the triggering conditions and incorporate findings into long-term improvements
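As a rough illustration of the "enhance alerting" action item, a saturation check might compare coordination-layer metrics against thresholds. The metric names and threshold values below are hypothetical assumptions for the sketch, not Harness's actual monitoring configuration:

```python
from dataclasses import dataclass

@dataclass
class CoordinationMetrics:
    """Snapshot of hypothetical coordination-layer health metrics."""
    avg_request_latency_ms: float
    outstanding_requests: int
    queue_depth: int

def saturation_alerts(m,
                      latency_threshold_ms=50.0,
                      outstanding_threshold=100,
                      queue_depth_threshold=1000):
    """Return a list of early-warning alerts; empty means healthy.

    Thresholds are illustrative defaults, not tuned values.
    """
    alerts = []
    if m.avg_request_latency_ms > latency_threshold_ms:
        alerts.append("coordination latency high")
    if m.outstanding_requests > outstanding_threshold:
        alerts.append("outstanding requests high")
    if m.queue_depth > queue_depth_threshold:
        alerts.append("task queue backlog growing")
    return alerts

# Latency and outstanding requests breach thresholds; queue depth does not.
snapshot = CoordinationMetrics(avg_request_latency_ms=80.0,
                               outstanding_requests=150,
                               queue_depth=200)
print(saturation_alerts(snapshot))
```

In practice such checks would run in a metrics system (e.g. as recording/alerting rules) rather than application code, but the principle is the same: alert on leading indicators of saturation before throughput collapses.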