Update timeline
- investigating Mar 11, 2026, 03:38 PM UTC
We are currently investigating this issue.
- identified Mar 11, 2026, 03:42 PM UTC
The issue has been identified and a fix is being implemented.
- identified Mar 11, 2026, 03:55 PM UTC
We are continuing to work on a fix for this issue.
- identified Mar 11, 2026, 04:20 PM UTC
Dashboards are recovering. We are continuing to work on a fix for the pipelines issue.
- identified Mar 11, 2026, 05:18 PM UTC
Pipeline executions are proceeding normally; there is a delay in reflecting their status on the UI.
- identified Mar 11, 2026, 06:06 PM UTC
All executions are currently on track and will complete. The UI is showing a delayed status. We are expediting the UI recovery.
- monitoring Mar 11, 2026, 06:15 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Mar 11, 2026, 06:33 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 11, 2026, 06:45 PM UTC
This incident has been resolved.
- postmortem Mar 17, 2026, 11:40 PM UTC
### **Summary**

On March 11, 2026, customers in the Prod2 environment experienced pipeline failures, degraded UI performance (incorrect execution statuses), and inaccessible CCM Dashboards. The issue was caused by a degradation in an internal shared infrastructure component used for coordination across services. The incident began around **7:10 AM PST** and was fully mitigated by approximately **10:12 AM PST**. During this period, pipeline execution throughput was significantly reduced for affected customers.

### **Root Cause**

The issue was caused by resource saturation in a shared infrastructure component used for distributed coordination, which led to increased latency and failures in service-to-service communication. As a result, pipeline execution services were unable to process workloads efficiently, leading to a buildup of queued tasks and reduced system throughput.

### **Impact**

Customers experienced the following:

* Pipeline executions failing or not progressing
* Increased pipeline execution times
* UI delays due to processing backlogs

The impact was limited to specific production environments, and no data loss occurred.
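The failure mode described in the root cause — a service whose effective processing rate drops below its arrival rate — can be illustrated with a minimal sketch. The rates and time window below are hypothetical examples, not measured values from this incident:

```python
def backlog_over_time(arrival_rate, service_rate, minutes):
    """Track queued tasks when work may arrive faster than it is processed.

    arrival_rate and service_rate are tasks per minute; the backlog
    can never go negative. Values are illustrative only.
    """
    backlog = 0
    history = []
    for _ in range(minutes):
        backlog = max(0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history

# Healthy: the service keeps up, so no backlog accumulates.
healthy = backlog_over_time(arrival_rate=100, service_rate=120, minutes=5)
# → [0, 0, 0, 0, 0]

# Saturated: coordination latency halves effective throughput,
# so the backlog grows linearly until capacity is restored.
saturated = backlog_over_time(arrival_rate=100, service_rate=50, minutes=5)
# → [50, 100, 150, 200, 250]
```

This is why clearing the accumulated backlog was a distinct mitigation step: once saturation ends, queued work must still be drained before throughput returns to normal.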
### **Mitigation**

**Immediate**

* Redirected services to a higher-capacity infrastructure instance to restore normal processing
* Cleared accumulated processing backlogs to recover system throughput
* Scaled supporting services to stabilize performance

**Permanent**

* Improved monitoring and alerting for early detection of resource saturation
* Implemented capacity and scaling improvements to handle higher load scenarios
* Initiated architectural improvements to reduce reliance on shared coordination components

### **Action Items**

To prevent such issues from happening again, we are taking several steps:

* Enhance alerting to detect early signs of infrastructure saturation
* Review and optimize system behavior under high concurrency scenarios
* Continue investigation into the triggering conditions and incorporate findings into long-term improvements
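As a rough illustration of the "enhance alerting" action item, a saturation check might compare coordination-layer metrics against thresholds. The metric names and threshold values below are hypothetical assumptions for the sketch, not Harness's actual monitoring configuration:

```python
from dataclasses import dataclass

@dataclass
class CoordinationMetrics:
    """Snapshot of hypothetical coordination-layer health metrics."""
    avg_request_latency_ms: float
    outstanding_requests: int
    queue_depth: int

def saturation_alerts(m,
                      latency_threshold_ms=50.0,
                      outstanding_threshold=100,
                      queue_depth_threshold=1000):
    """Return a list of early-warning alerts; empty means healthy.

    Thresholds are illustrative defaults, not tuned values.
    """
    alerts = []
    if m.avg_request_latency_ms > latency_threshold_ms:
        alerts.append("coordination latency high")
    if m.outstanding_requests > outstanding_threshold:
        alerts.append("outstanding requests high")
    if m.queue_depth > queue_depth_threshold:
        alerts.append("task queue backlog growing")
    return alerts

# Latency and outstanding requests breach thresholds; queue depth does not.
snapshot = CoordinationMetrics(avg_request_latency_ms=80.0,
                               outstanding_requests=150,
                               queue_depth=200)
print(saturation_alerts(snapshot))
```

In practice such checks would run in a metrics system (e.g. as recording/alerting rules) rather than application code, but the principle is the same: alert on leading indicators of saturation before throughput collapses.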