Harness experienced a major incident on April 9, 2026 affecting Continuous Delivery - Next Generation (CDNG), Continuous Integration Enterprise (CIE) - Mac Cloud Builds, and 1 more component, lasting 1h 36m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Apr 09, 2026, 01:43 AM UTC
We are currently investigating this issue.
- identified Apr 09, 2026, 02:02 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring Apr 09, 2026, 02:33 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Apr 09, 2026, 02:42 AM UTC
This incident has been resolved.
- postmortem Apr 16, 2026, 09:22 PM UTC
## **Summary**

On April 8, 2026, customers in Prod1 and Prod2 experienced degraded performance when logging into the Harness platform. Additionally, in Prod2, customers were unable to start new pipeline executions and some running pipelines failed. The issue lasted approximately 1 hour and 35 minutes.

## **Root Cause**

The issue was caused by a sudden surge of task reassignment requests triggered after customer delegate restarts. This resulted in a high volume of backend processing requests that exceeded expected limits, leading to elevated resource utilization and degraded performance of the Harness Manager service.

## **Impact**

* Customers in Prod1 and Prod2 experienced login failures and degraded user operations.
* Customers in Prod2 were unable to start new pipeline executions, and some ongoing executions failed.
* All customers in the affected clusters experienced service degradation during the incident window.

## **Remediation**

**Immediate:**

* Restarted affected services and stabilized system performance, restoring login and pipeline functionality.

**Permanent:**

* Introduced safeguards to limit backend processing for large task reassignment scenarios.
* Identifying and applying limits to similar high-volume operations to prevent resource exhaustion.

## **Action Items**

To prevent such issues from happening again:

* Implement query limits for high-volume task processing scenarios.
* Audit and enforce limits across similar backend operations so that the platform stays resilient under load.
* Enhance monitoring and alerting for abnormal spikes in task reassignment and resource utilization.
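
As an illustration only, the kind of safeguard described above (capping how much reassignment work is processed at once) might look like the following minimal sketch. It is not Harness's implementation: the class `BoundedReassignmentQueue`, its methods, and the `max_per_cycle` and `max_pending` parameters are all hypothetical names chosen for this example, and the limits shown are arbitrary.

```python
from collections import deque
from typing import Deque, Iterable, List


class BoundedReassignmentQueue:
    """Hypothetical sketch: bounds queued and per-cycle task reassignments."""

    def __init__(self, max_per_cycle: int, max_pending: int) -> None:
        if max_per_cycle <= 0 or max_pending <= 0:
            raise ValueError("limits must be positive")
        self.max_per_cycle = max_per_cycle  # assumed cap on work done per cycle
        self.max_pending = max_pending      # assumed cap on queued work
        self._pending: Deque[str] = deque()

    def submit(self, task_ids: Iterable[str]) -> List[str]:
        """Queue task ids in order; return those rejected because the queue is full."""
        rejected: List[str] = []
        for task_id in task_ids:
            if len(self._pending) < self.max_pending:
                self._pending.append(task_id)
            else:
                rejected.append(task_id)
        return rejected

    def drain_cycle(self) -> List[str]:
        """Remove and return at most max_per_cycle task ids, oldest first."""
        count = min(self.max_per_cycle, len(self._pending))
        return [self._pending.popleft() for _ in range(count)]

    def pending(self) -> int:
        """Number of task ids still waiting to be processed."""
        return len(self._pending)


# Example: a burst of 8 reassignment requests against small, arbitrary limits.
queue = BoundedReassignmentQueue(max_per_cycle=3, max_pending=5)
rejected = queue.submit(f"task-{i}" for i in range(8))
batch = queue.drain_cycle()
```

In this sketch, a burst larger than `max_pending` is partially rejected instead of being admitted in full, and each cycle handles a fixed maximum amount of work, so a surge after a mass delegate restart would be spread over several cycles. A real system would also need a retry path for rejected tasks and metrics on queue depth, which ties in with the monitoring action item.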