Harness incident

Legacy Run Test step is failing intermittently for all customers in Prod2

Harness experienced a minor incident on April 9, 2026 affecting Continuous Integration Enterprise(CIE) - Self Hosted Runners and Continuous Integration Enterprise(CIE) - Mac Cloud Builds and 1 more component, lasting 5h 18m. The incident has been resolved; the full update timeline is below.

Started: Apr 09, 2026, 05:30 AM UTC
Resolved: Apr 09, 2026, 10:48 AM UTC
Duration: 5h 18m
Detected by Pingoru: Apr 09, 2026, 05:30 AM UTC

Affected components

Continuous Integration Enterprise(CIE) - Self Hosted RunnersContinuous Integration Enterprise(CIE) - Mac Cloud BuildsContinuous Integration Enterprise(CIE) - Windows Cloud BuildsContinuous Integration Enterprise(CIE) - Linux Cloud Builds

Update timeline

investigating Apr 09, 2026, 08:34 AM UTC

Some of the legacy run test step connectivity to test intel service is failing intermittently. We are currently investigating the issue here.
identified Apr 09, 2026, 09:38 AM UTC

The issue has been identified and a fix is being implemented.
monitoring Apr 09, 2026, 10:38 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Apr 09, 2026, 10:48 AM UTC

This incident has been resolved.
postmortem Apr 21, 2026, 01:18 PM UTC

## **Summary** On April 8, 2026, customers in certain production environments experienced degraded performance and intermittent failures while accessing the platform. This impacted login functionality and execution of new and existing tasks. ## **Root Cause** A spike in internal task processing caused excessive load on the service, leading to resource exhaustion and degraded performance across multiple service instances. ## **Impact** Customers in affected environments experienced: * Slowness and failures during login * Inability to start new tasks in some cases * Failures in ongoing executions ## **Remediation** ‌ **Immediate:** Stabilized the system by resetting affected components and restoring service capacity, which allowed the platform to recover. **Permanent:** Introduced safeguards to limit resource-intensive operations and prevent unbounded processing under high load conditions. ## **Action Items** To prevent such issues from happening again, Harness will * Add limits to high-volume internal processing paths * Audit and enforce safeguards across similar workflows * Improve system resilience under burst load scenarios * Enhance monitoring to detect abnormal load patterns earlier