Harness incident

Legacy Run Test step is failing intermittently for all customers in Prod2

Minor · Resolved
Started: Apr 09, 2026, 05:30 AM UTC
Resolved: Apr 09, 2026, 10:48 AM UTC
Duration: 5h 18m
Detected by Pingoru: Apr 09, 2026, 05:30 AM UTC

Affected components

  • Continuous Integration Enterprise(CIE) - Self Hosted Runners
  • Continuous Integration Enterprise(CIE) - Mac Cloud Builds
  • Continuous Integration Enterprise(CIE) - Windows Cloud Builds
  • Continuous Integration Enterprise(CIE) - Linux Cloud Builds

Update timeline

  1. investigating Apr 09, 2026, 08:34 AM UTC

    Connectivity from some legacy Run Test steps to the test intel service is failing intermittently. We are currently investigating the issue.

  2. identified Apr 09, 2026, 09:38 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Apr 09, 2026, 10:38 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Apr 09, 2026, 10:48 AM UTC

    This incident has been resolved.

  5. postmortem Apr 21, 2026, 01:18 PM UTC

    ## **Summary**

    On April 8, 2026, customers in certain production environments experienced degraded performance and intermittent failures while accessing the platform. This impacted login functionality and execution of new and existing tasks.

    ## **Root Cause**

    A spike in internal task processing caused excessive load on the service, leading to resource exhaustion and degraded performance across multiple service instances.

    ## **Impact**

    Customers in affected environments experienced:

    * Slowness and failures during login
    * Inability to start new tasks in some cases
    * Failures in ongoing executions

    ## **Remediation**

    **Immediate:** Stabilized the system by resetting affected components and restoring service capacity, which allowed the platform to recover.

    **Permanent:** Introduced safeguards to limit resource-intensive operations and prevent unbounded processing under high load conditions.

    ## **Action Items**

    To prevent such issues from happening again, Harness will:

    * Add limits to high-volume internal processing paths
    * Audit and enforce safeguards across similar workflows
    * Improve system resilience under burst load scenarios
    * Enhance monitoring to detect abnormal load patterns earlier
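
The postmortem does not say how the limits on internal processing were implemented. As a rough illustration only, the general idea of "preventing unbounded processing" can be sketched as a bounded intake queue that rejects new work once a fixed limit is reached, rather than letting a spike exhaust resources. The class name `BoundedTaskIntake`, the `max_pending` parameter, and the task IDs are all hypothetical and are not taken from Harness.

```python
import queue
from typing import List


class BoundedTaskIntake:
    """Hypothetical intake that sheds load instead of queueing without bound."""

    def __init__(self, max_pending: int) -> None:
        # queue.Queue with maxsize > 0 refuses new items once it is full.
        self._pending = queue.Queue(maxsize=max_pending)
        self.rejected = 0  # count of tasks turned away at the limit

    def submit(self, task_id: str) -> bool:
        """Return True if the task was accepted, False if the limit was hit."""
        try:
            self._pending.put_nowait(task_id)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, batch_size: int) -> List[str]:
        """Remove and return up to batch_size tasks, oldest first."""
        batch: List[str] = []
        while len(batch) < batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        return batch

    def pending(self) -> int:
        """Number of tasks currently waiting."""
        return self._pending.qsize()


if __name__ == "__main__":
    intake = BoundedTaskIntake(max_pending=3)
    # Simulate a burst of five submissions against a limit of three.
    accepted = [intake.submit(f"task-{i}") for i in range(5)]
    print(accepted, intake.rejected)
```

In this sketch the rejected count also gives a simple signal that a monitor could alert on; whether rejected work should be dropped, retried later, or surfaced to the caller is a design choice the postmortem does not address.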
