Harness incident

Prod1 and Prod2 are facing login issues

Started: Apr 09, 2026, 01:05 AM UTC
Resolved: Apr 09, 2026, 02:42 AM UTC
Duration: 1h 36m
Detected by Pingoru: Apr 09, 2026, 01:05 AM UTC

Affected components

Continuous Delivery - Next Generation (CDNG)
Continuous Integration Enterprise (CIE) - Mac Cloud Builds
Continuous Integration Enterprise (CIE) - Windows Cloud Builds
Continuous Integration Enterprise (CIE) - Linux Cloud Builds
Feature Flags (FF)
Platform
FME

Update timeline

investigating Apr 09, 2026, 01:43 AM UTC

We are currently investigating this issue.
identified Apr 09, 2026, 02:02 AM UTC

The issue has been identified and a fix is being implemented.
monitoring Apr 09, 2026, 02:33 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Apr 09, 2026, 02:42 AM UTC

This incident has been resolved.
postmortem Apr 16, 2026, 09:22 PM UTC

## **Summary**

On April 8, 2026, customers in Prod1 and Prod2 experienced degraded performance when logging in to the Harness platform. In Prod2, customers were also unable to start new pipeline executions, and some running pipelines failed. The issue lasted approximately 1 hour and 35 minutes.

## **Root Cause**

The issue was caused by a sudden surge of task reassignment requests triggered after customer delegate restarts. The surge produced a volume of backend processing requests that exceeded expected limits, which drove up resource utilization and degraded the performance of the Harness Manager service.

## **Impact**

* Customers in Prod1 and Prod2 experienced login failures and degraded user operations.
* Customers in Prod2 were unable to start new pipeline executions, and some ongoing executions failed.
* All customers in the affected clusters experienced service degradation during the incident window.

## **Remediation**

**Immediate:**

* Restarted the affected services and stabilized system performance, restoring login and pipeline functionality.

**Permanent:**

* Introduced safeguards to limit backend processing for large task reassignment scenarios.
* Identifying and applying limits to similar high-volume operations to prevent resource exhaustion.

## **Action Items**

To prevent such issues from happening again, we will:

* Implement query limits for high-volume task processing scenarios.
* Audit and enforce limits across similar backend operations to improve resilience.
* Enhance monitoring and alerting for abnormal spikes in task reassignment and resource utilization.
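The postmortem does not describe how the new safeguards are built. As a generic illustration only, two common ways to keep a burst of reassignment requests from overwhelming a backend are to cap how many tasks a single query may touch and to rate-limit how many such queries are admitted. The sketch below combines a token-bucket limiter with capped batching. It is not Harness's implementation; `TokenBucket`, `cap_batch`, `plan_reassignment`, and their parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable, List, Tuple


class TokenBucket:
    """Hypothetical token-bucket limiter: allows bursts of up to `capacity`
    requests, refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity < 1 or rate <= 0:
            raise ValueError("capacity must be >= 1 and rate must be > 0")
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if one is available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def cap_batch(task_ids: List[str], max_per_query: int) -> List[List[str]]:
    """Split a reassignment into chunks of at most `max_per_query` tasks,
    so no single backend query is unbounded (hypothetical query limit)."""
    if max_per_query <= 0:
        raise ValueError("max_per_query must be positive")
    return [task_ids[i:i + max_per_query]
            for i in range(0, len(task_ids), max_per_query)]


def plan_reassignment(task_ids: List[str], bucket: TokenBucket,
                      max_per_query: int) -> Tuple[List[List[str]], List[str]]:
    """Admit capped batches while tokens last; defer the rest for a later
    retry instead of issuing every query at once."""
    admitted: List[List[str]] = []
    deferred: List[str] = []
    for batch in cap_batch(task_ids, max_per_query):
        if bucket.try_acquire():
            admitted.append(batch)
        else:
            deferred.extend(batch)
    return admitted, deferred
```

For example, with `capacity=2`, `max_per_query=3`, and no time passing between calls, a burst of 10 reassignment requests yields two admitted batches of three tasks each, and the remaining four tasks are deferred until tokens refill. Deferring rather than rejecting keeps the work eventually consistent while bounding how much backend load any single burst can create.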