Harness incident

Intermittent slowness while running pipelines

Notice Resolved
Started
Apr 27, 2026, 08:00 PM UTC
Resolved
Apr 27, 2026, 10:27 PM UTC
Duration
Detected by Pingoru
Apr 27, 2026, 10:27 PM UTC

Update timeline

  1. resolved Apr 27, 2026, 10:27 PM UTC

    We were seeing slowness while executing pipelines

  2. postmortem Apr 29, 2026, 07:53 PM UTC

    ## **Summary**

    On April 27, 2026, customers running pipelines in the Prod3 environment experienced intermittent slowness in pipeline execution and delays in execution status updates in the UI. The cause was an unexpected load spike that created contention on a backend database supporting pipeline orchestration. The issue was mitigated and fully resolved.

    ## **Impact**

    **Incident window:** April 27, 2026, 1:00 PM – 3:12 PM PDT

    * Pipeline executions ran slower than normal, and some took longer than expected to complete. Pipelines with stricter timeouts could have failed.
    * No widespread pipeline failures were observed.
    * The execution view in the UI lagged behind real-time pipeline progress.

    There was no data loss. The majority of pipelines continued to execute successfully, with the primary impact being increased latency and delayed UI updates.

    ## **Root Cause**

    Pipeline orchestration relies on a backend database to track execution state and power the execution view in the UI. During the incident, a spike in load increased query latency across the orchestration layer. This produced a backlog of execution-view updates, so the UI lagged behind actual pipeline execution until the system was scaled.

    ## **Remediation**

    **Immediate Mitigation**

    * Scaled up the affected database instance to increase CPU capacity
    * Reduced query latency and eliminated lock contention
    * Cleared the execution-view update backlog within ~30 minutes

    These actions restored normal pipeline performance and UI responsiveness.

    ## **Action Items**

    To prevent similar issues from recurring:

    * **Capacity Improvements:** Updated the Prod3 capacity baseline to prevent similar resource constraints
    * **Proactive Detection:** Enhancing monitoring and alerting for backend resource utilization, lock contention, and critical query latency
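
The "Proactive Detection" item mentions alerting on critical query latency. As an illustration only, a rolling-window check of that kind might look like the sketch below. The postmortem does not describe how Harness implements its monitoring, so the class name `LatencyAlert`, the 95th-percentile choice, the window size, and the threshold are all assumptions made for this example.

```python
from collections import deque
from statistics import quantiles


class LatencyAlert:
    """Hypothetical rolling-window p95 latency check (not Harness's implementation)."""

    def __init__(self, threshold_ms, window=100, min_samples=20):
        self.threshold_ms = threshold_ms      # assumed alert threshold
        self.min_samples = min_samples        # avoid alerting on too little data
        self.samples = deque(maxlen=window)   # keeps only the most recent samples

    def record(self, latency_ms):
        """Add one observed query latency, in milliseconds."""
        self.samples.append(latency_ms)

    def p95(self):
        """Return the 95th-percentile latency, or None if there is too little data."""
        if len(self.samples) < self.min_samples:
            return None
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        return quantiles(self.samples, n=20)[-1]

    def firing(self):
        """True when the rolling p95 exceeds the threshold."""
        p = self.p95()
        return p is not None and p > self.threshold_ms


if __name__ == "__main__":
    alert = LatencyAlert(threshold_ms=200, window=50)
    for _ in range(50):
        alert.record(20)      # made-up baseline latency
    print("baseline firing:", alert.firing())
    for _ in range(50):
        alert.record(500)     # made-up latency during a load spike
    print("spike firing:", alert.firing())
```

A percentile over a bounded window reacts to sustained slowness without firing on a single slow query. A production alert would normally also cover the other signals the action item lists (resource utilization and lock contention).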
