Harness incident

Pipelines are running slow in Prod3

Started: Feb 17, 2026, 05:27 PM UTC
Resolved: Feb 17, 2026, 10:59 PM UTC
Duration: 5h 32m
Detected by Pingoru: Feb 17, 2026, 05:27 PM UTC

Affected components

Continuous Delivery - Next Generation (CDNG)
Continuous Integration Enterprise (CIE) - Mac Cloud Builds
Continuous Integration Enterprise (CIE) - Windows Cloud Builds
Continuous Integration Enterprise (CIE) - Linux Cloud Builds
Security Testing Orchestration (STO)
Service Reliability Management (SRM)
Chaos Engineering
Internal Developer Portal (IDP)
Infrastructure as Code Management (IaCM)
Software Supply Chain Assurance (SSCA)

Update timeline

investigating Feb 17, 2026, 05:27 PM UTC

We are currently investigating this issue.
identified Feb 17, 2026, 05:28 PM UTC

We are actively working to mitigate this issue.
monitoring Feb 17, 2026, 06:11 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 17, 2026, 10:59 PM UTC

This incident has been resolved.
postmortem Mar 02, 2026, 08:42 PM UTC

**Summary**

On February 17, 2026, a traffic spike in one of the services in Prod3 reduced the Pipeline Service's capacity. We remediated this by addressing the source of the workload spike and tuning our backend systems.

**Root Cause**

Starting around 7:25 AM PST, our databases were overwhelmed by an increased rate of writes, which caused resource pressure. Write latency spiked, causing our upstream systems to experience timeouts and errors.

**Customer Impact**

During the impact window:

* Pipeline executions ran significantly slower or stalled, with initialization steps delayed.
* CRUD operations on pipelines were slow.

**Resolution**

We identified and disabled a high-frequency batch write workload that was contributing significantly to the write pressure. After we switched that component to a lower-write alternative flow, full system recovery was confirmed at ~10:05 AM PST.

**Prevention and Improvements**

To prevent recurrence and enable faster identification of such issues, we are taking several measures:

* Automate the audit of resource-intensive queries and proactively optimize them with better indexes or query scope limits to prevent working set overflow.
* Fine-tune workloads to increase headroom to handle spikes.
* Add proactive alerts for sustained traffic rates and resource utilization approaching the high watermark.
* Add capacity to our backend systems.
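As an illustration of the proactive alerting measure listed above, here is a minimal Python sketch of an alert that fires only when utilization stays close to a high watermark for several consecutive samples, so a single short spike does not page anyone. This is not Harness's implementation: the class name, the thresholds, and the window size are all hypothetical values chosen for the example.

```python
from collections import deque


class SustainedWatermarkAlert:
    """Fires when utilization stays near the high watermark for a full window.

    All parameter values are hypothetical and exist only for illustration.
    """

    def __init__(self, high_watermark=0.90, warn_fraction=0.85, window=5):
        # Alert threshold: a fraction of the watermark, so the alert fires
        # while utilization is still approaching the limit.
        self.threshold = high_watermark * warn_fraction
        self.window = window
        # Keep only the most recent `window` samples.
        self.recent = deque(maxlen=window)

    def observe(self, utilization):
        """Record one utilization sample (0.0 to 1.0); return True to alert."""
        self.recent.append(utilization)
        window_full = len(self.recent) == self.window
        return window_full and all(u >= self.threshold for u in self.recent)


if __name__ == "__main__":
    alert = SustainedWatermarkAlert(window=3)
    for sample in [0.50, 0.80, 0.80, 0.80, 0.30]:
        print(sample, alert.observe(sample))
```

Requiring every sample in the window to exceed the threshold trades detection speed for fewer false alarms; a shorter window or a lower `warn_fraction` would alert earlier at the cost of more noise.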
