- Detected by Pingoru: May 02, 2026, 12:01 AM UTC
- Resolved: Apr 28, 2026, 05:21 PM UTC
- Duration: —
Timeline · 1 update
- resolved · May 02, 2026, 12:01 AM UTC
Between April 28 and May 1, 2026, a subset of scheduled change requests failed to execute at their scheduled times. The issue was introduced by a production deployment on April 28 and fixed with a new deployment on May 1. All affected scheduled jobs that had not been manually withdrawn by customers have since been executed successfully.
- Detected by Pingoru: May 01, 2026, 04:28 PM UTC
- Resolved: May 01, 2026, 08:10 PM UTC
- Duration: 3h 41m
Affected: Continuous Delivery - Next Generation (CDNG), Continuous Integration Enterprise (CIE) - Self Hosted Runners, Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds, FME
Timeline · 6 updates
- investigating · May 01, 2026, 04:28 PM UTC
We are currently investigating this issue.
- monitoring · May 01, 2026, 04:56 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring · May 01, 2026, 05:14 PM UTC
We are continuing to monitor for any further issues.
- monitoring · May 01, 2026, 05:15 PM UTC
We are continuing to monitor for any further issues.
- monitoring · May 01, 2026, 07:58 PM UTC
We are largely mitigated and most pipelines are running normally. We are monitoring all parameters to make sure there are no issues before closing the incident.
- resolved · May 01, 2026, 08:10 PM UTC
This incident has been resolved.
- Detected by Pingoru: May 01, 2026, 03:02 PM UTC
- Resolved: May 01, 2026, 08:10 PM UTC
- Duration: 5h 8m
Affected: Continuous Delivery - Next Generation (CDNG), Continuous Integration Enterprise (CIE) - Self Hosted Runners, Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds, Security Testing Orchestration (STO), Internal Developer Portal (IDP)
Timeline · 4 updates
- investigating · May 01, 2026, 03:02 PM UTC
We are currently investigating this issue.
- monitoring · May 01, 2026, 03:37 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring · May 01, 2026, 07:58 PM UTC
We are largely mitigated and most pipelines are running normally. We are monitoring all parameters to make sure there are no issues before closing the incident.
- resolved · May 01, 2026, 08:10 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 30, 2026, 11:09 PM UTC
- Resolved: Apr 30, 2026, 11:43 PM UTC
- Duration: 33m
Affected: Feature Flags (FF)
Timeline · 4 updates
- investigating · Apr 30, 2026, 11:09 PM UTC
We are currently investigating this issue.
- identified · Apr 30, 2026, 11:19 PM UTC
The issue has been identified and a fix is underway.
- monitoring · Apr 30, 2026, 11:29 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 30, 2026, 11:43 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 30, 2026, 04:25 PM UTC
- Resolved: Apr 30, 2026, 05:50 PM UTC
- Duration: 1h 25m
Affected: Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Error Tracking (CET), Chaos Engineering, Continuous Integration Enterprise (CIE) - Self Hosted Runners, Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds, Custom Dashboards, Feature Flags (FF), Security Testing Orchestration (STO), Service Reliability Management (SRM), Internal Developer Portal (IDP), Infrastructure as Code Management (IaCM), Software Supply Chain Assurance (SSCA), Software Engineering Insights (SEI), Code Repository, Artifact Registry, Platform, FME
Timeline · 3 updates
- investigating · Apr 30, 2026, 04:25 PM UTC
We are currently investigating this issue.
- identified · Apr 30, 2026, 05:27 PM UTC
The issue has been identified and mitigated.
- resolved · Apr 30, 2026, 07:08 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 30, 2026, 03:57 PM UTC
- Resolved: Apr 30, 2026, 04:14 PM UTC
- Duration: 17m
Affected: FME
Timeline · 2 updates
- investigating · Apr 30, 2026, 03:57 PM UTC
We are currently investigating this issue.
- resolved · Apr 30, 2026, 04:14 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 28, 2026, 09:48 PM UTC
- Resolved: Apr 28, 2026, 06:27 PM UTC
- Duration: —
Timeline · 1 update
- resolved · Apr 28, 2026, 09:48 PM UTC
Requests made by SDKs with Rule Based Segments (RBS) support could have received a null instead of an empty response for payloads with no Rule Based Segments, leading to a null pointer exception. Impact was observed from 18:27 until 19:05 UTC. For affected SDKs, the impact would have been a failure to initialize or to process an update. Any change made to feature flags or RBS in an affected environment regenerates any remaining stale caches.
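As an illustrative aside, a minimal sketch of the defensive client-side handling that avoids this failure mode, assuming a JSON payload and a `ruleBasedSegments` field name invented for this example (the report does not show the real SDK wire format):

```python
# Defensive parsing sketch: treat a null rule-based-segments field as an
# empty collection instead of crashing. The payload shape and field name
# are assumptions, not the real SDK wire format.
import json

def parse_rule_based_segments(payload: str) -> list:
    body = json.loads(payload)
    # `or []` normalizes an explicit null to an empty list, so downstream
    # iteration cannot raise the equivalent of a null pointer exception.
    return body.get("ruleBasedSegments") or []

print(parse_rule_based_segments('{"ruleBasedSegments": null}'))  # -> []
```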
- Detected by Pingoru: Apr 28, 2026, 03:28 PM UTC
- Resolved: Apr 28, 2026, 07:03 PM UTC
- Duration: 3h 34m
Affected: Cloud Cost Management (CCM), Infrastructure as Code Management (IaCM)
Timeline · 1 update
- investigating · Apr 28, 2026, 03:28 PM UTC
We are currently investigating this issue.
- Detected by Pingoru: Apr 28, 2026, 11:51 AM UTC
- Resolved: Apr 28, 2026, 02:16 PM UTC
- Duration: 2h 24m
Affected: Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Infrastructure as Code Management (IaCM), Service Reliability Management (SRM), Feature Flags (FF)
Timeline · 2 updates
- investigating · Apr 28, 2026, 11:51 AM UTC
We are currently investigating this issue.
- monitoring · Apr 28, 2026, 12:14 PM UTC
A fix has been implemented and we are monitoring the results.
- Detected by Pingoru: Apr 27, 2026, 10:27 PM UTC
- Resolved: Apr 27, 2026, 08:00 PM UTC
- Duration: —
Timeline · 2 updates
- resolved · Apr 27, 2026, 10:27 PM UTC
We were seeing slowness while executing pipelines.
- postmortem · Apr 29, 2026, 07:53 PM UTC

## Summary
On April 27, 2026, customers running pipelines in the Prod3 environment experienced intermittent slowness in pipeline execution and delays in execution status updates in the UI. The cause was an unexpected load spike that created contention on a backend database supporting pipeline orchestration. The issue was mitigated and fully resolved.

## Impact
Incident window: April 27, 2026, 1:00 PM – 3:12 PM PDT

- Pipeline executions ran slower than normal; some took longer than expected to complete, and pipelines with stricter timeouts could fail.
- No widespread pipeline failures were observed.
- The execution view in the UI lagged behind real-time pipeline progress.

There was no data loss. The majority of pipelines continued to execute successfully; the primary impact was increased latency and delayed UI updates.

## Root Cause
Pipeline orchestration relies on a backend database to track execution state and power the execution view in the UI. During the incident, a spike in load increased query latency across the orchestration layer. This created a backlog, causing UI updates to lag behind actual pipeline execution until the system was scaled.

## Remediation
Immediate mitigation:

- Scaled up the affected database instance to increase CPU capacity
- Reduced query latency and eliminated lock contention
- Cleared the execution-view update backlog within ~30 minutes

These actions restored normal pipeline performance and UI responsiveness.

## Action Items
To prevent such issues from happening again:

- Capacity improvements: updated the Prod3 capacity baseline to prevent similar resource constraints
- Proactive detection: enhancing monitoring and alerting for backend resource utilization, lock contention, and critical query latency
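As an editorial illustration of the "proactive detection" action item, a minimal sketch of lock-contention alerting, assuming a PostgreSQL backend and the psycopg2 driver; the report does not name the actual database or tooling:

```python
# Hypothetical monitor for the kind of lock contention described above.
# Assumes a PostgreSQL backend and the psycopg2 driver; the real
# orchestration store and alerting stack are not named in the report.
import psycopg2

CONTENTION_THRESHOLD = 10  # alert when this many sessions wait on locks

def check_lock_contention(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # pg_stat_activity exposes one row per session; wait_event_type
        # is 'Lock' for sessions blocked on a heavyweight lock.
        cur.execute(
            "SELECT count(*) FROM pg_stat_activity "
            "WHERE wait_event_type = 'Lock'"
        )
        waiting = cur.fetchone()[0]
        if waiting >= CONTENTION_THRESHOLD:
            print(f"ALERT: {waiting} sessions waiting on locks")

if __name__ == "__main__":
    check_lock_contention("dbname=orchestration")  # hypothetical DSN
```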
- Detected by Pingoru: Apr 27, 2026, 04:05 PM UTC
- Resolved: Apr 27, 2026, 04:24 PM UTC
- Duration: 19m
Affected: Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds
Timeline · 3 updates
- investigating · Apr 27, 2026, 04:05 PM UTC
We are currently investigating this issue.
- identified · Apr 27, 2026, 04:08 PM UTC
The issue has been identified and a fix is being implemented.
- resolved · Apr 27, 2026, 04:24 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 24, 2026, 04:06 PM UTC
- Resolved: Apr 24, 2026, 06:59 PM UTC
- Duration: 2h 52m
Affected: Feature Flags (FF)
Timeline · 4 updates
- investigating · Apr 24, 2026, 07:16 PM UTC
We are currently investigating this issue.
- monitoring · Apr 24, 2026, 07:29 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 24, 2026, 08:01 PM UTC
This incident has been resolved.
- postmortem · Apr 30, 2026, 07:16 PM UTC

### Summary
On April 24, 2026, a large non-batched bulk DELETE operation on the prod-2 primary database triggered lock contention, causing Feature Flag API latency and hung queries across multiple customer SDKs.

### Impact
1. Slow SDK auth/init: SDKs took longer than expected to complete evaluations
2. Elevated latency across many FF APIs
3. Limited to the Feature Flags module in prod-2
4. No data loss

### Root Cause
A background cleanup job executed a non-batched, single-transaction delete, causing lock contention and API latency spikes.

### Mitigation
Immediately terminated the offending queries.

### Next Steps / Action Items
To prevent such issues from happening again, we are working on:
1. Enhanced alerting and observability for long-running queries
2. Permanently replacing the large single-transaction delete pattern with smaller batched deletes
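For illustration, a minimal sketch of the batched-delete pattern named in the action items, using an in-memory SQLite database as a stand-in for the real store; table and column names are invented. Each small transaction commits and releases its locks quickly, so readers are never blocked for the duration of the full cleanup.

```python
# Sketch of batched deletes replacing one giant single-transaction DELETE.
# SQLite stands in for the production database; schema is invented.
import sqlite3

BATCH_SIZE = 1000  # small transactions keep lock hold times short

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stale_rows (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO stale_rows (payload) VALUES (?)",
    [("x",)] * 5000,
)
conn.commit()

# Delete in small batches; each iteration commits and releases its locks
# before the next batch starts.
while True:
    cur = conn.execute(
        "DELETE FROM stale_rows WHERE id IN "
        "(SELECT id FROM stale_rows LIMIT ?)",
        (BATCH_SIZE,),
    )
    conn.commit()
    if cur.rowcount == 0:
        break
```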
- Detected by Pingoru: Apr 24, 2026, 05:23 AM UTC
- Resolved: Apr 24, 2026, 05:43 AM UTC
- Duration: 19m
Affected: Feature Flags (FF)
Timeline · 2 updates
- investigating · Apr 24, 2026, 05:23 AM UTC
We are currently investigating this issue.
- resolved · Apr 24, 2026, 05:43 AM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 19, 2026, 02:09 PM UTC
- Resolved: Apr 19, 2026, 03:24 PM UTC
- Duration: 1h 15m
Affected: Infrastructure as Code Management (IaCM)
Timeline · 5 updates
- investigating · Apr 19, 2026, 02:09 PM UTC
We are currently investigating this issue.
- identified · Apr 19, 2026, 02:20 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring · Apr 19, 2026, 03:17 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 19, 2026, 03:24 PM UTC
This incident has been resolved. Customers using a pinned version older than plugins/harness_terraform:0.214.0 should update to the latest version by following the guidance at https://developer.harness.io/docs/continuous-integration/use-ci/set-up-build-infrastructure/harness-ci/#specify-the-harness-ci-images-used-in-your-pipelines. If you are not pinning a specific version, no action is required; your pipelines are already using the updated image.
- postmortem · Apr 30, 2026, 07:04 PM UTC

## Summary
On April 19, 2026, Terraform-based IaCM pipelines failed across production environments due to an issue with Terraform binary verification during runtime. The issue was caused by an expired OpenPGP signing key in a third-party library used to validate Terraform downloads. This resulted in failures when pipelines attempted to install Terraform dynamically.

## Impact
- Terraform-based IaCM pipelines failed during execution
- Failures occurred at runtime when attempting to download and verify Terraform binaries

Unaffected:
- OpenTofu-based pipelines
- Pipelines using pre-installed or cached Terraform binaries

Customers pinning plugin versions older than the fixed release continued to experience failures until they upgraded.

## Root Cause
The IaCM Terraform plugin relies on a third-party library (HashiCorp's `hc-install`) to download and verify Terraform binaries.
- The library contained a hardcoded OpenPGP signing key
- This key expired, causing verification failures during Terraform installation
- HashiCorp had not yet released an updated version with a renewed key

## Remediation
Immediate mitigation:
- Released IaCM Terraform plugin v0.214.0, which bypasses the expired signature verification step and continues secure downloads over HTTPS

Resolution:
- Rolled out the fix across prod0–prod4
- Pipeline execution functionality was restored

## Customer Actions Required
- Customers using pinned plugin versions older than v0.214.0 must upgrade to v0.214.0 or later
- No action required for customers using default/latest plugin versions

## Prevention & Next Steps
We are implementing the following improvements:
- Dependency monitoring: proactive monitoring for third-party certificate and key expirations
- Upstream coordination: track the HashiCorp release with an updated signing key, and re-enable signature verification once available
- Customer communication: notify customers using older pinned versions
- Operational improvements: enhance validation of external dependencies in runtime workflows
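For illustration only, a sketch of the interim approach the remediation describes: fetching a binary over HTTPS and checking a pinned SHA256 digest rather than an OpenPGP signature. The URL and digest are placeholders, not real release artifacts, and this is not the plugin's actual code:

```python
# Illustrative only: download over HTTPS and verify a pinned SHA256 digest
# instead of an OpenPGP signature. URL and digest below are placeholders.
import hashlib
import urllib.request

ARTIFACT_URL = "https://example.com/terraform.zip"   # placeholder
EXPECTED_SHA256 = "0" * 64                           # placeholder digest

def fetch_and_verify(url: str, expected: str) -> bytes:
    # HTTPS provides transport integrity; the digest check pins the content.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected:
        raise ValueError(f"checksum mismatch: {digest}")
    return data
```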
- Detected by Pingoru: Apr 16, 2026, 07:15 AM UTC
- Resolved: Apr 16, 2026, 12:02 PM UTC
- Duration: 4h 46m
Affected: Continuous Integration Enterprise (CIE) - Linux Cloud Builds
Timeline · 6 updates
- investigating · Apr 16, 2026, 10:06 AM UTC
We are currently investigating this issue.
- identified · Apr 16, 2026, 10:14 AM UTC
The issue has been identified and a fix is being implemented.
- identified · Apr 16, 2026, 11:08 AM UTC
We are continuing to work on a fix for this issue.
- monitoring · Apr 16, 2026, 11:41 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 16, 2026, 12:02 PM UTC
This incident has been resolved.
- postmortem · Apr 23, 2026, 03:34 PM UTC

On April 16, 2026, Hosted CI Linux pipelines experienced intermittent initialization failures due to an upstream outage affecting package repositories used during environment setup.

### Impact
A subset of customers running Hosted CI pipelines encountered failures during the initialization phase, preventing jobs from starting successfully.
- Affected: 24 accounts
- Failed executions: 129 pipelines
- Impact duration: ~5 hours 49 minutes

### Root Cause
The issue was caused by a service disruption in an external package repository provider. During CI environment provisioning, dependency installation requests to this upstream service timed out, causing initialization failures.

### Remediation
Immediate mitigation: We updated runner configurations to bypass dependency installation during initialization and rolled out updated environments across affected clusters.

Permanent fix: We have improved resilience in our CI infrastructure by:
- Using pre-configured environments with required dependencies pre-installed
- Eliminating the runtime dependency on external package repositories during initialization
- Enhancing failure handling for external dependency timeouts

### Action Items / Next Steps
- Continue improving isolation from external dependencies during environment startup
- Strengthen monitoring and alerting for upstream service degradation
- Optimize rollout speed for infrastructure changes to reduce mitigation time
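As a sketch of the "pre-installed dependencies" remediation theme, the following hedged example prefers a binary already baked into the image and only falls back to a timeout-bounded network install; paths, commands, and the helper name are invented:

```python
# Illustrative only: prefer a pre-baked binary; fall back to a network
# install bounded by a timeout so an upstream repo outage fails fast
# instead of hanging environment initialization. Names are invented.
import shutil
import subprocess

def ensure_tool(name: str, install_cmd: list[str], timeout_s: int = 60) -> str:
    path = shutil.which(name)
    if path:
        return path  # pre-configured environment already has it; no network
    # Bounded fallback: raises subprocess.TimeoutExpired rather than hanging.
    subprocess.run(install_cmd, check=True, timeout=timeout_s)
    return shutil.which(name) or name

# e.g. ensure_tool("git", ["apt-get", "install", "-y", "git"])
```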
- Detected by Pingoru: Apr 15, 2026, 11:55 PM UTC
- Resolved: Apr 16, 2026, 04:15 AM UTC
- Duration: 4h 20m
Affected: Feature Flags (FF)
Timeline · 5 updates
- investigating · Apr 15, 2026, 11:55 PM UTC
We are currently investigating this issue.
- investigating · Apr 16, 2026, 12:01 AM UTC
We are continuing to investigate this issue.
- monitoring · Apr 16, 2026, 04:04 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 16, 2026, 06:05 PM UTC
This incident has been resolved.
- postmortem · Apr 30, 2026, 05:34 PM UTC

## Summary
On April 15, 2026, between approximately 23:21 UTC and 01:58 UTC, customers using Feature Flags in the prod2 environment experienced delays in feature flag updates. Feature flag changes made via the UI or API were successfully processed but were not immediately reflected, causing stale flag values to be served.

## Impact
- Scope: customers in the prod2 environment only
- Customer impact: feature flag updates were delayed or appeared ineffective, and applications continued serving stale configurations
- Other environments: no impact to prod0, prod1, or other regions

## Root Cause
The issue was caused by replication lag in the read replica database used for serving feature flag reads. A long-running read query on the replica blocked replication updates from the primary database, delaying propagation of recent feature flag changes to read queries.

What triggered the issue: a high-volume API usage pattern involving large paginated queries on target data. These queries became resource-intensive, impacting the database.

## Mitigation
Immediate actions:
- Identified and terminated long-running queries on the replica
- Replication resumed and flag updates began reflecting correctly

## Prevention & Next Steps
We are continuing to strengthen reliability by:
- Configuring the replica to automatically cancel queries that block replication beyond a threshold, and tuning query timeouts for heavy read operations
- Improving query efficiency and pagination strategies
- Enhancing monitoring and alerting for replication lag
- Evaluating database upgrades and scaling improvements
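As an illustration of the prevention items, a hypothetical replica guard, assuming PostgreSQL streaming replication and the psycopg2 driver (the report does not name the actual database):

```python
# Hypothetical replica guard matching the mitigation described above:
# report replication delay and terminate long-running replica reads.
# Assumes PostgreSQL and psycopg2; thresholds are invented.
import psycopg2

MAX_QUERY_SECONDS = 300  # cancel replica reads running longer than this

def cancel_long_replica_queries(replica_dsn: str) -> None:
    with psycopg2.connect(replica_dsn) as conn, conn.cursor() as cur:
        # Replication delay as seen from the replica.
        cur.execute("SELECT now() - pg_last_xact_replay_timestamp()")
        print("replication delay:", cur.fetchone()[0])
        # Terminate long-running read queries that can block WAL replay.
        cur.execute(
            "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
            "WHERE state = 'active' "
            "AND now() - query_start > make_interval(secs => %s) "
            "AND pid <> pg_backend_pid()",
            (MAX_QUERY_SECONDS,),
        )
```

If the replica is PostgreSQL, the built-in `max_standby_streaming_delay` setting also cancels standby queries that block replay beyond a threshold, without external tooling.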
- Detected by Pingoru: Apr 13, 2026, 04:47 PM UTC
- Resolved: Apr 13, 2026, 05:31 PM UTC
- Duration: 43m
Affected: Data processing
Timeline · 3 updates
- monitoring · Apr 13, 2026, 04:47 PM UTC
Feature flag metrics impact calculations were not updating. This issue does not impact experiment calculations. A fix is being rolled out. There is no data loss.
- resolved · Apr 13, 2026, 05:31 PM UTC
After monitoring following the fix, all systems are back to normal and processing metrics impact calculations regularly. We will provide an RCA soon.
- postmortem · Apr 20, 2026, 03:56 PM UTC

## Summary
On April 13, 2026, FME metrics impact calculations stopped updating. The root cause was a bug introduced in the software upgrade/release process.

## Root Cause
An internal library upgrade included in the release caused a runtime issue in a legacy execution pathway.

## Impact
Feature flag metrics impact calculations were not updating. This issue did not impact experiment calculations, and there was no data loss.

## Mitigation
To mitigate, we immediately rolled back the update.

## Action Items
To prevent such issues from happening again, we are working on fixing gaps in monitoring and alerting for the metrics impact calculations flow.
- Detected by Pingoru: Apr 09, 2026, 05:30 AM UTC
- Resolved: Apr 09, 2026, 10:48 AM UTC
- Duration: 5h 18m
Affected: Continuous Integration Enterprise (CIE) - Self Hosted Runners, Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds
Timeline · 5 updates
- investigating · Apr 09, 2026, 08:34 AM UTC
Connectivity from some legacy Run Test steps to the test intelligence service is failing intermittently. We are currently investigating the issue.
- identified · Apr 09, 2026, 09:38 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring · Apr 09, 2026, 10:38 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 09, 2026, 10:48 AM UTC
This incident has been resolved.
- postmortem · Apr 21, 2026, 01:18 PM UTC

## Summary
On April 8, 2026, customers in certain production environments experienced degraded performance and intermittent failures while accessing the platform. This impacted login functionality and execution of new and existing tasks.

## Root Cause
A spike in internal task processing caused excessive load on the service, leading to resource exhaustion and degraded performance across multiple service instances.

## Impact
Customers in affected environments experienced:
- Slowness and failures during login
- Inability to start new tasks in some cases
- Failures in ongoing executions

## Remediation
Immediate: Stabilized the system by resetting affected components and restoring service capacity, which allowed the platform to recover.

Permanent: Introduced safeguards to limit resource-intensive operations and prevent unbounded processing under high-load conditions.

## Action Items
To prevent such issues from happening again, Harness will:
- Add limits to high-volume internal processing paths
- Audit and enforce safeguards across similar workflows
- Improve system resilience under burst-load scenarios
- Enhance monitoring to detect abnormal load patterns earlier
- Detected by Pingoru: Apr 09, 2026, 01:05 AM UTC
- Resolved: Apr 09, 2026, 02:42 AM UTC
- Duration: 1h 36m
Affected: Continuous Delivery - Next Generation (CDNG), Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds, Feature Flags (FF), Platform, FME
Timeline · 5 updates
- investigating · Apr 09, 2026, 01:43 AM UTC
We are currently investigating this issue.
- identified · Apr 09, 2026, 02:02 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring · Apr 09, 2026, 02:33 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 09, 2026, 02:42 AM UTC
This incident has been resolved.
- postmortem · Apr 16, 2026, 09:22 PM UTC

## Summary
On April 8, 2026, customers in Prod1 and Prod2 experienced degraded performance when logging into the Harness platform. Additionally, in Prod2, customers were unable to start new pipeline executions and some running pipelines failed. The issue lasted approximately 1 hour and 35 minutes.

## Root Cause
The issue was caused by a sudden surge of task reassignment requests triggered after customer delegate restarts. This resulted in a high volume of backend processing requests that exceeded expected limits, leading to elevated resource utilization and degraded performance of the Harness Manager service.

## Impact
- Customers in Prod1 and Prod2 experienced login failures and degraded user operations.
- Customers in Prod2 were unable to start new pipeline executions, and some ongoing executions failed.
- All customers in the affected clusters experienced service degradation during the incident window.

## Remediation
Immediate:
- Restarted affected services and stabilized system performance, restoring login and pipeline functionality.

Permanent:
- Introduced safeguards to limit backend processing for large task reassignment scenarios.
- Identifying and applying limits to similar high-volume operations to prevent resource exhaustion.

## Action Items
To prevent such issues from happening again:
- Implement query limits for high-volume task processing scenarios.
- Audit and enforce limits across similar backend operations to improve resilience.
- Enhance monitoring and alerting for abnormal spikes in task reassignment and resource utilization.
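As an editorial sketch of the "limits on high-volume task processing" action item, a minimal token-bucket limiter; the class, rates, and handler are invented for illustration and are not Harness's actual implementation:

```python
# Illustrative token-bucket limiter for capping surges such as the task
# reassignment spike described above. All names and numbers are invented.
import threading
import time

class TokenBucket:
    """Allow roughly `rate` operations/second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

reassignment_limiter = TokenBucket(rate=100, capacity=500)  # invented numbers

def handle_reassignment(task_id: str) -> None:
    if not reassignment_limiter.try_acquire():
        # Shed or defer work instead of letting a surge exhaust the service.
        print(f"deferring reassignment of {task_id}")
        return
    print(f"processing reassignment of {task_id}")
```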
- Detected by Pingoru: Apr 08, 2026, 01:09 PM UTC
- Resolved: Apr 08, 2026, 03:00 PM UTC
- Duration: 1h 51m
Affected: Continuous Delivery - Next Generation (CDNG)
Timeline · 3 updates
- investigating · Apr 08, 2026, 01:09 PM UTC
We are currently investigating this issue.
- monitoring · Apr 08, 2026, 01:54 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved · Apr 08, 2026, 07:44 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 07, 2026, 10:54 AM UTC
- Resolved: Apr 07, 2026, 11:08 AM UTC
- Duration: 13m
Affected: Internal Developer Portal (IDP)
Timeline · 4 updates
- monitoring · Apr 07, 2026, 10:54 AM UTC
A fix has been implemented and we are monitoring the results.
- monitoring · Apr 07, 2026, 10:59 AM UTC
We are continuing to monitor for any further issues.
- resolved · Apr 07, 2026, 11:08 AM UTC
This incident has been resolved.
- postmortem · Apr 21, 2026, 12:58 PM UTC

## Summary
On April 7, 2026, customers using the Internal Developer Portal (IDP) in certain production environments experienced a service disruption in which the IDP UI became inaccessible. Users encountered errors when attempting to access the module.

## Root Cause
A configuration change introduced during a routine deployment prevented the system from correctly routing incoming requests to the IDP service, resulting in loss of access to the UI.

## Impact
Customers using IDP in affected environments were unable to access the portal UI during the incident window. Other modules and environments remained unaffected.

## Remediation
Immediate: Rolled back the recent configuration change and restored service routing, which recovered access to the IDP module.

Permanent: Implemented additional safeguards in the deployment process to validate configuration changes and ensure compatibility before rollout.

## Action Items
To prevent such issues from happening again, we are taking the following steps:
- Enhance our release process and UI validation
- Improve monitoring and alerting for early detection of routing issues
- Detected by Pingoru: Apr 03, 2026, 04:53 PM UTC
- Resolved: Apr 04, 2026, 01:59 PM UTC
- Duration: 21h 5m
Affected: Custom Dashboards
Timeline · 5 updates
- investigating · Apr 03, 2026, 04:53 PM UTC
We are currently investigating this issue.
- identified · Apr 03, 2026, 04:59 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring · Apr 03, 2026, 09:29 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring · Apr 03, 2026, 09:30 PM UTC
We are continuing to monitor for any further issues.
- resolved · Apr 04, 2026, 01:59 PM UTC
This incident has been resolved.
- Detected by Pingoru: Apr 02, 2026, 03:13 PM UTC
- Resolved: Apr 02, 2026, 05:53 PM UTC
- Duration: 2h 40m
Affected: Continuous Delivery - Next Generation (CDNG)
Timeline · 3 updates
- investigating · Apr 02, 2026, 03:13 PM UTC
We are investigating a degradation in CI steps when using AWS connectors and inherited authentication.
- resolved · Apr 02, 2026, 05:53 PM UTC
This incident has been resolved.
- postmortem · Apr 17, 2026, 03:29 PM UTC

## Summary
On April 2, 2026, customers experienced failures in CI pipelines during S3 upload steps following a routine delegate upgrade. The issue primarily impacted customers using cross-account AWS role assumption with inherit-from-delegate connectors.

## Impact
A small number of customers across Prod1 and Prod2 running CI pipelines with S3 upload steps and cross-account role assumption experienced artifact upload failures, blocking downstream deployments.

## Root Cause
A change introduced during the delegate upgrade altered how AWS credentials were passed to CI steps. This resulted in partial credentials being provided to the S3 upload plugin, which triggered a latent issue in the plugin's credential selection logic. Instead of executing the intended cross-account role assumption flow, the plugin attempted authentication using incomplete credentials, leading to failures.

## Mitigation
- Rolled back the delegate to the previous stable version
- Restored the original credential handling behavior
- Service functionality recovered immediately after the rollback

## Next Steps
To prevent such issues from happening again, we will:
- Improve validation of credential handling in CI steps
- Expand automated test coverage for cross-account scenarios
- Reintroduce the changes behind feature flags with full end-to-end testing
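To make the credential-selection pitfall concrete, a hypothetical sketch of validating credential completeness before choosing an auth flow; the field names and flow labels are invented, not the plugin's actual code:

```python
# Illustration of validating credential completeness before choosing an
# auth flow, per the root cause above. All names are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AwsCredentials:
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None
    session_token: Optional[str] = None
    role_arn: Optional[str] = None

def choose_auth_flow(creds: AwsCredentials) -> str:
    static_complete = bool(creds.access_key_id and creds.secret_access_key)
    if creds.role_arn:
        # Cross-account role assumption: do not fall back to static keys
        # just because *some* key material happens to be present.
        return "assume-role"
    if static_complete:
        return "static-keys"
    # Partial credentials are an error, not a signal to guess a flow.
    raise ValueError("incomplete AWS credentials: refusing to pick an auth flow")
```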
- Detected by Pingoru: Apr 01, 2026, 06:44 AM UTC
- Resolved: Apr 02, 2026, 08:40 AM UTC
- Duration: 1d 1h
Affected: Cloud Cost Management (CCM)
Timeline · 5 updates
- identified · Apr 01, 2026, 06:16 AM UTC
The issue has been identified and a fix is being implemented.
- identified · Apr 01, 2026, 06:44 AM UTC
Status update: CCM AutoStopping functionality for the AWS cloud provider is currently impacted due to increased latency from AWS in the me-south-1 region. This is affecting multiple operations, including warm-up, cool-down, schedule execution, and traffic detection. In addition, CCM Asset Governance functionality is also impacted for resources in the me-south-1 region. We are actively working on isolating/excluding the affected region to restore functionality for the remaining customers. Resources within the me-south-1 region may continue to experience issues until the region fully recovers.
- monitoring · Apr 01, 2026, 08:20 AM UTC
Update: AutoStopping functionality for AWS has been restored for all regions except me-south-1. The issue was caused by elevated latency from AWS in the affected region, impacting operations such as warm-up, cool-down, schedule execution, and traffic detection. We have now isolated this region to prevent impact on other customers. Resources in me-south-1 will continue to experience the issue until the region fully recovers. We are actively monitoring the situation and will provide further updates as available.
- resolved · Apr 02, 2026, 08:40 AM UTC
This incident has been resolved.
- postmortem · Apr 14, 2026, 03:30 AM UTC
The issue was caused by elevated latency from AWS in the me-south-1 region, impacting operations such as warm-up, cool-down, schedule execution, and traffic detection. We isolated this region to prevent impact on other customers; resources in me-south-1 continued to experience the issue until the region fully recovered.
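As an illustration of the region-isolation mitigation described above, a minimal sketch; the region set and function names are invented and this is not CCM's actual code:

```python
# Sketch of region isolation: exclude a degraded region so its latency
# cannot stall operations in healthy regions. Names are invented.
DEGRADED_REGIONS = {"me-south-1"}

def regions_to_process(all_regions: list[str]) -> list[str]:
    healthy = [r for r in all_regions if r not in DEGRADED_REGIONS]
    skipped = set(all_regions) - set(healthy)
    if skipped:
        print(f"skipping degraded regions: {sorted(skipped)}")
    return healthy

# Example: warm-up/cool-down scheduling iterates only healthy regions.
for region in regions_to_process(["us-east-1", "me-south-1", "eu-west-1"]):
    print("processing", region)
```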
- Detected by Pingoru: Mar 25, 2026, 10:00 AM UTC
- Resolved: Mar 25, 2026, 12:30 PM UTC
- Duration: 2h 30m
Affected: Security Testing Orchestration (STO)
Timeline · 3 updates
- investigating · Mar 25, 2026, 03:53 PM UTC
We are currently investigating this issue.
- resolved · Mar 25, 2026, 03:53 PM UTC
This incident has been resolved.
- postmortem · Mar 26, 2026, 07:53 PM UTC

## Summary
On March 25, 2026, between approximately 3:30 PM and 6:00 PM IST, the STO service in the Prod1 environment experienced intermittent failures while processing scan uploads. This resulted in step failures for some pipeline executions during the incident window.

## Root Cause
During a scheduled internal data backfill activity, the STO service experienced increased database load. Concurrently, a recent change in the scan upload processing path introduced additional latency under these conditions. The combination of elevated load and increased query execution time caused some scan upload requests to exceed processing thresholds and fail. Retry attempts further amplified system load, leading to intermittent failures.

## Impact
- Intermittent scan upload failures (500 errors) during pipeline execution
- Some pipelines experienced step failures or delays due to retries
- No impact to previously uploaded scan results or other STO functionality

## Mitigation/Remediation
Immediate:
- Stopped the internal backfill activity to reduce database load
- Optimized the scan upload processing query

Permanent:
- Introduced safeguards for background jobs to prevent impact on production workloads
- Improved performance of critical database paths
- Enhanced monitoring to detect abnormal load and retry amplification earlier

## Action Items
To prevent such issues from happening again:
- Implement throttling and isolation for background/backfill jobs
- Add protections for critical request paths under load
- Improve alerting on database latency and retry patterns
- Strengthen validation for production-like load conditions
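As an editorial sketch of the retry-amplification theme above, a minimal bounded retry with exponential backoff and full jitter, which spreads retries out instead of hammering an already-overloaded service; the upload callable is a placeholder:

```python
# Illustrative bounded retry with exponential backoff and full jitter,
# the standard antidote to retry amplification. `upload` is a placeholder.
import random
import time

def upload_with_backoff(upload, max_attempts: int = 4) -> None:
    for attempt in range(max_attempts):
        try:
            upload()
            return
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up; do not keep hammering an overloaded service
            # Full jitter: sleep a random amount between 0 and 2**attempt s.
            time.sleep(random.uniform(0, 2 ** attempt))
```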