Harness incident

Feature Flags unable to update

Minor Resolved
Started: Apr 15, 2026, 11:55 PM UTC
Resolved: Apr 16, 2026, 04:15 AM UTC
Duration: 4h 20m
Detected by Pingoru: Apr 15, 2026, 11:55 PM UTC

Affected components

Feature Flags (FF)

Update timeline

  1. investigating Apr 15, 2026, 11:55 PM UTC

    We are currently investigating this issue.

  2. investigating Apr 16, 2026, 12:01 AM UTC

    We are continuing to investigate this issue.

  3. monitoring Apr 16, 2026, 04:04 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Apr 16, 2026, 06:05 PM UTC

    This incident has been resolved.

  5. postmortem Apr 30, 2026, 05:34 PM UTC

    ## **Summary**

    From approximately 23:21 UTC on April 15, 2026, to 01:58 UTC on April 16, customers using Feature Flags in the prod2 environment experienced delays in feature flag updates. Feature flag changes made via the UI or API were successfully processed but were **not immediately reflected**, causing stale flag values to be served.

    ## **Impact**

    * **Scope:** Customers on the **prod2 environment only**
    * **Customer Impact:**
      * Feature flag updates were **delayed or appeared ineffective**
      * Applications continued serving **stale configurations**
    * **Other Environments:** No impact to prod0, prod1, or other regions

    ## **Root Cause**

    The issue was caused by replication lag in the read replica database used for serving feature flag reads. A **long-running read query** on the replica blocked replication updates from the primary database. This delayed the propagation of recent feature flag changes to read queries.

    ### **What triggered the issue**

    * A high-volume API usage pattern involving **large paginated queries on target data**. These queries became **resource-intensive**, impacting the database.

    ## **Mitigation**

    ### **Immediate Actions**

    * Identified and **terminated long-running queries** on the replica
    * Replication resumed and flag updates began reflecting correctly

    ## **Prevention & Next Steps**

    We have configured the replica to **automatically cancel queries** that block replication beyond a threshold, and tuned **query timeouts** for heavy read operations.

    We are continuing to strengthen reliability by:

    * Improving **query efficiency and pagination strategies**
    * Enhancing **monitoring and alerting for replication lag**
    * Evaluating **database upgrades and scaling improvements**
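The postmortem lists improved pagination strategies as a next step but does not name the database, the schema, or the approach chosen. As an illustration only, the sketch below shows keyset (cursor) pagination, one common way to keep each page query short and bounded instead of scanning past a growing offset. The `targets` table, its columns, and the function names are hypothetical, and `sqlite3` is used only so the example runs on its own.

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return up to page_size rows whose id is greater than after_id, ordered by id."""
    return conn.execute(
        "SELECT id, name FROM targets WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


def iter_targets(conn, page_size=100):
    """Yield every row by issuing a sequence of small, index-backed page queries."""
    after_id = 0  # assumes ids are positive integers (hypothetical schema)
    while True:
        page = fetch_page(conn, after_id, page_size)
        if not page:
            return
        yield from page
        # Resume from the last id seen rather than from a numeric offset.
        after_id = page[-1][0]


def demo():
    """Build a small in-memory table and read it back in pages of 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE targets (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO targets (id, name) VALUES (?, ?)",
        [(i, f"target-{i}") for i in range(1, 251)],
    )
    return list(iter_targets(conn, page_size=100))
```

Because each query filters on the primary key and stops at `LIMIT`, its cost stays roughly constant per page regardless of how deep into the result set the client is, which is the property that helps avoid long-running reads on a replica.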
