Update timeline
- Investigating: Feb 04, 2026, 12:53 PM UTC
We are currently investigating this issue.
- Identified: Feb 04, 2026, 12:54 PM UTC
The issue has been identified and a fix is being implemented.
- Monitoring: Feb 04, 2026, 12:56 PM UTC
A fix has been implemented and we are monitoring the results.
- Resolved: Feb 04, 2026, 01:28 PM UTC
This incident has been resolved.
- Postmortem: Feb 16, 2026, 06:05 PM UTC
## **Summary**

On Feb 3rd, 2026, customers using the Feature Flags module in the production environment (Prod2) observed delays in seeing their updates reflected in the user interface. The issue was caused by lag on a database read replica, which resulted in stale data being served for read operations. The issue was identified, mitigated, and fully resolved.

## **Impact**

During the incident window:

* Updates made to Feature Flags Classic in Prod2 were not immediately reflected in the UI.
* Read operations returned stale data due to replication lag between the primary database and a read replica.
* All Feature Flags Classic customers in Prod2 were affected.

There was **no data loss**, and write operations continued to be processed successfully. The impact was limited to delayed visibility of updates and temporary confusion about the status of recent changes. Overall service availability was slightly degraded during the incident.

## **Root Cause**

Feature Flags Classic relies on a primary database for write operations and read replicas for read operations. During the incident, a long-running database query caused a read replica to fall significantly behind the primary database. As a result, while customer updates were successfully written to the primary database, reads served from the lagging replica returned outdated data. Because replication lag alerts were not enabled at the time, the issue was not detected immediately through automated monitoring.

## **Mitigation**

As immediate mitigation steps:

* The long-running query on the affected read replica was terminated.
* The replica was restarted, allowing it to catch up with the primary database and resume normal operation.

These actions restored data consistency between the primary and replica databases and resolved the customer-facing impact.
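The failure mode above arises when a read router keeps sending traffic to a replica regardless of how far it has fallen behind. A minimal sketch of lag-aware read routing is shown below; the postmortem does not describe Harness's actual routing logic, so the names, threshold, and fallback behavior here are illustrative assumptions only.

```python
from dataclasses import dataclass

# Assumed threshold; a real deployment would tune this per service.
LAG_THRESHOLD_SECONDS = 5.0

@dataclass
class Replica:
    name: str
    lag_seconds: float  # how far this replica trails the primary

def choose_read_target(primary: str, replicas: list[Replica]) -> str:
    """Route reads to a sufficiently fresh replica; fall back to the
    primary when every replica exceeds the lag threshold."""
    healthy = [r for r in replicas if r.lag_seconds <= LAG_THRESHOLD_SECONDS]
    if healthy:
        # Prefer the freshest healthy replica.
        return min(healthy, key=lambda r: r.lag_seconds).name
    # All replicas are stale: serving reads from the primary trades some
    # load isolation for consistency, avoiding the stale-read symptom.
    return primary

replicas = [Replica("replica-1", 0.4), Replica("replica-2", 900.0)]
print(choose_read_target("primary", replicas))  # → replica-1
```

With routing like this, a replica stalled by a long-running query would simply stop receiving reads instead of serving outdated Feature Flags data.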
## **Action Items**

To reduce the risk of recurrence and improve detection, the following actions are being implemented:

* Enable proactive monitoring and alerting for database replication lag, with defined thresholds.
* Configure query timeouts to prevent long-running queries from impacting database replicas.
* Establish clearer operational guidelines for executing resource-intensive queries.
* Review and periodically validate database alerting configurations to ensure early detection of similar issues.
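The first action item, threshold-based lag alerting, reduces to comparing the replica's last replayed commit time against the primary's latest commit time. A minimal sketch of that check follows; the threshold value and function names are assumptions for illustration, not Harness's actual monitoring configuration.

```python
import time

# Assumed alert threshold; real values would be tuned per database tier.
LAG_ALERT_THRESHOLD_SECONDS = 30.0

def replication_lag_seconds(primary_commit_ts: float,
                            replica_replay_ts: float) -> float:
    """Lag is how far the replica's last replayed commit trails the
    primary's most recent commit (clamped at zero for clock skew)."""
    return max(0.0, primary_commit_ts - replica_replay_ts)

def should_alert(primary_commit_ts: float, replica_replay_ts: float) -> bool:
    """Return True when replication lag crosses the alert threshold."""
    lag = replication_lag_seconds(primary_commit_ts, replica_replay_ts)
    return lag > LAG_ALERT_THRESHOLD_SECONDS

now = time.time()
print(should_alert(now, now - 900))  # replica 15 minutes behind → True
print(should_alert(now, now - 1))    # replica 1 second behind  → False
```

Had a check like this been wired to a pager during the incident, the 15-plus-minute replica lag would have fired an alert well before customers noticed stale Feature Flags data in the UI.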