Harness experienced a notice incident on January 28, 2026 affecting Continuous Delivery - Next Generation (CDNG) and Continuous Integration Enterprise(CIE) - Self Hosted Runners and 1 more component, lasting 45s. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring Jan 28, 2026, 06:11 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jan 28, 2026, 06:12 AM UTC
This incident has been resolved.
- postmortem Feb 11, 2026, 06:04 PM UTC
## **Summary** On January 26, 2026, the Policy Management service in the Prod2 environment experienced a partial outage during a deployment. A database migration attempted to create a new index on a large production table containing more than 70 million records. The index creation caused elevated database load and temporary service instability, resulting in failed policy evaluations. The issue was self-resolved once the index creation completed, and service stability was restored. ## **Impact** During the incident window \(approximately 11:15 AM – 11:28 AM PST\): * Policy evaluations in Prod2 failed intermittently. * Pipelines and deployments dependent on OPA-based policy evaluations failed during this period. * Other database operations unrelated to the affected table continued to function normally. This was a **partial outage limited to policy evaluation functionality** in Prod2. There was **no data loss**, and the impact was confined to the duration of the index creation process. ## **Root Cause** As part of a deployment, a new database index was introduced on a large production table used for policy evaluations. The index creation was executed through an application migration process. Because the table contained a significant volume of records, the index creation operation placed heavy load on the database and temporarily restricted access to the table. This caused the Policy Management service to encounter repeated database connection failures and enter crash-restart loops. ## **Mitigation** Immediate mitigation occurred when: * The index creation completed successfully. * Database load normalized. * Policy Management service instances recovered automatically and resumed normal operation. Post-incident verification confirmed: * Required indexes were properly created. * Database performance metrics returned to normal levels. * Policy evaluation functionality was fully restored. For subsequent production environments, required indexes were created manually and verified prior to deployment to prevent recurrence. ## **Action Items** To prevent similar incidents in the future, the following actions are being implemented: * Enhancing process for creating indexes on large or critical tables outside of application deployment workflows. * Enhance our pre-deployment validation to check for indexes