Harness incident

[Prod-8] Degraded access to the login page

Minor Resolved

Harness experienced a minor incident on March 3, 2026 affecting Platform, lasting 6h 8m. The incident has been resolved; the full update timeline is below.

Started
Mar 03, 2026, 10:32 AM UTC
Resolved
Mar 03, 2026, 04:40 PM UTC
Duration
6h 8m
Detected by Pingoru
Mar 03, 2026, 10:32 AM UTC

Affected components

Platform

Update timeline

  1. investigating Mar 03, 2026, 10:32 AM UTC

    We are currently investigating this issue.

  2. monitoring Mar 03, 2026, 10:43 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Mar 03, 2026, 04:40 PM UTC

    This incident has been resolved.

  4. postmortem Mar 04, 2026, 07:20 PM UTC

    ## **Summary**

    On **March 2**, the **prod8 environment** became temporarily inaccessible due to a configuration issue during a platform deployment. The issue affected ingress routing for the platform UI, resulting in HTTP **404 responses** when users attempted to access the environment.

    The issue was quickly identified as an ingress configuration problem. A temporary mitigation was applied by updating the ingress configuration, which immediately restored access. A permanent fix is being implemented to prevent recurrence.

    ## **Root Cause**

    The issue was caused by a **service configuration** that incorrectly generated ingress configuration during deployment. This caused the ingress controller to misroute incoming requests that did not match the expected path. As a result, these requests were directed to the default backend and returned **404 responses**.

    The problem was isolated to the ingress routing layer. Network connectivity and the Google Cloud Network Load Balancer were functioning normally.

    ## **Impact**

    * **Affected Environment:** prod8
    * **Customer Impact:** Users were unable to access the platform UI and received HTTP 404 responses.
    * **Scope:** Limited to the specific environment impacted by the ingress configuration change.

    ## **Resolution**

    Engineering teams applied a temporary mitigation by **patching the platform-ui ingress configuration in production to remove the incorrect host entries**. This restored correct routing behavior and resolved the accessibility issue. Access to the prod8 environment was fully restored after the ingress configuration update.

    ## **Prevention and Improvements**

    To prevent recurrence of this issue, the following steps are underway:

    * Adding additional validation checks to ensure ingress configuration is rendered correctly during deployment.
    * Improving deployment testing for ingress routing scenarios to detect configuration regressions earlier.

    These improvements are intended to catch similar misconfigurations before they reach production environments.
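
The failure mode and the planned validation check described in the postmortem can be illustrated with a small sketch. This is not Harness's actual tooling: the `Rule` type, the `route` and `unexpected_hosts` helpers, and the hostnames are all hypothetical, and the matching logic is deliberately simpler than what a real ingress controller does (no wildcards, path types, or longest-prefix ordering). It only shows why a rule rendered with the wrong host sends requests to the default backend, and how a pre-deployment check could flag such a host.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the controller's catch-all backend.
DEFAULT_BACKEND = "default-backend (404)"


@dataclass(frozen=True)
class Rule:
    """A simplified ingress rule (hypothetical model, not a real API object)."""
    host: str         # empty string means "match any host"
    path_prefix: str  # plain string prefix, simpler than real path matching
    backend: str


def route(rules, host, path):
    """Return the backend of the first matching rule, else the default backend."""
    for rule in rules:
        host_matches = rule.host in ("", host)
        if host_matches and path.startswith(rule.path_prefix):
            return rule.backend
    return DEFAULT_BACKEND


def unexpected_hosts(rules, allowed_hosts):
    """Return rendered hosts that are not in the allowed set, sorted."""
    return sorted({r.host for r in rules if r.host and r.host not in allowed_hosts})


# Assumed example data: the real hostnames are not given in the postmortem.
ALLOWED = {"app.example.test"}
good_rules = [Rule("app.example.test", "/", "platform-ui")]
bad_rules = [Rule("wrong.example.test", "/", "platform-ui")]

print(route(good_rules, "app.example.test", "/login"))  # platform-ui
print(route(bad_rules, "app.example.test", "/login"))   # default-backend (404)
print(unexpected_hosts(bad_rules, ALLOWED))             # ['wrong.example.test']
```

A check like `unexpected_hosts` would run against the rendered configuration before it is applied, failing the deployment when the returned list is non-empty.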