Harness incident

Traceable APAC Cluster Slowness

Harness experienced a minor incident on May 14, 2026 affecting APAC - app.apac.traceable.ai / api.apac.traceable.ai, lasting 15m. The incident has been resolved; the full update timeline is below.

Started: May 14, 2026, 10:57 AM UTC
Resolved: May 14, 2026, 11:13 AM UTC
Duration: 15m
Detected by Pingoru: May 14, 2026, 10:57 AM UTC

Affected components

APAC - app.apac.traceable.ai / api.apac.traceable.ai

Update timeline

investigating May 14, 2026, 10:57 AM UTC

We are currently investigating this issue.

identified May 14, 2026, 11:00 AM UTC

We have identified a potential issue causing the service access problem and are working hard to address it. Please continue to monitor this page for updates.

resolved May 14, 2026, 11:13 AM UTC

We can confirm normal operation; the slowness issue is resolved. We will publish the RCA.

postmortem May 22, 2026, 08:37 PM UTC

## Summary

Between May 17 and May 19, 2026, customers in the APAC region experienced intermittent unresponsiveness with the Harness platform. The incident occurred twice within a two-day window. In both cases, the GraphQL service became degraded, causing requests to fail or time out for affected users. The service was restored each time by restarting the impacted pods. A permanent alerting fix is in progress and expected to reach production shortly.

## Impact

* APAC-region customers experienced intermittent inability to access the Harness platform UI and API
* No data loss was reported; the impact was limited to service availability

## Root Cause

One of the two service pods in the APAC cluster entered a degraded state where downstream calls were timing out. While the downstream services showed no visible increase in response time on their end, the pod was unable to receive responses from them in time likely due to a network-level communication issue between the pod and its downstream dependencies.

## Mitigation

Identified and restarted problematic nodes. Recovery was confirmed by the engineering team before the incident was marked resolved.

## Next Steps

**Improved Alerting and Proactive Detection**: Implemented new alerts for elevated response times to ensure any recurrence is detected and mitigated before impacting customers.
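
As an illustration only, the kind of response-time alert described under Next Steps could be sketched as a per-pod check on caller-side latency. This is not Harness's actual alerting configuration: the function names, the 95th-percentile choice, the threshold, and the pod names are all assumptions made for the example. Measuring latency per pod, from the calling side, matters here because the root cause describes a single pod timing out while its downstream services looked healthy from their own side.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of latency samples (ms)."""
    # quantiles(..., n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def pods_over_threshold(latencies_by_pod, threshold_ms, min_samples=20):
    """Return the names of pods whose p95 latency exceeds threshold_ms.

    latencies_by_pod is a hypothetical mapping of pod name to recent
    downstream-call latencies, as measured by the calling pod itself.
    """
    flagged = []
    for pod, samples in sorted(latencies_by_pod.items()):
        if len(samples) < min_samples:
            continue  # too little data in this window to judge the pod
        if p95(samples) > threshold_ms:
            flagged.append(pod)
    return flagged


if __name__ == "__main__":
    # Hypothetical window: one healthy pod, one with slow downstream calls.
    window = {
        "graphql-pod-a": [100.0] * 20,
        "graphql-pod-b": [100.0] * 18 + [5000.0] * 2,
    }
    print(pods_over_threshold(window, threshold_ms=1000.0))
```

Evaluating each pod separately, rather than averaging across the service, keeps one degraded pod from being masked by a healthy peer.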