Postmortem – Feb 05, 2026, 02:58 PM UTC
**07:23 UTC** – Monitoring alerted the support team that some database queries were taking longer than expected.

**07:34 UTC** – Our uptime probes began raising alerts indicating that Golive was no longer accessible to end users.

**07:46 UTC** – The support team updated the status page to notify users of a Golive outage.

**08:23 UTC** – The support team identified an abnormal traffic spike originating from Atlassian webhooks. Under normal conditions, traffic peaks reach around 1,000 webhook calls per minute; during the incident, we saw a sustained spike exceeding 9,000 calls per minute, more than nine times the usual peak. This caused one of our components to open more database connections than it was designed to handle, forcing it to restart repeatedly. To protect the Golive application, webhook traffic was temporarily disabled. The component stopped restarting and access to Golive was restored.

**11:20 UTC** – The engineering team implemented an initial performance improvement on the affected component. The fix was deployed to production, and webhook traffic was re-enabled to evaluate performance.

**11:44 UTC** – Webhook traffic was suspended again because the initial fix proved insufficient. Although database connections were no longer an issue, the component was still receiving excessive request traffic.

**14:01 UTC** – A second improvement was implemented, this time at the gateway level. The fix was deployed to production and webhook traffic was re-enabled. No further restarts were observed, and Golive appeared to be fully operational.

**14:45 UTC** – After several traffic peaks of up to 5,000 webhook calls per minute, all components continued to operate normally, with sufficient headroom in database connections and execution threads. The incident was closed.

With these additional performance improvements, the infrastructure now appears capable of handling a significantly higher load than it was originally designed for.
Furthermore, traffic should be better segregated to isolate asynchronous workloads (webhooks, conflict checking, automation, etc.) from transactional traffic (UI application usage). This separation should help mitigate the risk of asynchronous traffic impacting normal user navigation.
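One common way to achieve this kind of segregation is the bulkhead pattern: each traffic class gets its own fixed budget of concurrent work (for example, database connections), so a flood in one class cannot exhaust the capacity reserved for another. The following is a minimal Python sketch under stated assumptions, not a description of Golive's actual implementation. The `Bulkhead` and `BulkheadFull` classes, the source labels, and the 40/10 split of a hypothetical 50-connection pool are all illustrative.

```python
import threading
from contextlib import ExitStack, contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class BulkheadFull(Exception):
    """Raised when a traffic class has no free capacity left."""


class Bulkhead:
    """Caps the number of concurrent operations for one traffic class."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def acquire(self) -> Iterator[None]:
        # Fail fast instead of blocking: a saturated class sheds load
        # rather than piling up waiting threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(f"{self.name} bulkhead is full")
        try:
            yield
        finally:
            self._slots.release()


# Hypothetical budgets: a 50-connection database pool split so that async
# work can never consume the slots reserved for interactive UI requests.
BULKHEADS = {
    "transactional": Bulkhead("transactional", 40),
    "async": Bulkhead("async", 10),
}

# Hypothetical labels for the asynchronous workloads named in the postmortem.
ASYNC_SOURCES = {"webhook", "conflict_check", "automation"}


def classify(source: str) -> str:
    """Map a request source to its traffic class."""
    return "async" if source in ASYNC_SOURCES else "transactional"


def handle(source: str, work: Callable[[], T]) -> T:
    """Run `work` inside the bulkhead for the request's traffic class."""
    with BULKHEADS[classify(source)].acquire():
        return work()


if __name__ == "__main__":
    with ExitStack() as stack:
        # Simulate a webhook spike that occupies every async slot.
        for _ in range(10):
            stack.enter_context(BULKHEADS["async"].acquire())
        try:
            handle("webhook", lambda: "processed")
        except BulkheadFull as exc:
            print("webhook rejected:", exc)
        # UI traffic still has its own, untouched budget.
        print(handle("ui", lambda: "page rendered"))
```

Rejecting excess async work instead of queueing it is a deliberate choice in this sketch. It assumes the webhook sender retries, or that dropped events can be reconciled later. An unbounded queue would only move the backlog elsewhere, while the fixed split guarantees that UI requests always find free capacity.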