Apwide incident

Performance issue on Golive

Apwide experienced a major incident on February 5, 2026, affecting four Golive Cloud components (App, API, Email Notifications, and Automations & Webhooks) and lasting 7h 10m. The incident has been resolved; the full update timeline is below.

Started: Feb 05, 2026, 07:46 AM UTC
Resolved: Feb 05, 2026, 02:56 PM UTC
Duration: 7h 10m
Detected by Pingoru: Feb 05, 2026, 07:46 AM UTC

Affected components

Golive Cloud - App
Golive Cloud - API
Golive Cloud - Email Notifications
Golive Cloud - Automations & Webhooks

Update timeline

investigating Feb 05, 2026, 07:46 AM UTC

We are currently experiencing a performance issue on Golive that is making the platform inaccessible. We are actively investigating the root cause. We apologize for the inconvenience.
investigating Feb 05, 2026, 08:14 AM UTC

We are continuing to investigate this issue.
investigating Feb 05, 2026, 08:23 AM UTC

We have disabled traffic from Atlassian webhooks (issue created/updated/deleted events). As a result, Golive automation and scheduling conflict checks are currently unavailable. We are continuing to investigate the issue.
investigating Feb 05, 2026, 10:18 AM UTC

Disabling webhook traffic has restored Golive for app navigation. We are currently working on performance fixes so that we can re-enable webhooks (Golive automation and the Scheduling Conflict Checker). We will provide an update soon.
identified Feb 05, 2026, 11:20 AM UTC

A fix has been implemented and deployed to production. We are reopening webhook traffic to assess whether the solution meets performance requirements. This may result in some instability over the next hour.
identified Feb 05, 2026, 11:44 AM UTC

Although the fix helped mitigate the performance issue on one of our components, it is not sufficient to handle the current webhook traffic load. We are continuing to work on performance improvements, but in the meantime, we have had to disable webhook traffic again.
monitoring Feb 05, 2026, 02:01 PM UTC

A second fix has been deployed to production. Webhook traffic has been re-enabled, and the load appears to be handled correctly. Capabilities such as conflict checking and automation seem to be fully functional. We are continuing to monitor the situation.
resolved Feb 05, 2026, 02:56 PM UTC

Over the past hour, we have experienced several traffic peaks on Atlassian webhook calls, and our updated systems have handled the load without any issues. We now consider this issue to be resolved. We apologize for the inconvenience caused.
postmortem Feb 05, 2026, 02:58 PM UTC

**07:23 UTC** – The support team was alerted by monitoring that some database queries were taking longer than expected.

**07:34 UTC** – Our uptime probes began raising alerts indicating that Golive was no longer accessible to end users.

**07:46 UTC** – The support team updated the status page to notify users of a Golive outage.

**08:23 UTC** – The support team identified an abnormal traffic spike originating from Atlassian webhooks. Under normal conditions, traffic peaks reach around 1,000 webhook calls per minute; however, we experienced a sustained spike exceeding 9,000 calls per minute. This caused one of our components to open more database connections than it was designed to handle, forcing it to restart repeatedly. To protect the Golive application, webhook traffic was temporarily disabled. The component stopped restarting and access to Golive was restored.

**11:20 UTC** – The engineering team implemented an initial performance improvement on the affected component. The fix was deployed to production, and webhook traffic was re-enabled to evaluate performance.

**11:44 UTC** – Webhook traffic was suspended again as the initial fix proved insufficient. Although database connections were no longer an issue, the component was still receiving excessive request traffic.

**14:01 UTC** – A second improvement was implemented, this time at the gateway level. The fix was deployed to production and webhook traffic was re-enabled. No further restarts were observed, and Golive appeared to be fully operational.

**14:45 UTC** – After several traffic peaks reaching up to 5,000 webhook calls per minute, all components continued to operate normally, with sufficient margins in terms of database connections and execution threads. The incident was closed.

With the additional performance improvements, the infrastructure now appears capable of handling a significantly higher load than originally designed.

Furthermore, traffic should be better segregated to isolate asynchronous workloads (webhooks, conflict checking, automation, etc.) from transactional traffic (UI application usage). This separation should help mitigate the risk of asynchronous traffic impacting normal user navigation.
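
As an illustration only, the separation described above can be sketched with a bounded queue drained by a fixed pool of workers. This is not Apwide's implementation, and the report does not say how the component-level or gateway-level fixes were built; the class name `WebhookBuffer` and the parameters `max_pending` and `workers` are hypothetical. The idea is that the number of workers caps how many webhook events are processed at once (and therefore how many database connections that path can hold), while a full queue rejects new events instead of letting a spike grow without limit.

```python
import logging
import queue
import threading


class WebhookBuffer:
    """Hypothetical sketch: bounded buffer drained by a fixed-size worker pool.

    Names and defaults are illustrative assumptions, not taken from Golive.
    """

    def __init__(self, handler, max_pending=1000, workers=4):
        # The queue bound limits how much backlog a traffic spike can build up.
        self._queue = queue.Queue(maxsize=max_pending)
        self._handler = handler
        # A fixed number of workers caps concurrent processing, so this path
        # can never use more than `workers` database connections at a time.
        for _ in range(workers):
            threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, event):
        """Return True if the event was queued, False if the buffer is full.

        A caller could translate False into an HTTP 429 or 503 response.
        """
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _drain(self):
        # Each worker loops forever, processing one event at a time.
        while True:
            event = self._queue.get()
            try:
                self._handler(event)
            except Exception:
                # Keep the worker alive; a real system would also retry or
                # move the event to a dead-letter store.
                logging.exception("webhook handler failed")
            finally:
                self._queue.task_done()

    def join(self):
        """Block until every accepted event has been processed."""
        self._queue.join()
```

In this sketch the request path only calls `submit`, which returns immediately, so interactive traffic is not blocked by slow webhook processing. Choosing `workers` below the database connection limit leaves headroom for transactional traffic.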