Nebula incident
Degraded Performance Across Calls, Apps and Dashboards
Nebula experienced a minor incident on July 9, 2025 affecting Core Network, Dashboard, and one further component, lasting 21h 38m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Jul 09, 2025, 10:24 AM UTC
Our NOC is investigating reports of degraded performance across various elements of our platform, including calls, softphone apps, and slow load times on our online dashboards. We will provide a further update in 30 minutes.
- identified Jul 09, 2025, 10:40 AM UTC
The affected traffic is being rerouted and we're beginning to see performance in those areas improve. We are continuing to monitor and will provide a further update in the next 30 minutes.
- monitoring Jul 09, 2025, 10:51 AM UTC
Service has been restored in most areas and continues to improve in the remaining edge cases. Our team is still monitoring the incident and will mark it as resolved after a sustained period of stability. More information will be provided later via a post mortem on this status page.
- identified Jul 09, 2025, 11:45 AM UTC
We're investigating further reports of similar issues and will provide a further update within 30 minutes.
- monitoring Jul 09, 2025, 12:07 PM UTC
Normal service has resumed and our team continues to closely monitor the situation.
- identified Jul 09, 2025, 12:56 PM UTC
We are seeing degraded performance in some areas and are working on a fix.
- monitoring Jul 09, 2025, 01:17 PM UTC
A fix has been implemented and we are monitoring the situation.
- resolved Jul 10, 2025, 08:03 AM UTC
This incident is confirmed as resolved; a post mortem will be provided as soon as possible via this status page.
- postmortem Jul 11, 2025, 09:22 AM UTC
### **What Happened**

At approximately 11:20 am BST on 9th July, we identified a sudden and unprecedented surge in traffic through our network, which resulted in a build-up of messages entering the platform. This caused a rapid slowdown of the elements of the platform that rely on message processing, namely call authentication, softphone app status and chat functions, and dashboard navigation.

### **Impact**

While we have rigorous monitoring and load-balancing protections in place, the rate at which this occurred meant that our usual redundancy measures were unable to redirect all of the offending traffic, which ultimately led to a degradation of service across a significant number of accounts.

Our team worked throughout the day both to mitigate the impact on our end users and to investigate and identify the root cause. We systematically suspended various elements of our network while investigating, periodically restarting affected servers, which cleared the backlog instantly; however, shortly after each restart the traffic returned to unprecedented levels via different channels. Unfortunately this produced a number of 'false positives' on the root cause, leading our team to incorrectly and prematurely confirm a resolution via our status page.

We ultimately identified the root cause as a specific customer's misconfiguration of an ancillary service to our main product line. Unlike with our own software, the protections around a complex integration with this third-party service rely in part on safeguards implemented by the other party; in this case, we found there was insufficient rate limiting on their side, which allowed an inflated level of traffic into the Nebula platform. This traffic came from various origins, which made it harder for our team to quickly identify the root cause.

### **Resolution**

After identifying the origin of the issue, we immediately suspended all services surrounding the account and integration. Once incoming traffic returned to normal levels, the backlog cleared immediately and normal service resumed within minutes. We then worked with the service provider to establish the cause on their end, and implemented further checks within our own platform to prevent a recurrence.

### **Next Steps**

We have since begun a comprehensive review of all aspects of the connecting infrastructure and will be implementing further improvements so we can better monitor and identify any similar issues in future. This also includes a broader review of our complete technology stack to ensure we are sufficiently protected against other cases of a similar nature.

We would like to sincerely apologise for the disruption caused and thank you for your patience and understanding.
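The postmortem does not describe Nebula's implementation; as an illustration only, the kind of ingress-side rate limiting discussed above is commonly built as a token bucket, which admits short bursts up to a capacity and then throttles sustained floods to a steady refill rate. The class and parameter names below are hypothetical, not Nebula's:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter (illustrative sketch): admits bursts up to
    `capacity` requests, then refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # refill rate, tokens per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: reject (or queue) the request

# Example: cap a single integration at 10 requests/sec, burst of 20.
bucket = TokenBucket(rate=10, capacity=20)
accepted = sum(1 for _ in range(100) if bucket.allow())
print(accepted)  # typically 20: the initial burst; refill during the tight loop is negligible
```

Enforcing a limit like this per account or per integration on the receiving side means a misconfigured upstream cannot flood the platform even when the third party's own safeguards fail, at the cost of rejecting or queueing the excess traffic.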