Nebula incident

CallSwitch and CallSwitchOne platforms experiencing call failures.

Major Resolved

Nebula experienced a major incident on May 5, 2025 affecting Core Network, lasting 54m. The incident has been resolved; the full update timeline is below.

Started
May 05, 2025, 03:00 PM UTC
Resolved
May 05, 2025, 03:55 PM UTC
Duration
54m
Detected by Pingoru
May 05, 2025, 03:00 PM UTC

Affected components

Core Network

Update timeline

  1. investigating May 05, 2025, 03:00 PM UTC

    We have found the root cause of the issue. A fix is in progress. Next update in 5 minutes.

  2. investigating May 05, 2025, 03:08 PM UTC

    We are continuing to investigate this issue.

  3. investigating May 05, 2025, 03:20 PM UTC

    We are still experiencing issues with outbound calls. The issue is under investigation. Next update shortly.

  4. investigating May 05, 2025, 03:45 PM UTC

    We are continuing to investigate this issue.

  5. resolved May 05, 2025, 03:55 PM UTC

    The issue is now resolved. All services are up and running.

  6. postmortem May 07, 2025, 02:11 PM UTC

    **What Happened** At 03:10 PM BST, our monitoring systems detected connectivity issues between the database and the Nebula platform, caused by an unexpected network refresh. As the network reconnected, the database encountered an authentication error. This prevented inbound and outbound calls from connecting for 90 minutes.

    **Impact** Inbound and outbound calls were impacted and could not be connected through the platform.

    **Resolution** We took immediate corrective measures, which involved identifying the root cause of the issue (DNS related). Services were restored within 90 minutes, with no further delays encountered in the call flow. A subsequent 24-hour monitoring period showed no recurrence, allowing us to investigate and confirm the nature of the fault. In the wake of the incident, we have implemented additional DNS and connectivity health checks, which run every minute.

    **Next Steps** The corrective and preventative measures we have implemented will significantly reduce the risk of similar incidents in the future. We are committed to providing reliable, high-quality voice services to our partners and customers, and will continue to review and improve our platform's resilience and performance.
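
The postmortem mentions DNS and connectivity health checks that run every minute but does not describe how they work. As an illustration only, a check of that kind might look like the minimal Python sketch below. It is not the vendor's implementation: the names `CheckResult`, `check_endpoint` and `run_checks`, the example hostname and port, and the 60-second default are all assumptions made for this example. The check resolves the hostname first and only then attempts a TCP connection, so a failure can be attributed to DNS or to connectivity.

```python
import socket
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one health check (hypothetical structure for this sketch)."""
    host: str
    dns_ok: bool
    tcp_ok: bool
    error: Optional[str] = None


def check_endpoint(
    host: str,
    port: int,
    timeout: float = 3.0,
    resolve: Callable = socket.getaddrinfo,
    connect: Callable = socket.create_connection,
) -> CheckResult:
    """Resolve `host`, then open a TCP connection to the first address returned.

    `resolve` and `connect` are injectable so the check can be exercised
    without touching the network.
    """
    try:
        # Step 1: DNS. A failure here points at name resolution.
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CheckResult(host, dns_ok=False, tcp_ok=False, error=f"dns: {exc}")
    if not infos:
        return CheckResult(host, dns_ok=False, tcp_ok=False, error="dns: no addresses")

    try:
        # Step 2: TCP reachability of the resolved address (IP and port only).
        address = infos[0][4][:2]
        with connect(address, timeout=timeout):
            pass
    except OSError as exc:
        return CheckResult(host, dns_ok=True, tcp_ok=False, error=f"tcp: {exc}")

    return CheckResult(host, dns_ok=True, tcp_ok=True)


def run_checks(host: str, port: int, interval: float = 60.0,
               rounds: Optional[int] = None) -> None:
    """Run the check every `interval` seconds; `rounds=None` means run until stopped."""
    done = 0
    while rounds is None or done < rounds:
        print(check_endpoint(host, port))
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `run_checks("db.example.internal", 5432)` (a placeholder hostname and port) would print one result per minute. In practice the result would more likely be sent to an alerting system than printed, and a real check would probably also verify database authentication, since the postmortem names an authentication error as part of the failure.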