Nebula incident

Post Dial Delay on calling

Minor Resolved

Nebula experienced a minor incident on March 11, 2025 affecting Core Network, lasting 1d 1h. The incident has been resolved; the full update timeline is below.

Started
Mar 11, 2025, 09:32 AM UTC
Resolved
Mar 12, 2025, 11:30 AM UTC
Duration
1d 1h
Detected by Pingoru
Mar 11, 2025, 09:32 AM UTC

Affected components

Core Network

Update timeline

  1. investigating Mar 11, 2025, 09:32 AM UTC

    Our engineers are investigating reports of delay and silence on calls, affecting a subset of customers. Further updates will be provided within the next 20 minutes.

  2. investigating Mar 11, 2025, 09:42 AM UTC

    We are continuing to investigate the issue, and note that it also affects the ability to answer some calls.

  3. identified Mar 11, 2025, 09:47 AM UTC

    Our engineers have identified the issue and are working towards a fix as soon as possible. The next update will follow within 20 minutes.

  4. identified Mar 11, 2025, 10:12 AM UTC

    We are continuing to work on a fix and will provide a further update ASAP.

  5. identified Mar 11, 2025, 10:37 AM UTC

    Work is ongoing and a further update will be provided within the next 30 mins.

  6. monitoring Mar 11, 2025, 10:50 AM UTC

    Traffic has been returning to normal levels in most areas of our network. We'll continue to monitor and provide further updates within one hour.

  7. monitoring Mar 11, 2025, 11:52 AM UTC

    Our NOC will continue to monitor the incident over the next 24 hours, before marking as resolved.

  8. monitoring Mar 12, 2025, 11:04 AM UTC

    After a period of successful monitoring, we are marking this incident as resolved. A post-mortem will be published within 24 hours.

  9. resolved Mar 12, 2025, 11:30 AM UTC

    This incident has been resolved.

  10. postmortem Mar 13, 2025, 09:36 AM UTC

    At 09:10 AM UTC, our monitoring systems detected a sudden and substantial increase in SIP traffic directed towards our platform. Analysis revealed that the surge originated from an unprecedented volume of SIP requests generated by malfunctioning equipment in a specific area of our infrastructure designed for large-scale deployments. The excessive packet rate created a cascading effect: the initial overload delayed call processing, which in turn triggered further retries from the affected endpoints and exacerbated the problem.

    This flood of traffic pushed our servers to full capacity within minutes. The Nebula platform immediately re-routed traffic to unaffected servers, allowing around 70% of our services to operate unaffected. While the platform has resilient fault-tolerance processes, the remaining servers (and consequently any end users connecting to those servers at random) were unfortunately disrupted while we worked to address the underlying cause. This resulted in performance degradation, including increased call latency, dropped calls, and potential service unavailability for users connecting to the affected infrastructure.

    We took immediate corrective measures, which involved identifying, isolating and then blocking traffic from the affected zones. Service was restored for the users connected to the affected servers as fast as we reasonably could without running the risk of further disruption. A subsequent 24-hour monitoring period showed no recurrence, allowing us to investigate and confirm the nature of the fault.

    In the wake of the incident, we have significantly improved our monitoring systems to isolate and address similar symptoms before they cause any disruption. As an additional precaution, we have now implemented several methods to identify the characteristics and frequency of those types of packets and isolate that traffic from the voice element of our infrastructure.

    The implemented corrective and preventative measures will significantly reduce the risk of similar incidents in the future. We are committed to providing reliable and high-quality voice to our partners and customers, and will continue to review and improve our platform's resilience and performance.
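The post-mortem does not say how the offending traffic is identified, so the following is only an illustrative sketch of one common approach: counting requests per source in a sliding time window and blocking any source that exceeds a threshold. The class `SourceRateMonitor`, the zone names, the threshold and the window size are all hypothetical and do not come from Nebula.

```python
from collections import defaultdict, deque


class SourceRateMonitor:
    """Hypothetical per-source sliding-window rate check (not Nebula's code)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests      # assumed threshold per window
        self.window_seconds = window_seconds  # assumed window length
        self._events = defaultdict(deque)     # source -> timestamps in window
        self.blocked = set()                  # sources isolated so far

    def observe(self, source: str, timestamp: float) -> bool:
        """Record one request; return True if it should be admitted."""
        if source in self.blocked:
            return False
        events = self._events[source]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        if len(events) > self.max_requests:
            # Rate exceeded: isolate this source from further processing.
            self.blocked.add(source)
            return False
        return True


def simulate():
    """Replay made-up traffic: one well-behaved zone and one flooding zone."""
    monitor = SourceRateMonitor(max_requests=5, window_seconds=1.0)
    admitted = defaultdict(int)
    # zone-a sends 3 requests per second; zone-b sends 50 requests in 0.5 s.
    traffic = [("zone-a", i / 3) for i in range(6)]
    traffic += [("zone-b", i / 100) for i in range(50)]
    for source, ts in sorted(traffic, key=lambda event: event[1]):
        if monitor.observe(source, ts):
            admitted[source] += 1
    return monitor.blocked, dict(admitted)


if __name__ == "__main__":
    print(simulate())
```

In this toy run only the flooding zone is blocked, while the well-behaved zone is admitted in full. A production system would also need a way to lift blocks and to bound memory, which this sketch leaves out.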