I.T Communications Limited incident

Core Router Outage

Critical Resolved

I.T Communications Limited experienced a critical incident on July 24, 2020 affecting Volta Juniper MX Router, lasting 22h 13m. The incident has been resolved; the full update timeline is below.

Started
Jul 24, 2020, 12:16 PM UTC
Resolved
Jul 25, 2020, 10:29 AM UTC
Duration
22h 13m
Detected by Pingoru
Jul 24, 2020, 12:16 PM UTC
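
The reported duration can be checked from the start and resolution timestamps above. The following is a minimal sketch; the helper name `format_duration` is hypothetical and only the two timestamps come from this report.

```python
from datetime import datetime, timezone

# Timestamps taken from the incident summary (both in UTC).
started = datetime(2020, 7, 24, 12, 16, tzinfo=timezone.utc)
resolved = datetime(2020, 7, 25, 10, 29, tzinfo=timezone.utc)


def format_duration(start, end):
    """Return the elapsed time between two datetimes as 'Xh Ym'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


duration = format_duration(started, resolved)
print(duration)  # 22h 13m
```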

Affected components

Volta Juniper MX Router

Update timeline

  1. investigating Jul 24, 2020, 12:16 PM UTC

    Our alerting system has just triggered, as we appear to have lost connection to our core router at Volta Data Centre. Services should not be affected, as traffic will re-route to our other router at Sov House. Update to follow.

  2. monitoring Jul 24, 2020, 12:29 PM UTC

    The router outage was caused by London Links Network going down and coming back up every 10 seconds. Each time, the router tried to route around the issue, then the link came back up and the same process started again. London Links is now back up and we can see peering sessions establishing. We will monitor the network to see whether this issue returns.

  3. monitoring Jul 24, 2020, 01:36 PM UTC

    We are seeing an issue with this router dropping packets randomly. It has been up for a long time without a reboot. To safeguard the network and ensure it remains stable, we will need to do an emergency restart of the core router at Volta. I understand it's not ideal to do this during office hours. However, I would prefer customers to have a fault-free service sooner than wait for an out-of-hours reboot. The router will go down for 5-10 minutes; traffic will move over to the other router until it's back up.

  4. investigating Jul 24, 2020, 03:25 PM UTC

    The reboot of the router did not resolve the issue. We have therefore removed the router from core routing of traffic and continue to investigate. While the router is out of service, services should continue to work but should be considered at risk, as there is no network redundancy. If it is proven to be a router hardware fault, we will replace the router.

  5. resolved Jul 25, 2020, 10:29 AM UTC

    Yesterday we removed the router from service and the network returned to normal. Before lockdown we had plans to replace the existing routers and switches, which have been in service since 2016. These plans were put on hold due to lockdown; however, given the recent router issue, we are going to bring them forward and implement them next week. We will make an announcement next week with our plans. We will close this report and open a new one next week.
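
Update 2 describes a link that went down and came back up every 10 seconds, which is commonly called flapping. As an illustration only, a monitoring script could flag this pattern by counting up/down transitions inside a sliding time window. The class name `FlapDetector`, its thresholds, and the sample data below are all assumptions made for this sketch; they do not describe the tooling actually used during the incident.

```python
from collections import deque


class FlapDetector:
    """Flag a link as flapping when it changes state too often.

    Hypothetical helper: thresholds are illustrative, not taken from the report.
    """

    def __init__(self, max_transitions=3, window_seconds=60.0):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._transitions = deque()  # timestamps of recent state changes
        self._last_state = None

    def observe(self, timestamp, is_up):
        """Record one link-state sample; return True if the link is flapping."""
        if self._last_state is not None and is_up != self._last_state:
            self._transitions.append(timestamp)
        self._last_state = is_up

        # Drop state changes that are older than the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._transitions and self._transitions[0] < cutoff:
            self._transitions.popleft()

        return len(self._transitions) >= self.max_transitions


# Assumed samples: the link toggles every 10 seconds, as in update 2.
detector = FlapDetector(max_transitions=3, window_seconds=60.0)
samples = [(0, True), (10, False), (20, True), (30, False)]
results = [detector.observe(t, up) for t, up in samples]
print(results)  # [False, False, False, True]
```

Routers typically handle this with built-in mechanisms such as route flap damping rather than external scripts; the sketch only shows the counting idea behind detecting the pattern.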