Elium incident

Loss of internet connectivity

Critical Resolved

Elium experienced a critical incident on December 11, 2020 affecting Private Hosting, lasting 2h 43m. The incident has been resolved; the full update timeline is below.

Started
Dec 11, 2020, 01:33 PM UTC
Resolved
Dec 11, 2020, 04:17 PM UTC
Duration
2h 43m
Detected by Pingoru
Dec 11, 2020, 01:33 PM UTC

Affected components

Private Hosting

Update timeline

  1. investigating Dec 11, 2020, 01:33 PM UTC

    We are currently investigating this issue.

  2. investigating Dec 11, 2020, 01:34 PM UTC

    Instances hosted in our private hosting facility are unreachable because our internet connectivity is down.

  3. investigating Dec 11, 2020, 01:35 PM UTC

    We switched our internet connectivity to our backup provider.

  4. identified Dec 11, 2020, 01:38 PM UTC

    We had to update our DNS records to point to our backup external IP addresses. Depending on cached values, this may take a few minutes to propagate.

  5. monitoring Dec 11, 2020, 01:49 PM UTC

    The datacenter has confirmed a problem with one of its internet providers; our backup provider is unaffected.

  6. monitoring Dec 11, 2020, 02:40 PM UTC

    Since switching upstream providers, we have been seeing DNS issues in part of our private hosting facility.

  7. monitoring Dec 11, 2020, 02:43 PM UTC

    Our internal DNS resolver was still pointing at the failing primary internet line; it has been switched to the DNS provider on our backup line.

  8. monitoring Dec 11, 2020, 02:54 PM UTC

    We identified another issue affecting the serving of thumbnails and file contents; it should be resolved as soon as the new DNS record propagates.

  9. resolved Dec 11, 2020, 04:17 PM UTC

    Upstream provider connectivity has been restored in our datacenter.

  10. postmortem Dec 17, 2020, 09:58 AM UTC

    Fri, Dec 11, 2020, 14:25: backbone alarm raised for switch B19B4530WIN0 and a few other devices downstream of it.
    Fri, Dec 11, 2020, 14:30: basic troubleshooting; a power failure is suspected.
    Fri, Dec 11, 2020, 15:10: engineer arrives at WDC; a few tests are run on the power supply and fans of B19B4530WIN0.
    Fri, Dec 11, 2020, 15:20: tests inconclusive; we decide to replace the chassis of B19B4530WIN0. B19B4530WIN0 consists of 2 stacked chassis, and the faulty chassis is identified as the C3750-X, which is available as a backbone spare in stock at Wierde.
    Fri, Dec 11, 2020, 15:30: spare CAT3750-X taken out of stock and transported to WDC.
    Fri, Dec 11, 2020, 15:30 to 16:15: UTP connections terminating on B19B4530WIN0 are untangled and identified to prepare the migration.
    Fri, Dec 11, 2020, 16:10: spare switch arrives at WDC.
    Fri, Dec 11, 2020, 16:15: spare switch is configured.
    Fri, Dec 11, 2020, 16:40: faulty switch is replaced.
    Fri, Dec 11, 2020, 17:00: stack formed between the 2 members of the switch; reconnection of the UTP cables begins.
    Fri, Dec 11, 2020, 17:04: switch rebooted to apply the system MTU configuration.
    Fri, Dec 11, 2020, 17:07: UTP connections fully reconnected on the spare switch.
    Fri, Dec 11, 2020, 17:07: end of the intervention.
    ROOT CAUSE: hardware failure of the B19B4530WIN0 chassis.
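
Update 4 notes that the DNS change may take a few minutes to propagate, depending on cached values. As a rough illustration only, the sketch below estimates the latest time a resolver could still serve the old record, assuming resolvers honor the record's TTL. The function name `latest_cache_expiry` and the 300-second TTL are hypothetical; the report does not state the actual TTL, and the time the update was posted is used here as a stand-in for the time the records were changed.

```python
from datetime import datetime, timedelta, timezone


def latest_cache_expiry(changed_at: datetime, ttl_seconds: int) -> datetime:
    """Return the latest moment a resolver could still serve the old record.

    A resolver that cached the old answer just before the change may keep
    it for the full TTL, so the worst case is change time + TTL.
    """
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return changed_at + timedelta(seconds=ttl_seconds)


# Assumption: records changed when update 4 was posted (01:38 PM UTC).
# Assumption: a TTL of 300 seconds (hypothetical, not stated in the report).
changed_at = datetime(2020, 12, 11, 13, 38, tzinfo=timezone.utc)
print(latest_cache_expiry(changed_at, ttl_seconds=300).isoformat())
```

Resolvers that clamp or ignore TTLs can hold stale answers longer than this estimate, so it should be read as a lower bound on when all clients are likely to see the new addresses, not a guarantee.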