Umbrellar incident

Subject: Border Router temporary performance degradation

Severity: Notice. Status: Resolved.

Umbrellar experienced a notice-level incident on October 22, 2020. The incident has been resolved; the full update timeline is below.

Started
Oct 22, 2020, 03:38 AM UTC
Resolved
Oct 21, 2020, 07:30 AM UTC
Duration
Detected by Pingoru
Oct 22, 2020, 03:38 AM UTC

Update timeline

  1. resolved Oct 22, 2020, 03:38 AM UTC

    Summary
    The routing daemon process crashed on the border router, and CPU utilization was high while the router rebuilt the global routing table from our multiple upstream providers. Together these caused a temporary performance degradation.

    End User Impact
    Temporary degradation of external access to customers' internet-facing web services.

    Incident Root Cause
    Software and hardware issue with the internet border routers.

    Incident Resolution
    The software restarted automatically and rebuilt the routing tables. This is a time-consuming process, and the hardware was under load for fifteen minutes while it ran.

    Timeline
    20:57 Route process died on the router and restarted automatically.
    21:03 Peerings with upstream providers were re-established.
    21:03 – 21:20 Route table rebuild and upstream distribution of routes.

    What we learned
    Automated self-correction worked as expected. The incident surfaced an issue within the Junos software that needs to be investigated.

    Planned Actions
    • Logged ticket with hardware vendor.
    • Upgrade of software for the border router.

    Recommendations
    • Upgrade of software for the border router.
    • Confirmation from hardware vendor that the upgrade resolves the issue.
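
The phase lengths implied by the timeline can be worked out with a short sketch. It is illustrative only: the timestamps are copied from the timeline, the report does not state their time zone (so only differences are computed), and the event labels and the helper names `minutes_between` and `phase_durations` are hypothetical. It also assumes 21:20 marks the end of the rebuild window.

```python
from datetime import datetime

# Timestamps copied from the incident timeline. The time zone is not stated
# in the report, so this sketch only computes differences between them.
# Labels are paraphrased; treating 21:20 as "rebuild complete" is an assumption.
EVENTS = [
    ("routing process died and restarted", "20:57"),
    ("upstream peerings re-established", "21:03"),
    ("route table rebuild complete", "21:20"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end (same day, HH:MM format)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_durations(events):
    """Pair each event with the next one and compute the gap in minutes."""
    return [
        (f"{a_label} -> {b_label}", minutes_between(a_time, b_time))
        for (a_label, a_time), (b_label, b_time) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, minutes in phase_durations(EVENTS):
        print(f"{label}: {minutes} min")
    print("total:", minutes_between(EVENTS[0][1], EVENTS[-1][1]), "min")
```

On these figures the peerings came back 6 minutes after the crash and the rebuild window ran 17 minutes, which is roughly in line with the "fifteen minutes" under load described in the resolution notes.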