Umbrellar incident
Subject: Border Router temporary performance degradation
Umbrellar experienced a notice incident on October 22, 2020, lasting —. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Oct 22, 2020, 03:38 AM UTC
Summary
The incident was caused by a combination of the routing daemon process crashing on the border router and high CPU utilization as the router rebuilt the global routing table from our multiple upstream providers.
End User Impact
Temporary degradation of external access to customers' internet-facing web services.
Incident Root Cause
Software and hardware issue with the internet border routers.
Incident Resolution
The software restarted automatically and rebuilt the routing tables. This is a time-consuming process, and the hardware was under load for fifteen minutes while it ran.
Timeline
20:57 Route process died on the router and automatically restarted.
21:03 Peerings with upstream providers were re-established.
21:03 – 21:20 Route table rebuild and upstream distribution of routes.
What we learned
Automated self-correction worked as expected. The incident surfaced an issue within the Junos software that needs to be investigated.
Planned Actions
• Logged ticket with hardware vendor.
• Upgrade of software for the border router.
Recommendations
• Upgrade of software for the border router.
• Confirmation from hardware vendor that the upgrade resolves the issue.
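As a rough cross-check of the timeline above, the phase durations can be derived from the three timestamps. This is only an illustrative sketch: the names `TIMELINE`, `minutes_between`, and `phase_durations` are hypothetical, the time zone of the timestamps is not stated in the report, and treating 21:20 as the end of the rebuild is an assumption based on the "21:03 – 21:20" range.

```python
from datetime import datetime

# Timestamps copied from the Timeline section. The report does not state
# their time zone, so only the differences between them are meaningful.
# Assumption: 21:20 marks the end of the route table rebuild.
TIMELINE = [
    ("20:57", "Route process died and restarted"),
    ("21:03", "Peerings with upstream providers re-established"),
    ("21:20", "Route table rebuild and distribution finished"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end (same day, HH:MM format)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_durations(timeline):
    """Pair each event with the minutes elapsed until the next event."""
    return [
        (label, minutes_between(t0, t1))
        for (t0, label), (t1, _next_label) in zip(timeline, timeline[1:])
    ]


if __name__ == "__main__":
    for label, minutes in phase_durations(TIMELINE):
        print(f"{label}: {minutes} min until the next event")
    total = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])
    print(f"Total: {total} min")
```

By these timestamps, peering recovery took 6 minutes and the rebuild window 17 minutes, for 23 minutes end to end, which is broadly in line with the fifteen minutes of load noted under Incident Resolution.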