GearHost incident

Intermittent Network Issues

Severity: Major. Status: Resolved.

GearHost experienced a major incident on October 13, 2020, affecting DEN1, CloudSites, Databases, DNS, and Email, and lasting 9h 22m. The incident has been resolved; the full update timeline is below.

Started
Oct 13, 2020, 07:30 AM UTC
Resolved
Oct 13, 2020, 04:52 PM UTC
Duration
9h 22m
Detected by Pingoru
Oct 13, 2020, 07:30 AM UTC

Affected components

DEN1, CloudSites, Databases, DNS, Email

Update timeline

  1. investigating Oct 13, 2020, 10:22 AM UTC

    Our engineers are looking into the issues some users are experiencing. We will provide updates as we get them.

  2. identified Oct 13, 2020, 10:34 AM UTC

    The issue has been identified. It is at the ISP level. We are currently awaiting more information. We will provide an ETA when possible.

  3. monitoring Oct 13, 2020, 11:08 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Oct 13, 2020, 04:52 PM UTC

    The implemented fix has resolved the overall issue. We will provide a post mortem once we gather all information. Thank you all for your patience and understanding. We are working internally and with our partners to ensure this does not happen again.

  5. postmortem Oct 19, 2020, 12:19 AM UTC

    ## Situation Summary:

    On Tuesday, Oct 13th at approximately 3 AM MST, GearHost began experiencing a routing blackhole through our upstream provider (Zayo IP). This was due to an equipment failure on the regional transit provider’s router, which was outside of GearHost’s control. Due to the nature of the failure, routing protocols did not recognize the outage, and traffic destined to and from the internet was blackholed in the peer router. Once the issue was identified, GearHost de-peered from the affected upstream at all peering points, which restored regular traffic for GearHost customers.

    ## Customer Impact:

    Partial or total loss of connectivity.

    ## Mitigation Strategy:

    This outage was due to traffic being dropped by a routing peer while it kept its routing protocol sessions up. Normally, dynamic routing would detect an outage and internet traffic would route around it. When an upstream provider drops or “blackholes” traffic without sending routing updates, the only way to mitigate it is to identify where the traffic is being dropped and manually intervene with routing changes at our borders.

    ## Activity that ultimately restored service:

    Depeering with Zayo IP at the blendwidth border routers.

    ## Identified Root Cause:

    Upstream provider equipment failure dropping traffic while maintaining dynamic routing protocols.
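
The failure mode in the postmortem, a peer that keeps its routing sessions up while dropping traffic, can in principle be caught by comparing control-plane state with data-plane probe results for each upstream. The Python sketch below illustrates that comparison only. It is not GearHost's tooling, and the `UpstreamStatus` record, the `classify` and `upstreams_to_depeer` helpers, the 50% loss threshold, and the `other-transit` upstream are all hypothetical names and values chosen for the example. It assumes some external monitor already supplies the session state and the probe counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpstreamStatus:
    """Hypothetical per-upstream snapshot supplied by an external monitor."""
    name: str                  # upstream label, e.g. "Zayo IP"
    session_established: bool  # control plane: is the routing session up?
    probes_sent: int           # data plane: probes forced through this upstream
    probes_answered: int       # data plane: probes that received a reply


def classify(status: UpstreamStatus, max_loss: float = 0.5) -> str:
    """Return 'down', 'no-data', 'blackhole', or 'healthy' for one upstream.

    max_loss is an assumed threshold, not a value from the postmortem.
    """
    if not status.session_established:
        # Dynamic routing sees this failure and can route around it.
        return "down"
    if status.probes_sent == 0:
        # Without probes there is no data-plane evidence either way.
        return "no-data"
    loss = 1.0 - status.probes_answered / status.probes_sent
    # Session up but traffic dropped: the silent failure described above.
    return "blackhole" if loss >= max_loss else "healthy"


def upstreams_to_depeer(statuses: List[UpstreamStatus]) -> List[str]:
    """List upstreams that look blackholed and are candidates for manual depeering."""
    return [s.name for s in statuses if classify(s) == "blackhole"]


# Illustrative data only: one blackholed upstream, one healthy one.
statuses = [
    UpstreamStatus("Zayo IP", session_established=True, probes_sent=20, probes_answered=0),
    UpstreamStatus("other-transit", session_established=True, probes_sent=20, probes_answered=20),
]
print(upstreams_to_depeer(statuses))  # prints ['Zayo IP']
```

The sketch only flags candidates. The routing change itself, depeering at the border routers, stays a manual step, as it was in the incident.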