Section incident

Errors delivered for a subset of requests in Sydney region

Major Resolved

Section experienced a major incident on March 29, 2021, affecting the Sydney region and lasting 28 minutes. The incident has been resolved; the full update timeline is below.

Started
Mar 29, 2021, 12:47 AM UTC
Resolved
Mar 29, 2021, 01:15 AM UTC
Duration
28m
Detected by Pingoru
Mar 29, 2021, 12:47 AM UTC

Affected components

Sydney

Update timeline

  1. investigating Mar 29, 2021, 12:47 AM UTC

    We are currently investigating this issue.

  2. identified Mar 29, 2021, 12:56 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 29, 2021, 01:04 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. identified Mar 29, 2021, 01:09 AM UTC

    The issue has been identified and a fix is being implemented.

  5. monitoring Mar 29, 2021, 01:13 AM UTC

    A fix has been implemented and we are monitoring the results.

  6. resolved Mar 29, 2021, 03:43 AM UTC

    This incident has been resolved. Error rates have returned to normal and the affected PoPs have been returned to service.

  7. postmortem Mar 30, 2021, 08:03 PM UTC

    **Incident Root Cause and Corrective Summary**

    Network connectivity was impacted between Section's Sydney PoPs and AWS/Azure services. This resulted in errors being served on cache misses for customers with AWS- and Azure-hosted origins. As a result, the affected PoPs were removed from Section's delivery network and the platform started directing customer traffic away from them.

    The problematic route between Sydney and the AWS and Azure networks was failing intermittently, which kept failure rates beneath the threshold for automatic route removal. Once the unhealthy route impacting connectivity to the AWS/Azure networks was disabled, the affected PoPs were returned to service and traffic was restored.
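
The postmortem notes that intermittent failures stayed beneath the threshold for automatic route removal. The sketch below illustrates how that can happen with a simple sliding-window failure-rate check. It is not Section's implementation: the `RouteHealth` class, the 20-probe window, and the 50% threshold are all hypothetical values chosen for illustration.

```python
from collections import deque


class RouteHealth:
    """Hypothetical sliding-window health check for one network route."""

    def __init__(self, window_size: int, removal_threshold: float):
        # Each entry is True if the probe failed, False if it succeeded.
        self.results = deque(maxlen=window_size)
        self.removal_threshold = removal_threshold

    def record(self, failed: bool) -> None:
        self.results.append(failed)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def should_remove(self) -> bool:
        # Only decide once the window is full, to avoid reacting to a
        # handful of early probes.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.failure_rate() >= self.removal_threshold


def simulate(pattern, window_size=20, removal_threshold=0.5) -> bool:
    """Return True if the route would be removed automatically."""
    health = RouteHealth(window_size, removal_threshold)
    for failed in pattern:
        health.record(failed)
        if health.should_remove():
            return True
    return False


# Assumed traffic patterns (illustrative only):
# an intermittent route failing one probe in four, and a route that is fully down.
intermittent = [i % 4 == 0 for i in range(200)]
hard_down = [True] * 200

removed_intermittent = simulate(intermittent)
removed_hard_down = simulate(hard_down)
```

Under these assumed numbers, the fully failed route crosses the threshold and is removed, while the intermittent route sits at a steady 25% failure rate and is never removed automatically, even though one in four requests over it fails.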