Section Portal Down

Critical, Resolved

Section experienced a critical incident on May 11, 2021, affecting Console and API and lasting 13h 6m. The incident has been resolved; the full update timeline is below.

Started
May 11, 2021, 02:35 AM UTC
Resolved
May 11, 2021, 03:42 PM UTC
Duration
13h 6m
Detected by Pingoru
May 11, 2021, 02:35 AM UTC

Affected components

Console, API

Update timeline

  1. investigating May 11, 2021, 02:35 AM UTC

    The Section portal is currently down. We are investigating.

  2. investigating May 11, 2021, 03:40 AM UTC

    Access to the management portal has been restored. We are still investigating issues with the API functionality. Metrics/Logs are not available at the moment.

  3. investigating May 11, 2021, 04:11 AM UTC

    We are continuing to restore the system and are currently working to restore logs and the cache clear API.

  4. investigating May 11, 2021, 04:52 AM UTC

    Logs and metrics are being restored. They may take some time to appear in the portal. We are still investigating the cache clear API.

  5. investigating May 11, 2021, 05:43 AM UTC

    We are continuing to resolve the issue.

  6. investigating May 11, 2021, 06:14 AM UTC

    We are continuing to work on resolving the cache clear API availability.

  7. investigating May 11, 2021, 06:31 AM UTC

    Cache clear API has been restored and is operational.

  8. investigating May 11, 2021, 07:12 AM UTC

    We are continuing to work on restoring the portal.

  9. monitoring May 11, 2021, 07:47 AM UTC

    We have implemented workarounds for the issue and are currently monitoring the system.

  10. resolved May 11, 2021, 03:42 PM UTC

    This incident has been resolved.

  11. postmortem May 11, 2021, 09:23 PM UTC

    An error in routine maintenance terminated some backend and user-facing infrastructure. The rollback procedure failed, so engineers intervened to bring system components back in a priority order based on our DR planning. Mission-critical components, such as traffic delivery POPs, were unaffected during the incident.