Section experienced a critical incident on May 11, 2021 affecting Console and API, lasting 13h 6m. The incident has been resolved; the full update timeline is below.
Affected components
- Console
- API
Update timeline
- investigating May 11, 2021, 02:35 AM UTC
The Section portal is currently down. We are investigating.
- investigating May 11, 2021, 03:40 AM UTC
Access to the management portal has been restored. We are still investigating issues with the API functionality. Metrics/Logs are not available at the moment.
- investigating May 11, 2021, 04:11 AM UTC
We are continuing to restore the system. We are currently working on restoring the availability of logs and the cache clear API.
- investigating May 11, 2021, 04:52 AM UTC
Logs and metrics are being restored. They may take some time to appear in the portal. We are still investigating the cache clear API.
- investigating May 11, 2021, 05:43 AM UTC
We are continuing to resolve the issue.
- investigating May 11, 2021, 06:14 AM UTC
We are continuing to work on resolving the cache clear API availability.
- investigating May 11, 2021, 06:31 AM UTC
Cache clear API has been restored and is operational.
- investigating May 11, 2021, 07:12 AM UTC
We are continuing to work on restoring the portal.
- monitoring May 11, 2021, 07:47 AM UTC
We have implemented workarounds for the issue and are currently monitoring the system.
- resolved May 11, 2021, 03:42 PM UTC
This incident has been resolved.
- postmortem May 11, 2021, 09:23 PM UTC
An error during routine maintenance terminated some backend and user-facing infrastructure. The rollback procedure failed, so engineers intervened to bring system components back in priority order, based on our disaster recovery (DR) planning. Mission-critical components, such as traffic delivery POPs, were unaffected during the incident.