Alkira experienced a minor incident on May 31, 2023 affecting CACENTRAL-AZURE-1 (Toronto), CAEAST-AZURE-1 (Quebec City), and 1 more component, lasting 21 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating May 31, 2023, 08:50 PM UTC
We are investigating an issue in Azure regions where the tunnel health reporting service is failing.
- investigating May 31, 2023, 08:53 PM UTC
We see that provisioning services in these regions are impacted as well.
- investigating May 31, 2023, 08:55 PM UTC
We are continuing to investigate this issue.
- investigating May 31, 2023, 09:02 PM UTC
We are actively working on recovering the services, and we expect to see recovery soon.
- investigating May 31, 2023, 09:08 PM UTC
All services should now be recovered. Connector health should be restored to its correct state on the topology.
- investigating May 31, 2023, 09:11 PM UTC
We are continuing to investigate this issue.
- resolved May 31, 2023, 09:12 PM UTC
We have resolved the issue and are actively monitoring the services. We will post an RCA for this issue soon.
- postmortem May 31, 2023, 09:13 PM UTC
At approximately 20:40 UTC on May 31st, we noticed an increase in workload on one of our infrastructure clusters, which serves the USCENTRAL-AZURE-3, USEAST-AZURE-2, CACENTRAL-AZURE-1, CAEAST-AZURE-1, and USEAST-AZURE-1 CXP regions. Health reporting and provisioning services were impacted as a result of this increased workload. We quickly added more nodes to the infrastructure cluster to remediate the issue, and the failing services recovered at 21:10 UTC. We don't anticipate this occurring again and are actively reviewing all other regions for any spike in workload. Please reach out to Alkira Support if you have any questions.