Zonos experienced a critical incident on June 13, 2023 affecting Dashboard, lasting 24m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jun 13, 2023, 07:25 PM UTC
We are currently investigating reports of a potential service interruption with Dashboard. We apologize for any inconvenience and will post another update as soon as we learn more.
- investigating Jun 13, 2023, 07:27 PM UTC
We are continuing to investigate this issue.
- identified Jun 13, 2023, 07:37 PM UTC
An issue with upstream Lambda creation and execution has been identified, and we are waiting on a fix to be rolled out while investigating other mitigation strategies. For more information, see the AWS status at https://health.aws.amazon.com/health/status.
- monitoring Jun 13, 2023, 07:46 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jun 13, 2023, 07:49 PM UTC
This incident has been resolved.
- postmortem Jun 13, 2023, 10:55 PM UTC
**What products were affected and what was the impact?**

Zonos Dashboard
Impact: CRITICAL

**What timeframe did this issue occur?**

| **Date** | **Time** |
| --- | --- |
| Jun 13, 2023 | 12:54 to 13:46 MDT |

**How was the issue detected?**

Internal reports of authorization failures and Dashboard becoming inaccessible.

**What functionality was affected?**

Zonos Dashboard was not accessible.

**What problems did this cause?**

Users were unable to access Dashboard to complete tasks.

**What was the resolution of the problem and steps that are being taken for continued follow-up?**

The issue was identified as an AWS Operational issue in the US-EAST-1 Region impacting an upstream service provider hosting our Front-End services for Dashboard. We were able to redeploy those services to an unaffected region to restore functionality.

**What mitigation solutions will we put in place to prevent this issue from occurring in the future?**

We are continually assessing and improving business continuity solutions throughout every layer of our tech stack to minimize downtime and automate recovery where possible.