Zonos incident

Cloud provider issues impacting label creation

Major Resolved

Zonos experienced a major incident on December 7, 2021, lasting 4h 24m. The incident has been resolved; the full update timeline is below.

Started
Dec 07, 2021, 05:57 PM UTC
Resolved
Dec 07, 2021, 10:21 PM UTC
Duration
4h 24m
Detected by Pingoru
Dec 07, 2021, 05:57 PM UTC

Update timeline

  1. monitoring Dec 07, 2021, 05:57 PM UTC

    We are aware of issues with our upstream cloud provider that may be affecting our services, specifically the label creation process. The issues appear to be intermittent and our cloud provider is working on a fix for the problem. We are continuing to monitor the situation and will provide regular updates.

  2. monitoring Dec 07, 2021, 08:00 PM UTC

    Our upstream cloud provider has identified the issue and is working on a fix. There is currently no ETA for resolution. We are looking into a temporary workaround to reduce the impact on the label creation process and will provide regular updates.

  3. resolved Dec 07, 2021, 10:21 PM UTC

    Our team has been tracking errors that have occurred during the label creation process and has not identified any errors for more than 30 minutes. Our upstream cloud provider is still working on resolving the underlying issue, but the impact to our services appears to be resolved. We will continue to monitor to ensure that our services are running smoothly.

  4. postmortem Dec 08, 2021, 08:15 PM UTC

    **What products were affected and what was the impact?**

    The outage mainly impacted Dashboard label creation and retrieval. Impact: `Major`

    **What timeframe did this issue occur?**

    | **Date** | **Time** |
    | --- | --- |
    | Dec 8, 2021 | 08:35 - 14:20 MST |

    **How was the issue detected?**

    Our team was notified via customer support of a possible issue with shipment label creation. We verified that this was due to our upstream provider experiencing increased API error rates.

    **What problems did this cause?**

    Merchants were unable to create shipments for their orders and fulfill them. Because the shipment API was not completely down, some merchants were able to work around the issue by re-creating the shipment.

    **What was the resolution of the problem and steps that are being taken for continued follow-up?**

    Our team started moving to an alternate cloud storage option but noticed decreased error rates at that time. We analyzed the failed labels and reported them to our Customer Success team so that the affected merchants could be notified.

    **What mitigation solutions will we put in place to prevent this issue from occurring in the future?**

    Our team is looking into implementing cross-region replication or a backup cloud service as a fallback so that our services stay online if an outage like this occurs again.