Zonos experienced a major incident on November 8, 2024 affecting Landed Cost API and International Checkout, lasting 1h 25m. The incident has been resolved; the full update timeline is below.
Affected components
- Landed Cost API
- International Checkout
Update timeline
- investigating Nov 08, 2024, 07:49 PM UTC
We are currently investigating this issue.
- identified Nov 08, 2024, 07:49 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Nov 08, 2024, 07:50 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Nov 08, 2024, 07:52 PM UTC
This incident has been resolved.
- postmortem Nov 08, 2024, 07:53 PM UTC
### What products were affected and what was the impact?

Landed Cost API, Checkout

Impact: CRITICAL

### What timeframe did this issue occur?

| **Date** | **Time** |
| --- | --- |
| November 8, 2024 | 8:50am - 10:26am MST |

### How was the issue detected?

A spike in error logs triggered an alert to our Engineering team, who responded immediately to the issue.

### What functionality was affected?

Landed Cost quotes that use our automated item classification service failed.

### What problems did this cause?

If an HS Code was not provided in the API request to Landed Cost, and the automatic classification service was enabled, then the landed cost quote would fail. When the landed cost quote fails, shoppers may not be able to place their order.

### What was the resolution of the problem and steps that are being taken for continued follow-up?

The root cause of the issue was a deployment issue with the item service used for automatic classification. While the issue was detected immediately, resolution required rebuilding and redeploying services, which took longer than expected. After services were rebuilt and redeployed, the system health was validated and normal operations resumed.

### What mitigation solutions will we put in place to prevent this issue from occurring in the future?

We discovered that this issue was due, in part, to a deficiency in our deployment procedures. We are working to update the procedure to prevent any future issues. We are also creating a synthetic test in our lower environments that will catch similar issues before they can be deployed into production.
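
As a rough illustration of the kind of synthetic test described in the postmortem, the sketch below requests a quote for an item that has no HS Code, which forces the automatic classification path that failed in this incident. Everything in it is an assumption made for illustration: `QuoteResult`, `ITEM_WITHOUT_HS_CODE`, `run_synthetic_check`, and the request fields are hypothetical names, not the Zonos API, and the quote client is passed in as a plain callable so that no real endpoint is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuoteResult:
    """Hypothetical result of a landed cost quote request."""
    ok: bool
    error: Optional[str] = None


# Hypothetical request shape: the item carries no HS Code, so the quote
# has to go through automatic classification.
ITEM_WITHOUT_HS_CODE = {
    "description": "cotton t-shirt",
    "amount": 20.0,
    "hs_code": None,
}


def run_synthetic_check(request_quote: Callable[[dict], QuoteResult]) -> bool:
    """Return True if a quote for an item lacking an HS Code succeeds.

    `request_quote` stands in for whatever client calls the quote service
    in a lower environment; it is injected so the check stays testable.
    """
    try:
        result = request_quote(ITEM_WITHOUT_HS_CODE)
    except Exception:
        # A raised error counts as a failed check, not a crashed test run.
        return False
    return result.ok
```

Run against a lower environment after each deployment, a check of this shape would return `False` whenever the classification dependency is unavailable, whether the client raises an error or returns an unsuccessful quote.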