Iron.io experienced a major incident on May 14, 2019, affecting IronWorker Dedicated and IronWorker Public and lasting 2h 39m. The incident has been resolved; the full update timeline is below.
Affected components
- IronWorker Dedicated
- IronWorker Public
Update timeline
- identified May 14, 2019, 09:10 PM UTC
Due to a database upgrade issue, a portion of our IronWorker customers are experiencing problems with certain API commands. We've identified the cause and are in the process of resolving it.
- identified May 14, 2019, 09:12 PM UTC
We are continuing to work on a fix for this issue.
- identified May 14, 2019, 10:20 PM UTC
Migration is still in progress. This is taking more time than expected but we're monitoring it closely.
- resolved May 14, 2019, 11:50 PM UTC
The migration has completed and service has returned to normal.
- postmortem May 14, 2019, 11:50 PM UTC
**Overview**
On May 13th, at 03:29 UTC, we began routine database upgrades. During the upgrade process, we noticed errors in our logs indicating that certain queries weren’t able to complete successfully.

**What went wrong**
After investigating the errors, we found data anomalies in our Production data set that didn’t exist in our Staging data set. This difference resulted in slow queries and errors that cascaded into service interruptions for a subset of our customers.

**What we're doing to prevent this from happening again**
Moving forward, we’re taking steps to ensure our Staging data set is 100% up to date with our Production data set. If the copies of the data had been exact, this would have been caught in Staging and wouldn’t have caused a disruption in service.

**Resolution time**
The incident was resolved at 11:49 PM UTC.