Iron.io incident

IronWorker Degraded Performance

Major · Resolved

Iron.io experienced a major incident on May 14, 2019 affecting IronWorker Dedicated and IronWorker Public, lasting 2h 39m. The incident has been resolved; the full update timeline is below.

Started
May 14, 2019, 09:10 PM UTC
Resolved
May 14, 2019, 11:50 PM UTC
Duration
2h 39m
Detected by Pingoru
May 14, 2019, 09:10 PM UTC

Affected components

IronWorker Dedicated, IronWorker Public

Update timeline

  1. identified May 14, 2019, 09:10 PM UTC

    Due to a database upgrade issue, a portion of our IronWorker customers are experiencing issues with certain API commands. We've identified the issue and are in the process of resolving it.

  2. identified May 14, 2019, 09:12 PM UTC

    We are continuing to work on a fix for this issue.

  3. identified May 14, 2019, 10:20 PM UTC

    Migration is still in progress. This is taking more time than expected but we're monitoring it closely.

  4. resolved May 14, 2019, 11:50 PM UTC

    The migration has completed and service has returned to normal.

  5. postmortem May 14, 2019, 11:50 PM UTC

    **Overview**

    On May 13th, at 03:29 UTC, we began routine database upgrades. During the upgrade process we noticed errors in our logs indicating that certain queries weren’t able to complete successfully.

    **What went wrong**

    After investigating the errors, we found data anomalies in our Production data set that didn’t exist in our Staging data set. This difference resulted in slow queries and errors that cascaded into service interruptions for a subset of our customers.

    **What we're doing to prevent this from happening again**

    Moving forward, we’re taking steps to ensure our Staging data set is 100% up to date with our Production data set. Had the two copies of the data been identical, the problem would have been caught in Staging and wouldn’t have caused a disruption in service.

    **Resolution time**

    The incident was resolved at 11:49 UTC.