DRMtoday incident

DRMtoday Production: service interruption in ap-southeast-2, ap-northeast-1 and us-west-1

Notice Resolved

DRMtoday reported a notice-level incident on May 14, 2018, lasting 2h 40m. The incident has been resolved; the full update timeline is below.

Started
May 14, 2018, 02:04 AM UTC
Resolved
May 14, 2018, 04:44 AM UTC
Duration
2h 40m
Detected by Pingoru
May 14, 2018, 02:04 AM UTC

Update timeline

  1. investigating May 14, 2018, 02:04 AM UTC

    License delivery in the following regions was interrupted (times in UTC):

    * ap-southeast-2: 01:35 - 01:45
    * ap-northeast-1: 01:39 - 01:48
    * us-west-1: 01:42 - 01:54

    Traffic was temporarily rerouted to other regions. We're investigating this issue.
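    The interruption windows in this update overlap, so a short calculation makes their lengths and combined span easier to see. The following is a minimal sketch that uses only the times quoted in the update; the names `WINDOWS` and `minutes_between` are illustrative and not part of any DRMtoday tooling.

```python
from datetime import datetime

# Interruption windows (UTC, May 14, 2018), copied from the update text.
WINDOWS = {
    "ap-southeast-2": ("01:35", "01:45"),
    "ap-northeast-1": ("01:39", "01:48"),
    "us-west-1": ("01:42", "01:54"),
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Per-region interruption length in minutes.
durations = {region: minutes_between(s, e) for region, (s, e) in WINDOWS.items()}

# Span from the earliest start to the latest end across all regions.
# Zero-padded HH:MM strings sort correctly as plain text.
earliest = min(start for start, _ in WINDOWS.values())
latest = max(end for _, end in WINDOWS.values())
overall = minutes_between(earliest, latest)

print(durations)  # {'ap-southeast-2': 10, 'ap-northeast-1': 9, 'us-west-1': 12}
print(overall)    # 19
```

    Each region was affected for roughly 9 to 12 minutes, and the three windows together span 19 minutes (01:35 to 01:54), all of which precede the 02:04 timestamp on this first update.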

  2. resolved May 14, 2018, 04:44 AM UTC

    The system automatically handled the failed regions by rerouting traffic. We apologize for any failed license deliveries and increased latency. We are continuing to research the underlying cause and will update this incident page with our findings.

  3. postmortem Aug 01, 2018, 07:36 PM UTC

    Root cause analysis for the outage on May 14th at 1:35 UTC

    With the last production deployment on May 8th, we introduced a bug that could be triggered by a specific type of invalid license request. This bug caused the application to fail ungracefully, which in turn led to region failover.

    We located the bug, fixed it, and added regression test cases to our build process. A hotfix for DRMtoday's production system was already deployed. We apologize for the inconvenience caused by this outage.
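    The postmortem does not describe the bug or the fix in detail. As a generic illustration of the pattern it names (reject an invalid request with an error response rather than letting it crash the process, and pin that behavior with a regression test), here is a hedged sketch. Everything in it is hypothetical: the JSON body, the `key_id` field, and the function names are assumptions made for the example and do not reflect DRMtoday's actual API or code.

```python
import json
from typing import Tuple


class InvalidLicenseRequest(ValueError):
    """Raised when a request body cannot be interpreted (hypothetical)."""


def parse_license_request(body: bytes) -> dict:
    """Validate a request body; the JSON shape and key_id field are assumed."""
    try:
        payload = json.loads(body)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        raise InvalidLicenseRequest("body is not valid JSON") from exc
    if not isinstance(payload, dict) or not isinstance(payload.get("key_id"), str):
        raise InvalidLicenseRequest("missing or malformed key_id")
    return payload


def handle_license_request(body: bytes) -> Tuple[int, str]:
    """Return (status, message); invalid input yields 400 instead of a crash."""
    try:
        request = parse_license_request(body)
    except InvalidLicenseRequest as exc:
        return 400, f"invalid license request: {exc}"
    return 200, f"license issued for {request['key_id']}"


def test_invalid_requests_are_rejected_not_fatal():
    """Regression test: malformed bodies must produce a 400, never an exception."""
    for bad in (b"", b"not json", b"\xff", b"[]", b'{"key_id": 42}'):
        status, _ = handle_license_request(bad)
        assert status == 400
```

    The point of the sketch is the boundary: validation errors are converted into a client error at the handler, so a single malformed request cannot take down the serving process and trigger failover.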