DRMtoday incident
DRMtoday Production: service interruption in ap-southeast-2, ap-northeast-1 and us-west-1
DRMtoday experienced a notice incident on May 14, 2018, lasting 2h 40m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating May 14, 2018, 02:04 AM UTC
License delivery in the following regions was interrupted: * ap-southeast-2: 01:35 - 01:45 * ap-northeast-1: 01:39 - 01:48 * us-west-1: 01:42 - 01:54 Traffic was temporarily rerouted to other regions. We're investigating this issue.
- resolved May 14, 2018, 04:44 AM UTC
The system automatically handled the failed regions by rerouting traffic. We apologize for any failed license deliveries and increased latency. We are continuing to research the underlying cause and will update this incident page with our findings.
- postmortem Aug 01, 2018, 07:36 PM UTC
Root cause analysis for the outage on May 14th at 1:35 UTC With the last production deployment on May 8th, we introduced a bug that could be triggered by a specific type of invalid license request. This bug caused the application to fail ungracefully, which in turn led to region failover. We located the bug, fixed it, and added regression test cases to our build process. A hotfix for DRMtoday's production system was already deployed. We apologize for the inconvenience caused by this outage.