DRMtoday incident
DRMtoday Production: Increased latency in us-west-1
DRMtoday experienced a minor incident on October 16, 2018, affecting License Delivery and lasting 2h 1m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 16, 2018, 12:51 AM UTC
Due to health check failures in us-west-1, all license deliveries in region us-west-1 have been routed to nearby regions since 00:11 UTC, which increases latency for these requests. We see no signs of failing license deliveries and are investigating the cause of the health check failures.
- identified Oct 16, 2018, 01:01 AM UTC
Update 02:41 UTC - This is an unrelated issue. The AWS service health dashboard states: "05:50 PM PDT We are investigating connectivity issues for some domains in a single Availability Zone in the US-WEST-1 Region."
- resolved Oct 16, 2018, 02:53 AM UTC
All systems are back to normal and licenses are now delivered from all DRMtoday regions.
Timeline
00:06 - Backend nodes in us-west-1 lose connectivity to a backend database
00:06 - Health checks fail and all traffic to region us-west-1 is redirected to nearby regions
00:12 - DRMtoday's ops team is notified
00:21 - The offending database node is automatically shut down due to an earlier error. Unfortunately the usual failover/recovery fails.
00:55 - Database fully recovered
02:30 - DRMtoday's ops team reenabled deliveries from us-west-1
All times UTC
We apologize for the inconvenience and will continue our investigation into the failover behavior.
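For readers who want a picture of the rerouting described above, the following is a minimal sketch of health-check-driven region failover. It is not DRMtoday's actual routing logic: the `NEARBY` table, the fallback regions listed in it, and the `route_region` function are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical fallback order: for each region, the nearby regions to try,
# closest first. The regions other than us-west-1 are assumptions.
NEARBY: Dict[str, List[str]] = {
    "us-west-1": ["us-west-2", "us-east-1"],
    "us-west-2": ["us-west-1", "us-east-1"],
    "us-east-1": ["us-west-2", "us-west-1"],
}


def route_region(preferred: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the region that should serve a request.

    The preferred region is used while its health check passes. Otherwise
    the first healthy nearby region is chosen, which keeps deliveries
    working at the cost of extra latency. Returns None if no region in
    the fallback list is healthy.
    """
    if healthy.get(preferred, False):
        return preferred
    for candidate in NEARBY.get(preferred, []):
        if healthy.get(candidate, False):
            return candidate
    return None


# Example: us-west-1 fails its health check, so traffic moves to a neighbor.
status = {"us-west-1": False, "us-west-2": True, "us-east-1": True}
served_by = route_region("us-west-1", status)
```

In this sketch a request is still served when the preferred region is unhealthy, which matches the observation that license deliveries did not fail and only latency increased.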