Quantum Workplace incident

All applications offline due to database outage.

Critical Resolved View vendor source →

Quantum Workplace experienced a critical incident on July 6, 2022, lasting —. The incident has been resolved; the full update timeline is below.

Started
Jul 06, 2022, 06:00 PM UTC
Resolved
Jul 06, 2022, 06:00 PM UTC
Duration
Detected by Pingoru
Jul 06, 2022, 06:00 PM UTC

Update timeline

  1. resolved Jul 06, 2022, 08:12 PM UTC

    12:54:07 (CDT) - The Azure host that runs our primary database server (PRD-EUS2-SQL01A) had an outage. The physical node where the VM was running encountered a hardware issue due to a faulty disk which impacted the VMs on the node. 12:55:00 (CDT) - The server was migrated to a new host 13:05:45 (CDT) - The server started back up on the new host 13:08:30 (CDT) - Traffic was able to be back online

  2. postmortem Jul 06, 2022, 08:12 PM UTC

    Take aways: * Azure did successfully self-heal the outage as designed * We should have a nicer looking site down page instead of "502 gateway error" * We will re-evaluate enabling an auto-failover to our secondary database server \(trade-off is you can fail over more quickly, but have a chance at loss of data\)