Parade incident

Portal and Load Processing slowdown

Major Resolved

Parade experienced a major incident on August 23, 2022. The incident has been resolved; the full update timeline is below.

Started
Aug 23, 2022, 04:00 PM UTC
Resolved
Aug 23, 2022, 04:00 PM UTC
Duration
Detected by Pingoru
Aug 23, 2022, 04:00 PM UTC

Update timeline

  1. resolved Sep 01, 2022, 05:08 PM UTC

    Issue Summary
    We experienced a system-wide slowdown caused by a long-running database operation that locked most of our production tables.

    Timeline
    The issue started on Aug 23 at 9:02 am PST, and we reached full recovery at 10:18 am PST.

    Root Cause
    The issue originated from a long-running database query that ran for more than 24 hours. The query was part of a data cleanup job done on behalf of a customer. The cleanup job started running the day prior and did not finish. The last step of the query locked many of the key tables that our application uses in our production database.

    Resolution and Recovery
    Terminating the database query at 10:08 am PST freed the database locks, which allowed our system to recover immediately. The broker portal issues were resolved at that point, and the backlog of load updates affected by the downtime was synced up to real time within the hour.

    Corrective and Preventative Measures
    Our team is no longer permitted to run this data deletion query against our database. We have also devised an alternate approach to getting data out of our production database that no longer requires long-running database queries. As a result of these two measures, the database operation in question should never be executed again. We have also implemented a company policy against running long-running jobs that affect multiple tables overnight, because it is unpredictable when they will cause lockups.
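The root cause described above, a query that ran for more than 24 hours before it locked key tables, is the kind of condition a runtime threshold check can surface early. The following is a minimal, database-agnostic sketch of that idea in Python, not a description of Parade's actual tooling. The `ActiveQuery` record, the `find_overdue_queries` helper, and the one-hour threshold are all hypothetical and exist only for illustration; a real monitor would populate the snapshot from whatever activity view the database in use provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ActiveQuery:
    """Hypothetical snapshot of one running query, as a monitoring job might collect it."""
    query_id: int
    started_at: datetime
    holds_locks: bool


def find_overdue_queries(queries, now, max_runtime=timedelta(hours=1)):
    """Return queries running longer than max_runtime, longest-running first."""
    overdue = [q for q in queries if now - q.started_at > max_runtime]
    # Earliest start time first, i.e. the longest-running query leads the list.
    return sorted(overdue, key=lambda q: q.started_at)


# Illustrative data only: one day-old cleanup-style job and one ordinary request.
now = datetime(2022, 8, 23, 16, 2, tzinfo=timezone.utc)
snapshot = [
    ActiveQuery(1, now - timedelta(hours=25), holds_locks=True),
    ActiveQuery(2, now - timedelta(seconds=3), holds_locks=False),
]

overdue = find_overdue_queries(snapshot, now)
for q in overdue:
    runtime = now - q.started_at
    print(f"query {q.query_id} has run for {runtime}, holds_locks={q.holds_locks}")
```

A check like this only reports candidates; whether to terminate a flagged query remains an operator decision, as it was in this incident.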