Parade incident

Broker Portal Slowdown

Major Resolved

Parade experienced a major incident on July 13, 2022. The incident has been resolved; the full update timeline is below.

Started
Jul 13, 2022, 05:53 PM UTC
Resolved
Jul 08, 2022, 11:30 AM UTC
Duration
Detected by Pingoru
Jul 13, 2022, 05:53 PM UTC

Update timeline

  1. resolved Jul 13, 2022, 05:53 PM UTC

    Issue Summary
    We encountered a major system slowdown on July 2, 2019, which primarily affected the broker web portal.

    Timeline
    Initial slowdowns related to the incident started at 4:32 AM PDT, and our support team escalated the issue at 6:50 AM PDT. The issue was fully resolved at 2:03 PM PDT.

    Root Cause
    The cause of this incident was an issue with our primary database, which most of our microservices connect to. A weekly reporting job triggered a surge of database queries, substantially increasing the number of expensive queries running against the database. This caused the initial slowdown that our team was alerted to. As a result of the spike in traffic, the database went into recovery mode at 7:34 AM PDT, which resulted in more severe downtime because the database had to do a full restart.

    Resolution and Recovery
    All systems were brought back online at 2:03 PM PDT.

    Corrective and Preventative Measures
    We have added alerting around top-level database metrics. We are also reworking how our support and on-call engineering teams are notified about deeper technical issues, so that we can respond to system-level downtime faster. The weekly reporting job that generated some of these expensive queries was identified as a legacy feature; it has been deprecated and removed. After removing this job, our engineering team also reviewed all daily and weekly cron jobs and cleaned up many expensive recurring jobs, as a preventative measure to make sure these background jobs do not cause database issues in the future.
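
As an illustration of what alerting on top-level database metrics might look like, here is a minimal Python sketch. It is not Parade's implementation: the metric names, the threshold values, and the `breached` helper are all hypothetical, and a real setup would normally use the monitoring system's own alert rules rather than hand-written checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str        # key expected in a metrics sample (hypothetical name)
    limit: float       # alert when the sampled value exceeds this
    description: str   # human-readable label used in the alert text


# Hypothetical thresholds; real values depend on the database and workload.
THRESHOLDS = [
    Threshold("cpu_percent", 85.0, "database CPU utilization"),
    Threshold("active_connections", 400.0, "open client connections"),
    Threshold("slow_queries_per_min", 50.0, "slow queries per minute"),
]


def breached(sample: dict[str, float]) -> list[str]:
    """Return one alert message per threshold exceeded in the sample."""
    alerts = []
    for t in THRESHOLDS:
        value = sample.get(t.metric)
        # Metrics missing from the sample are skipped, not treated as healthy or failing.
        if value is not None and value > t.limit:
            alerts.append(f"{t.description}: {value} > {t.limit}")
    return alerts
```

Calling `breached` on each periodic metrics sample and paging the on-call engineer when the returned list is non-empty would surface a surge of expensive queries before it escalates into a full database restart.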