SchemeServe experienced a minor incident on November 2, 2023 affecting 🎩 SchemeServe, lasting 2h 29m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Nov 02, 2023, 10:18 AM UTC
We are currently investigating an issue with placing new quotes in SchemeServe. We will update as soon as we have more information.
- investigating Nov 02, 2023, 10:44 AM UTC
We have identified the issue as a massively increased load on one of our databases and are currently working to reduce this.
- monitoring Nov 02, 2023, 10:55 AM UTC
We have identified the source of the load on the database and have applied a mitigation to reduce it. This had an immediate effect, reducing load by 60%. We will continue to monitor the changes made.
- resolved Nov 02, 2023, 12:48 PM UTC
The issue is now fully resolved. A post-mortem will be posted in due course.
- postmortem Nov 02, 2023, 04:01 PM UTC
SchemeServe runs several performance monitoring scripts and services on our primary database. These aggregate metrics that help us identify performance issues, bad query plans and long-running queries. One of these scripts crossed a tipping point at which the resources it required jumped significantly, placing a huge load on the database. This in turn caused other queries to run slowly. Some queries within SchemeServe lock tables, and with the increased query time those locks were held for longer, which caused some write queries to the locked tables to time out.

The tracking script is only one of many tools we use to monitor database performance, so it has been removed to prevent further performance issues. SchemeServe aims to maintain an average headroom of at least 50% of processing power on our databases during peak operational hours to allow for surges in demand.

- Issue start time: 09:14
- Issue first identified: 09:21
- Issue resolved: 10:38
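
As an illustration of the 50% target mentioned in the post-mortem, the check below is a minimal sketch of how average spare processing power over a set of peak-hour samples might be compared against that target. The function names, the `HEADROOM_TARGET` constant and the sample values are hypothetical and are not taken from SchemeServe's actual monitoring.

```python
from statistics import mean

# Assumption: the target from the post-mortem, expressed as a fraction of
# total processing power that should remain unused on average.
HEADROOM_TARGET = 0.50


def average_headroom(cpu_utilisation):
    """Return the mean fraction of unused processing power.

    `cpu_utilisation` is an iterable of samples between 0.0 (idle)
    and 1.0 (fully loaded), e.g. collected during peak hours.
    """
    samples = list(cpu_utilisation)
    if not samples:
        raise ValueError("at least one utilisation sample is required")
    for sample in samples:
        if not 0.0 <= sample <= 1.0:
            raise ValueError(f"utilisation sample out of range: {sample}")
    return 1.0 - mean(samples)


def meets_headroom_target(cpu_utilisation, target=HEADROOM_TARGET):
    """Return True if the average headroom is at least `target`."""
    return average_headroom(cpu_utilisation) >= target


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    normal_peak = [0.30, 0.45, 0.40]
    under_heavy_load = [0.90, 0.95, 0.85]
    print(meets_headroom_target(normal_peak))
    print(meets_headroom_target(under_heavy_load))
```

A check like this only measures the average. A short spike, such as the one described above, can still saturate the database while the average stays within target.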