Pay & Connect experienced a critical incident on August 25, 2022. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Aug 26, 2022, 04:05 PM UTC
An increase in DB CPU usage was detected, which increased latency throughout the system.
- postmortem Aug 26, 2022, 04:05 PM UTC
To relieve the strain on the DB server while the source of the issue was investigated, the DB was allocated additional CPU resources. These were sufficient to bring CPU usage back down to acceptable levels temporarily, but this morning (26/8) the resources were stretched to their limits again: the server first showed increased latency and eventually stopped serving requests. At that point we increased the DB CPU resources further to relieve the load on the server immediately, and stepped up efforts to establish the root cause. We isolated a particularly slow, long-running query whose performance had degraded as the transaction table grew. We implemented a dramatic optimisation of the query and deployed an update soon after. Since this optimisation, we have seen major improvements in the server performance metrics.