ServiceChannel incident
ServiceChannel System Performance Degradation
ServiceChannel experienced a major incident on August 31, 2023, affecting Work Order Manager and Maps and lasting 38 minutes. The incident has been resolved; the full update timeline is below.
Affected components
- Work Order Manager
- Maps
Update timeline
- investigating Aug 31, 2023, 06:36 PM UTC
We are actively investigating degraded system performance. An update will be provided shortly. Thank you for your patience.
- resolved Aug 31, 2023, 07:14 PM UTC
This incident has been resolved. All services are working as expected.
- postmortem Sep 14, 2023, 06:13 PM UTC
**Infrastructure/hardware instability**

**Incident Report**

**Date of Incident:** 08/31/2023
**Time/Date Incident Started:** 08/31/2023, 02:15 pm EDT
**Time/Date Stability Restored:** 08/31/2023, 02:47 pm EDT
**Time/Date Incident Resolved:** 08/31/2023, 02:50 pm EDT
**Users Impacted:** All
**Frequency:** Intermittent
**Impact:** Major

**Incident description**

On August 31st at 02:15 pm EDT, the ServiceChannel Site Reliability Engineering (SRE) team received a large number of SQL timeout errors, followed by reports of dashboard slowness.

**Root Cause Analysis**

The Database Administration (DBA) team discovered a growing queue of active database queries and increasing resource waits, resulting from functionality that was causing database blocks and high CPU load on the database cluster.

**Actions Taken**

1. Investigated system-generated alerts and identified affected platform functionality.
2. Recompiled the affected stored procedures and dropped all blocking connections to return the database cluster to the steady state.
3. Compiled incident findings for future remediation by the Application Engineering and SRE teams.

**Mitigation Measures**

1. Coordinate with the Application Engineering team to identify and remediate the root causes of the high database CPU and blocks.
2. Identify and implement general performance improvements for database queries to increase overall platform stability.
3. Implement infrastructural modifications to distribute database I/O across additional read replicas.
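
As an illustration of the blocking described in the root cause analysis, the sketch below shows one way to find "head blockers" (sessions that hold locks while not waiting on anyone themselves) from a snapshot of which session waits on which. It is not ServiceChannel's tooling: the function name `find_head_blockers`, the snapshot format, and the session IDs are all hypothetical, and a real snapshot would come from the database engine's own monitoring views.

```python
from collections import defaultdict


def find_head_blockers(blocked_by):
    """Group waiting sessions under the session at the head of their blocking chain.

    blocked_by maps a waiting session ID to the session ID it is waiting on
    (a hypothetical snapshot format, assumed for this sketch).
    """
    victims = defaultdict(set)
    for session in blocked_by:
        seen = {session}
        current = session
        # Walk up the chain until reaching a session that is not itself waiting.
        while current in blocked_by:
            current = blocked_by[current]
            if current in seen:
                # A cycle is a deadlock, which has no single head blocker.
                current = None
                break
            seen.add(current)
        if current is not None:
            victims[current].add(session)
    return {head: sorted(waiting) for head, waiting in victims.items()}


# Hypothetical snapshot: 53 waits on 52, 52 and 54 wait on 51, 61 waits on 60.
snapshot = {52: 51, 53: 52, 54: 51, 61: 60}
print(find_head_blockers(snapshot))  # {51: [52, 53, 54], 60: [61]}
```

Ending only the head blockers (51 and 60 in this made-up snapshot) would release every session queued behind them, which is why dropping blocking connections can drain a growing query queue quickly.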