TechnologyOne incident
Performance Degradation - ANZ Region / 2023B (562) Fastline
TechnologyOne experienced a minor incident on April 28, 2024, lasting 2h 15m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Apr 28, 2024, 10:52 PM UTC
An issue has been identified and our engineers are currently investigating. We aim to provide an update within the next 60 minutes.
- identified Apr 28, 2024, 11:01 PM UTC
Our team of engineers have identified the cause of the issue and are actively applying a fix to restore services to standard operating levels. We shall provide the next update once a fix is in place. Thank you for your patience while we work towards resolving this.
- monitoring Apr 28, 2024, 11:11 PM UTC
A fix has been implemented and we are currently monitoring the services for stability. Thank you.
- resolved Apr 29, 2024, 01:08 AM UTC
This incident is now resolved. A sudden, significant increase in CPU demand was triggered by an influx of data requests. This unusual pattern exceeded normal parameters, impacting our database operations and delaying transaction processing. Our technical teams quickly identified and isolated the issue and took immediate measures to redistribute load and increase processing capacity. Thank you for your patience while we worked to restore services.
- postmortem May 09, 2024, 02:14 AM UTC
**Critical Incident Summary**

On the 29th of April 2024, at 8:37AM, TechnologyOne users on 2023B (562) Fastline started reporting degraded performance. Our monitoring raised “High Priority” alerts, first at 8:14AM and then at 8:21AM and 8:40AM, for sessions timing out. Our investigation began at 8:30AM.

**Root Cause**

A burst of requests led to high CPU utilisation. This, in turn, increased the load on the SQL database, creating a cycle in which requests accumulated faster than they could be processed. TechnologyOne’s predictive autoscaling settings, which had been operating effectively based on previous metrics, were outpaced by the sudden increase in demand.

**Preventive Measures**

We have taken the following actions:

**Reprioritised Alerting Mechanisms**: We have upgraded the prioritisation of critical alerts to ensure faster response times.

**Advanced Scaling Strategies**: Additional scaling rules, intended to let our infrastructure adapt more dynamically to changes in demand, are currently being tested prior to adoption.
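
The accumulation described under Root Cause can be illustrated with a toy queue model. This is only a sketch and does not represent TechnologyOne's systems or configuration. The function name `simulate_backlog`, the request rates, the per-instance capacity and the scale-out threshold are all hypothetical values chosen for illustration. The model compares a fixed amount of capacity with a simple reactive rule that adds one instance whenever the backlog exceeds a threshold.

```python
from typing import List, Optional


def simulate_backlog(
    arrivals: List[int],
    capacity_per_instance: int,
    instances: int,
    scale_threshold: Optional[int] = None,
    max_instances: int = 8,
) -> List[int]:
    """Return the request backlog after each tick of a discrete-time queue.

    All parameters are hypothetical. If scale_threshold is set, one instance
    is added at the start of any tick where the backlog exceeds it.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Hypothetical reactive rule: scale out by one instance on high backlog.
        if (
            scale_threshold is not None
            and backlog > scale_threshold
            and instances < max_instances
        ):
            instances += 1
        capacity = capacity_per_instance * instances
        # Requests that cannot be served this tick carry over to the next one.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history


# Illustrative load: steady traffic, a five-tick burst, then steady traffic.
arrivals = [100] * 3 + [300] * 5 + [100] * 5

fixed = simulate_backlog(arrivals, capacity_per_instance=50, instances=3)
scaled = simulate_backlog(
    arrivals, capacity_per_instance=50, instances=3, scale_threshold=100
)

print("fixed capacity :", fixed)
print("reactive scale :", scaled)
```

With these made-up numbers, the fixed-capacity run peaks at a backlog of 750 and still holds 500 queued requests at the end, while the reactive run peaks at 300 and drains to zero shortly after the burst ends. The point is only that once arrivals exceed capacity, the backlog keeps growing until capacity catches up.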