TechnologyOne incident
DP Service Degradation for customers across all releases
TechnologyOne experienced a minor incident on October 21, 2024 affecting Batch Services (DP Jobs), lasting 3h 19m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Oct 21, 2024, 02:10 AM UTC
We are investigating an issue impacting the DP Service. Impact/Error/How to verify: Customers on all releases are experiencing sub-optimal performance for DP job processing. While the investigation continues, the next update will be provided in 60 minutes, or sooner if new information becomes available.
- monitoring Oct 21, 2024, 02:56 AM UTC
Our team has verified that the fix has been implemented for all customers and releases. We will monitor for the next 2 hours to ensure no further customers are impacted.
- resolved Oct 21, 2024, 05:29 AM UTC
This incident has been resolved.
- postmortem Nov 04, 2024, 05:05 AM UTC
**Issue Summary:** On Monday 21 October 2024 at 11:15 am, alert monitoring indicated that our cloud orchestration platform was spiking above its normal response time. The TechnologyOne team began an investigation immediately. The impact was most noticeable with DP jobs queuing or unable to be submitted.

**Root Cause Analysis:** Queue and processing limits were reached because long-running processes were locking the database. This drove CPU utilisation to 100%.

**Corrective Measures:** Restarted all the tasks supporting the Cloud DP. Scaled out the microservice cluster to handle the load.

**Preventive Measures:** Review and refine existing monitoring alerts. A longer-term project will further enhance the scalability and performance under load of the DP microservice.
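
The Issue Summary describes an alert that fired when response time spiked above its normal level. As an illustration only, the following is a minimal sketch of that kind of check. It is not TechnologyOne's actual monitoring: the function name `is_latency_spike`, the `factor` and `min_samples` parameters, and the sample values are all hypothetical.

```python
from statistics import median


def is_latency_spike(samples_ms, current_ms, factor=3.0, min_samples=5):
    """Return True when current_ms exceeds factor times the median of recent samples.

    Hypothetical helper: the threshold factor and minimum history size are
    illustrative assumptions, not values taken from the incident report.
    """
    if len(samples_ms) < min_samples:
        # Not enough history to define a "normal" response time.
        return False
    baseline = median(samples_ms)
    return current_ms > factor * baseline


# Made-up response times in milliseconds for a healthy period.
recent = [120, 130, 110, 125, 140, 118]

print(is_latency_spike(recent, 135))  # within the normal range
print(is_latency_spike(recent, 900))  # far above the baseline
```

Using the median rather than the mean keeps the baseline stable when a few slow requests are already present in the history, which is one way an alert like this could be refined.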