TechnologyOne incident
DP Service Disruption for ALL customers / UK Region / ALL Releases
TechnologyOne experienced a major incident on February 27, 2025 affecting Batch Services (DP Jobs), lasting 5h 50m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Feb 27, 2025, 06:20 AM UTC
We are investigating an issue impacting the DP service for the UK Region. It presents in two ways: users see DP jobs remain as "submitted", and users in a worksheet who are waiting on a DP job to complete see that part of the process continue to spin. We are applying mitigations, and the next update will be provided in 60 minutes, or sooner if new information becomes available.
- investigating Feb 27, 2025, 07:17 AM UTC
Our investigation into the UK DP services continues; however, the logs are showing promising progress. We will continue to apply mitigations, and the next update will be provided in 60 minutes, or sooner if new information becomes available.
- investigating Feb 27, 2025, 08:17 AM UTC
Our investigation into the UK DP services is ongoing. We are currently examining logs for any errors and reviewing reports from customers. The next update will be provided in 60 minutes, or sooner if new information becomes available.
- investigating Feb 27, 2025, 09:11 AM UTC
Our logs show a large increase in new DP jobs progressing. We continue to apply mitigations, and the next update will be provided within 60 minutes, or sooner if new information becomes available.
- monitoring Feb 27, 2025, 09:35 AM UTC
Our team has verified that implementation of a fix is complete. We will monitor the logs for the next 2 hours to ensure there are no further impacts.
- resolved Feb 27, 2025, 12:10 PM UTC
After 2 hours of monitoring, this incident is now resolved. We will perform a post-incident review to identify the underlying cause and the preventive actions needed to avoid a repeat in the future, and will post the results here on completion. We apologise for how you and your business may have been affected by this incident.
- postmortem Mar 04, 2025, 08:11 AM UTC
**Issue Summary:** On Thursday 27 February at 12.00am GMT, alert monitoring indicated that the response time of our cloud orchestration platform was spiking above its normal level. The TechnologyOne team began an investigation immediately. Users saw the impact as DP jobs queuing or being unable to be submitted. Users also experienced longer run times on worksheet processes because DP jobs were taking longer to be picked up and processed.
**Root Cause Analysis:** Queue and processing limits were reached due to long-running processes locking the cloud orchestration database. This caused CPU utilisation to max out at 100%. Whilst the cloud orchestration database was recovered within 45 minutes, the Cloud DP Service did not recover due to the backlog of DP jobs. The TechnologyOne team undertook several actions to clear the backlog from the Cloud DP Service, and the service was stabilised at 9.33am GMT.
**Corrective Measures:** Restarted all the tasks supporting the Cloud DP. Built additional microservice clusters and scaled out the microservice cluster to handle the load. Recycled servers in the microservice cluster. Scaled back the number of DP Servers (due to auto scaling) to reduce the load.
**Preventive Measures:** A full review of the Cloud DP service, in conjunction with an upstream provider, is underway, with the expectation that additional mitigations will be implemented. An ongoing project to further enhance the scalability and performance under load of the DP microservice is being accelerated and is planned for completion by August 2025.