UTHPC experienced a minor incident on July 15, 2025, lasting 2 hours and 42 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Jul 15, 2025, 06:00 AM UTC
Type: Maintenance
Duration: 2 hours and 42 minutes
Affected Components: rocket.hpc.ut.ee

Jul 15, 06:00:00 GMT+0 - Identified - Updates to the HPC cluster Rocket are scheduled for July 2025. They will improve the cluster's performance and capabilities.

**1) Login Node Updates**

We'll be performing system updates on both login nodes this month:

* **Login1**: July 15th
* **Login2**: July 22nd

To minimize disruption, we'll close new SSH connections one week before each update, allowing existing connections to naturally expire. One of the login nodes will remain available at all times, so you won't experience any service downtime.

**2) Slurm Update**

On **July 22nd, starting at 15:00**, we'll be upgrading Slurm from version 23.02 to 23.11. Your running jobs won't be affected, and you'll be able to submit new jobs during the update. However, commands like sacct, sacctmgr, and related tools will be unavailable during the update. The process should take about two hours but may run longer.

After July 22nd, the compute nodes will be updated in a rolling fashion. This means some nodes will be temporarily drained until all updates are complete, which may result in longer queue times depending on cluster usage.

Jul 22, 06:00:00 GMT+0 - Identified - Maintenance is now in progress.

Jul 16, 13:11:18 GMT+0 - Identified - We will be directing SSH to login1 today. The login2 internal route will stay open until the 22nd, when we will perform maintenance and reboot the machine.

Jul 15, 08:41:38 GMT+0 - Identified - Maintenance is now in progress.

Jul 15, 06:00:00 GMT+0 - Completed - Maintenance has completed successfully.