UTHPC Outage History
UTHPC is up right now
UTHPC had 21 outages in the last 2 years totaling 36h 11m of downtime, averaging 0.9 incidents per month.
There were 21 UTHPC outages since June 28, 2025 totaling 36h 11m of downtime. Each is summarised below — incident details, duration, and resolution information.
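The per-month average follows from dividing the incident count by the length of the reporting window. The sketch below reproduces that arithmetic, assuming the "last 2 years" window is counted as 24 months; the function names are illustrative and are not part of the status page.

```python
def incidents_per_month(incident_count: int, window_months: int) -> float:
    """Average number of incidents per month over the reporting window."""
    if window_months <= 0:
        raise ValueError("window_months must be positive")
    return incident_count / window_months


def mean_downtime_minutes(total_downtime_minutes: int, incident_count: int) -> float:
    """Average downtime per incident, in minutes."""
    if incident_count <= 0:
        raise ValueError("incident_count must be positive")
    return total_downtime_minutes / incident_count


# Figures quoted in the summary: 21 outages, 36h 11m of downtime.
# The 24-month window is an assumption based on "the last 2 years".
total_minutes = 36 * 60 + 11
rate = incidents_per_month(21, 24)
average = mean_downtime_minutes(total_minutes, 21)
print(f"{rate:.1f} incidents per month, {average:.0f} minutes per incident on average")
```

Under the same assumption, the mean downtime per incident comes out at a little over 100 minutes, although most of the individual incidents below were much shorter.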
Galaxy outage
Timeline · 2 updates
- investigating Jun 05, 2026, 08:59 AM UTC
Galaxy cannot be accessed at the moment. This incident was created by an automated monitoring service.
- resolved Jun 05, 2026, 09:17 AM UTC
Galaxy is now operational! This update was created by an automated monitoring service.
Rocket cluster file system instability
Timeline · 2 updates
- investigating Jun 02, 2026, 12:58 PM UTC
We have identified an issue with the Rocket cluster file systems that causes transient errors on nodes. Fewer nodes may be available while we work on fixing the underlying issue.
- resolved Jun 03, 2026, 03:00 PM UTC
The underlying issues with the file system have been identified and resolved. All systems are operational again.
Issues with MyAccessID UT identity verification
Timeline · 3 updates
- investigating Jun 02, 2026, 07:51 AM UTC
The University of Tartu authentication system is currently experiencing technical issues. As a result, login using **MyAccessID UT identity verification** is temporarily unavailable. The University of Tartu Information Technology Office (ITO) has been informed and is working on resolving the issue. We apologize for the inconvenience and recommend trying again later.
- investigating Jun 02, 2026, 08:24 AM UTC
This issue is also impacting the MyAccessID authentication process for Kubernetes.
- resolved Jun 02, 2026, 09:50 AM UTC
This incident has been resolved.
docs.hpc.ut.ee outage
Timeline · 2 updates
- investigating May 15, 2026, 02:01 PM UTC
docs.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
- resolved May 15, 2026, 02:13 PM UTC
docs.hpc.ut.ee is now operational! This update was created by an automated monitoring service.
hpc.ut.ee outage
Timeline · 2 updates
- investigating May 15, 2026, 02:01 PM UTC
hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
- resolved May 15, 2026, 02:15 PM UTC
hpc.ut.ee is now operational! This update was created by an automated monitoring service.
support.hpc.ut.ee outage
Timeline · 2 updates
- investigating May 15, 2026, 01:53 AM UTC
support.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
- resolved May 15, 2026, 02:07 AM UTC
support.hpc.ut.ee is now operational! This update was created by an automated monitoring service.
Possible temporary service interruptions due to Critical Linux Kernel Vulnerability (CVE-2026-31431)
Timeline · 4 updates
- identified Apr 30, 2026, 06:45 AM UTC
Due to the recently disclosed “[Copy Fail](https://copy.fail/)” (CVE-2026-31431) Linux kernel vulnerability, our system administrators are actively applying mitigation measures and updates across all Tartu University HPC Center systems today. As a result, you may experience temporary interruptions or reduced service availability while this work is in progress. All UT HPC services are affected. At this time, the work is expected to be completed today; however, we will provide further updates if the situation extends beyond today. **Users running their own Linux virtual machines are advised to apply the recommended patches on their systems by following the instructions provided in the “Mitigation” section of the** [**CVE-2026-31431 advisory**](https://copy.fail/)**.** Thank you for your understanding.
- identified Apr 30, 2026, 08:41 AM UTC
- identified Apr 30, 2026, 09:43 AM UTC
We have created documentation for CVE-2026-31431 mitigation. It is primarily useful for UT Cloud virtual machine managers. We will keep updating the document with the best approaches as we learn more.
- resolved Apr 30, 2026, 01:48 PM UTC
We have applied mitigation measures for the Critical Linux Kernel Vulnerability (CVE-2026-31431) across all affected services. The incident is resolved. However, virtual machine managers are still required to apply the patches by following the published guides. You are welcome to contact support for additional information: [[email protected]](mailto:[email protected])
Best, UT HPC Center
Inaccessible services
Timeline · 1 update
- resolved Feb 26, 2026, 08:28 AM UTC
Type: Incident
Duration: 38 minutes
Affected Components: RStudio, Open OnDemand, kubernetes.hpc.ut.ee, puhuri-portal.neic.no, puhuri.metacenter.no, account.lumi.cscs.ch, my.lumi-supercomputer.eu, lumi.deic.dk, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, Waldur portals → Services
Feb 26, 08:28:43 GMT+0 - Investigating - We are currently investigating this incident.
Feb 26, 08:37:47 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.
Feb 26, 09:06:52 GMT+0 - Resolved - This incident has been resolved.
HPC Cluster Software Module Load Interrupted
Timeline · 1 update
- resolved Feb 05, 2026, 01:26 PM UTC
Type: Incident
Duration: 3 hours and 5 minutes
Affected Components: RStudio, Open OnDemand, rocket.hpc.ut.ee
Feb 5, 13:26:06 GMT+0 - Investigating - Some software modules on the HPC cluster may fail to load with an error. We are fixing the issue. Thank you for your patience.
Feb 5, 16:31:08 GMT+0 - Resolved - The cluster software stack has been restored from a prior point-in-time copy. All operations should be back to normal.
Network issues
Timeline · 1 update
- resolved Jan 26, 2026, 10:41 AM UTC
Type: Incident
Duration: 39 minutes
Affected Components: RStudio, kubernetes.hpc.ut.ee, puhuri.metacenter.no, minu.etais.ee, my.lumi-supercomputer.eu, Open OnDemand, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Jan 26, 10:41:32 GMT+0 - Identified - We have detected a network issue that has caused services to be unavailable.
Jan 26, 11:20:58 GMT+0 - Resolved - The issue has been resolved, and all services are now available again.
Kubernetes network connections
Timeline · 1 update
- resolved Jan 14, 2026, 02:15 PM UTC
Type: Incident
Duration: 1 hour and 31 minutes
Affected Components: kubernetes.hpc.ut.ee, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, registry.hpc.ut.ee, Waldur portals → UT HPC webservices
Jan 14, 14:15:00 GMT+0 - Investigating - We are currently seeing higher request error and request latency rates for applications in Kubernetes. We are working to find the cause.
Jan 14, 15:21:43 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.
Jan 14, 15:46:27 GMT+0 - Resolved - The issue was related to the number of connections to and from Kubernetes. Our firewalls were configured not to allow such a high connection rate, and as we add new nodes to the cluster, the base rate has exceeded the limits. The limits have been revisited, and new monitoring is being added to prevent the same issue in the future.
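The resolution note mentions new monitoring for the connection rate against the firewall limits. As a rough illustration only, such a check could compare the observed rate with the configured limit and warn before the limit is reached. The function name, the 80% warning threshold, and the example numbers below are assumptions and do not describe the actual UT HPC monitoring setup.

```python
from dataclasses import dataclass


@dataclass
class RateCheck:
    rate: float         # observed connections per second
    limit: float        # configured firewall limit, same unit
    utilisation: float  # rate / limit
    level: str          # "ok", "warn", or "critical"


def check_connection_rate(rate: float, limit: float, warn_at: float = 0.8) -> RateCheck:
    """Classify the observed connection rate relative to the firewall limit.

    warn_at is a hypothetical threshold: warn once 80% of the limit is used,
    so that the limit can be raised before new nodes push traffic past it.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    utilisation = rate / limit
    if utilisation >= 1.0:
        level = "critical"
    elif utilisation >= warn_at:
        level = "warn"
    else:
        level = "ok"
    return RateCheck(rate, limit, utilisation, level)


# Illustrative values only; the real rates and limits are not published.
print(check_connection_rate(850, 1000).level)
```

Warning on headroom rather than on the limit itself matters in a cluster that keeps growing, because the base rate rises with every node added.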
minu.etais.ee self-service environment issues with loading login page
Timeline · 1 update
- resolved Jan 06, 2026, 09:00 AM UTC
Type: Incident
Affected Components: minu.etais.ee, puhuri-portal.neic.no, lumi.deic.dk
Jan 6, 09:00:00 GMT+0 - Resolved - The [minu.etais.ee](http://minu.etais.ee) self-service environment encountered issues with loading the login page. Attempts to load the page resulted in an error 429: Too Many Requests. The problem occurred intermittently today between 8:00 and 11:00 am. The issue is resolved now. Please get in touch with us if you detect any anomalies (email: [email protected]).
Jan 6, 09:36:00 GMT+0 - Resolved - This incident has been resolved.
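For scripted clients that hit a 429 Too Many Requests response, the usual approach is to wait and retry, honouring the server's `Retry-After` hint when one is sent. The following is a minimal sketch of that pattern, not the portal's own behaviour. The `request` callable is hypothetical: it stands in for whatever HTTP call the client makes and returns the status code together with the `Retry-After` value in seconds, or `None` if absent.

```python
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try (attempt is 0-based).

    Uses the server's Retry-After value when present, otherwise
    exponential backoff (base * 2**attempt), both capped at `cap`.
    """
    if retry_after is not None:
        return min(float(retry_after), cap)
    return min(cap, base * (2 ** attempt))


def call_with_retry(request: Callable[[], Tuple[int, Optional[float]]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `request` until it returns a status other than 429.

    Returns the last status code seen, which is still 429 if every
    attempt was rate limited.
    """
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = request()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    return status
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays, for example by passing `waits.append` to record the delays instead of sleeping.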
External network issues
Timeline · 1 update
- resolved Dec 18, 2025, 09:08 AM UTC
Type: Incident
Duration: 3 hours and 57 minutes
Affected Components: Open OnDemand, kubernetes.hpc.ut.ee, RStudio, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Dec 18, 09:08:00 GMT+0 - Identified - We have detected an issue with the external network, which caused services to be unavailable.
Dec 18, 09:20:00 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result while trying to find the initial cause.
Dec 18, 13:04:30 GMT+0 - Resolved - We have identified the issue and are working on preventing it in the future. Due to a migration of networks between the central University network and ours, a configuration overlap caused the websites to be unavailable.
ondemand.hpc.ut.ee is currently unavailable; we are working on a fix
Timeline · 1 update
- resolved Dec 10, 2025, 10:01 AM UTC
Type: Incident
Duration: 33 minutes
Affected Components: Open OnDemand, RStudio
Dec 10, 10:01:39 GMT+0 - Investigating - [ondemand.hpc.ut.ee](http://ondemand.hpc.ut.ee) is currently unavailable. We are investigating this incident. The Rocket cluster is still available via shell access.
Dec 10, 10:05:10 GMT+0 - Investigating - The SFTP service is affected as well.
Dec 10, 10:34:48 GMT+0 - Resolved - The OnDemand services were affected by a crash of the NFS subsystem. The issue has been resolved, and all services are operational once again.
registry.hpc.ut.ee outage
Timeline · 1 update
- resolved Dec 02, 2025, 04:53 AM UTC
Type: Incident
Duration: 2 hours and 6 minutes
Affected Components: registry.hpc.ut.ee
Dec 2, 04:53:22 GMT+0 - Investigating - registry.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
Dec 2, 06:59:23 GMT+0 - Resolved - registry.hpc.ut.ee is now operational! This update was created by an automated monitoring service.
courses.cs.ut.ee
Timeline · 1 update
- resolved Nov 14, 2025, 06:19 AM UTC
Type: Incident
Duration: 24 minutes
Nov 14, 06:19:28 GMT+0 - Investigating - courses.cs.ut.ee is unavailable.
Nov 14, 06:43:17 GMT+0 - Resolved - This incident has been resolved. The issue was related to database access.
S3 service at object.hpc.ut.ee inaccessible
Timeline · 1 update
- resolved Nov 12, 2025, 09:20 AM UTC
Type: Incident
Duration: 15 minutes
Nov 12, 09:20:57 GMT+0 - Identified - The S3 service hosted at https://object.hpc.ut.ee is currently inaccessible. We have identified the issue and are working to resolve it.
Nov 12, 09:36:15 GMT+0 - Resolved - An internal certificate had expired. This issue has now been resolved. We are implementing safeguards to prevent it from happening again.
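The update does not say which safeguards are being implemented. One common safeguard against expired certificates is a scheduled check that warns well before the `notAfter` date. The sketch below shows that idea under stated assumptions: the function names, the 30-day warning window, and the example date are hypothetical, and the `notAfter` string is expected in the format that Python's `ssl` module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left until the certificate expires; negative if already expired.

    not_after uses the certificate time format "Mon DD HH:MM:SS YYYY GMT",
    which ssl.cert_time_to_seconds converts to seconds since the epoch.
    now must be a timezone-aware datetime.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400.0


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (an assumed window)."""
    return days_until_expiry(not_after, now) <= warn_days


# Illustrative date only; the expired certificate's details are not published.
example_not_after = "Jan 15 00:00:00 2030 GMT"
check_time = datetime(2030, 1, 5, tzinfo=timezone.utc)
if needs_renewal(example_not_after, check_time):
    print(f"renew soon: {days_until_expiry(example_not_after, check_time):.0f} days left")
```

In a real check, the `notAfter` value would come from the certificate itself, for instance from the dictionary returned by `SSLSocket.getpeercert()` on a verified connection.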