UTHPC Outage History

UTHPC had 25 outages in the last 2 years, totaling 167h 0m of downtime and averaging about one incident per month.

The earliest of these was recorded on August 22, 2025. Each outage is summarised below with incident details, duration, and resolution information.

Source: https://status.hpc.ut.ee
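
The headline figures can be cross-checked against the per-incident "Duration:" lines below. The following is a minimal sketch, assuming durations are written as number/unit pairs such as `3d 5h` or `23h 30m` and that entries with no recorded duration count as zero. Summing the listed values this way gives 166h 1m. The small gap to the 167h 0m headline is presumably due to rounding in the listed values: for example, the SAPU maintenance entry is listed as 3d 5h, but its timestamps span 3d 5h 52m.

```python
import re
from datetime import timedelta

# Matches the number/unit pairs in duration strings such as "3d 5h" or "23h 30m".
_PART = re.compile(r"(\d+)([dhm])")
_UNIT = {"d": "days", "h": "hours", "m": "minutes"}

# The per-incident "Duration:" values listed on this page, newest first
# (entries without a recorded duration are left out).
LISTED = ["14m", "16m", "23h 30m", "6h 12m", "3h 5m", "1m", "18h 24m",
          "3d 5h", "3m", "2h 6m", "9m", "1d 2h", "1h 58m", "7h 3m"]


def parse_duration(text: str) -> timedelta:
    """Parse '3d 5h', '6h 12m', '14m', ...; a string with no digits counts as zero."""
    total = timedelta()
    for amount, unit in _PART.findall(text):
        total += timedelta(**{_UNIT[unit]: int(amount)})
    return total


def total_downtime(durations) -> timedelta:
    """Sum a list of duration strings."""
    return sum((parse_duration(d) for d in durations), timedelta())


def incidents_per_month(count: int, months: float) -> float:
    """Average incident rate over the reporting window."""
    return count / months


def format_hours(delta: timedelta) -> str:
    """Render a timedelta as 'Hh Mm', the style of the headline total."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60}m"


if __name__ == "__main__":
    print("listed downtime:", format_hours(total_downtime(LISTED)))  # 166h 1m
    print("incidents per month:", round(incidents_per_month(25, 24), 2))  # 1.04
```

Over a 24-month window, 25 incidents works out to roughly 1.04 per month, which matches the "about one incident per month" figure.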

Major July 29, 2026

hpc.ut.ee outage

Detected by Pingoru: Jul 29, 2026, 07:56 AM UTC
Resolved: Jul 29, 2026, 08:10 AM UTC
Duration: 14m

Affected: UT HPC webservices (hpc.ut.ee)

Timeline · 2 updates

investigating Jul 29, 2026, 07:56 AM UTC

hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
resolved Jul 29, 2026, 08:10 AM UTC

hpc.ut.ee is now operational! This update was created by an automated monitoring service.
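
This incident and several others were opened and closed by an automated monitoring service. The sketch below shows one plausible way such a monitor could turn periodic availability probes into incidents. It is hypothetical and is not the monitor UTHPC or Pingoru actually runs. The probe series is synthetic, constructed only so that it reproduces the 07:56 to 08:10 UTC window above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Incident:
    """One automatically created incident (hypothetical model, not a real monitoring API)."""
    component: str
    started: datetime
    resolved: Optional[datetime] = None

    @property
    def duration(self) -> Optional[timedelta]:
        # Still-open incidents have no duration yet.
        return None if self.resolved is None else self.resolved - self.started


def incidents_from_probes(component: str,
                          probes: List[Tuple[datetime, bool]]) -> List[Incident]:
    """Turn time-ordered (timestamp, is_up) probe results into incidents.

    A failed probe while no incident is open creates one ("investigating");
    the next successful probe resolves it ("resolved").
    """
    incidents: List[Incident] = []
    current: Optional[Incident] = None
    for ts, is_up in probes:
        if not is_up and current is None:
            current = Incident(component, started=ts)
            incidents.append(current)
        elif is_up and current is not None:
            current.resolved = ts
            current = None
    return incidents


if __name__ == "__main__":
    # Synthetic one-minute probes: down from 07:56 through 08:09 UTC, up otherwise.
    start = datetime(2026, 7, 29, 7, 54)
    probes = [(start + timedelta(minutes=i), not (2 <= i <= 15)) for i in range(18)]
    for inc in incidents_from_probes("hpc.ut.ee", probes):
        # hpc.ut.ee 07:56:00 08:10:00 0:14:00
        print(inc.component, inc.started.time(), inc.resolved.time(), inc.duration)
```

Real monitors often require several consecutive failed probes before opening an incident, so that a single dropped request does not create a report. The sketch opens one on the first failure for simplicity.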

Major July 29, 2026

docs.hpc.ut.ee outage

Detected by Pingoru: Jul 29, 2026, 07:55 AM UTC
Resolved: Jul 29, 2026, 08:11 AM UTC
Duration: 16m

Affected: UT HPC webservices (docs.hpc.ut.ee)

Timeline · 2 updates

investigating Jul 29, 2026, 07:55 AM UTC

docs.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
resolved Jul 29, 2026, 08:11 AM UTC

docs.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Major July 23, 2026

Newly Disclosed Linux kernel Security Incident – Temporary Service Interruptions

Detected by Pingoru: Jul 23, 2026, 07:24 AM UTC
Resolved: Jul 24, 2026, 06:54 AM UTC
Duration: 23h 30m

Affected: rocket.hpc.ut.ee, UT HPC webservices (hpc.ut.ee), UT HPC webservices (docs.hpc.ut.ee), Services (Galaxy), UT HPC webservices (support.hpc.ut.ee), Services (RStudio), Services (Open OnDemand), UT HPC webservices (registry.hpc.ut.ee)

Timeline · 4 updates

investigating Jul 23, 2026, 07:24 AM UTC

We are currently responding to a newly disclosed Linux kernel security vulnerability affecting the XFS filesystem. As a precaution, we are implementing mitigation measures across the affected infrastructure.

**Affected services:**
* Web servers
* Rocket Cluster, along with Open OnDemand
* SAPU
* Cloud
* Galaxy

To reduce risk while mitigation work is underway, **cluster access has been temporarily disabled**. You may experience temporary interruptions across the affected services during this maintenance. Our teams are actively applying mitigations and validating service integrity. We will provide further updates as work progresses and will restore normal access as soon as it is safe to do so. Thank you for your patience and understanding.
identified Jul 23, 2026, 07:24 AM UTC

We are continuing to work on a fix for this incident.
identified Jul 23, 2026, 11:13 AM UTC

We have made progress in addressing the recently disclosed Linux kernel security vulnerability. The following services have now been restored and are available:
* Cloud Service
* Web Server (Website Hosting Service)

Mitigation work is still ongoing for the following services:
* Rocket Cluster and related services such as Galaxy and Open OnDemand
* SAPU

These services will remain unavailable while we complete the required patching and validation to ensure they can be restored safely. We will continue to provide updates as additional services become available.
resolved Jul 24, 2026, 06:54 AM UTC

All services previously affected by the Linux kernel security vulnerability impacting the XFS filesystem were restored by midnight on **23 July**. The current service status is as follows:
* All **SAPU** systems have been restored and are fully available.
* Web servers are up.
* Galaxy is available.
* Open OnDemand and its services, RStudio and Jupyter, are available.
* The **Rocket Cluster** is available for compute workloads.
* A small number of older **Rocket Cluster V100 GPU nodes** remain temporarily isolated while they undergo the final upgrade. These nodes are expected to return to the queue later today.

Thank you for your patience while we implemented the necessary mitigations to ensure the security and stability of our services.

Major July 14, 2026

UT HPC Web Server Under Emergency Maintenance

Detected by Pingoru: Jul 14, 2026, 08:30 AM UTC
Resolved: Jul 14, 2026, 02:43 PM UTC
Duration: 6h 12m

Timeline · 2 updates

investigating Jul 14, 2026, 08:30 AM UTC

Due to emergency maintenance, the UT HPC web server is currently unavailable. As a result, **all** UT HPC-hosted websites are temporarily inaccessible. We apologize for the inconvenience and are working to restore service as quickly as possible.
resolved Jul 14, 2026, 02:43 PM UTC

The UT HPC Center's web server is operational again, and all web pages hosted by the UT HPC Center are available.

Major July 11, 2026

University of Tartu cloud service emergency maintenance - mitigating a Linux kernel CVE

Detected by Pingoru: Jul 11, 2026, 06:00 AM UTC
Resolved: Jul 11, 2026, 09:05 AM UTC
Duration: 3h 5m

Timeline · 2 updates

investigating Jul 11, 2026, 06:00 AM UTC

Due to a recently disclosed critical security vulnerability affecting Linux kernel-based virtualization platforms, we are implementing preventive security measures. As part of these maintenance activities, UT Cloud virtual machines (VMs) will be restarted. The VMs will be brought back online in a rolling manner over the next couple of hours. Besides virtual machines, some websites are also affected. In case of questions, please contact us at [email protected]. We apologize for any inconvenience and appreciate your understanding as we work to ensure the security and stability of our platform.
resolved Jul 11, 2026, 09:05 AM UTC

The maintenance is finished. The UT Cloud service is available, and affected virtual machines and websites are restored.

Major July 10, 2026

Galaxy outage

Detected by Pingoru: Jul 10, 2026, 12:17 PM UTC
Resolved: Jul 10, 2026, 12:19 PM UTC
Duration: 1m

Affected: Services (Galaxy)

Timeline · 2 updates

investigating Jul 10, 2026, 12:17 PM UTC

Galaxy cannot be accessed at the moment. This incident was created by an automated monitoring service.
resolved Jul 10, 2026, 12:19 PM UTC

Galaxy is now operational! This update was created by an automated monitoring service.

Major July 9, 2026

HPC cluster Rocket is temporarily unavailable due to ongoing security fixes (CVE-2026-46242 and CVE-2026-43499)

Detected by Pingoru: Jul 09, 2026, 06:37 PM UTC
Resolved: Jul 10, 2026, 01:02 PM UTC
Duration: 18h 24m

Affected: rocket.hpc.ut.ee, Services (Galaxy), Services (RStudio), Services (Open OnDemand)

Timeline · 5 updates

investigating Jul 09, 2026, 06:37 PM UTC

**Due to recently identified Linux kernel vulnerabilities ([CVE-2026-46242](https://access.redhat.com/security/cve/cve-2026-46242) and [CVE-2026-43499](https://access.redhat.com/security/cve/cve-2026-43499)), we have temporarily disabled access to the UT HPC Rocket cluster as a precautionary measure to mitigate potential security risks.** Access to the login nodes is currently unavailable, and new job submissions are temporarily disabled. Jobs that were already running before the access restrictions were applied continue to run and are not affected. Our system administrators are working to restore full service as quickly as possible. Thank you for your patience!
monitoring Jul 10, 2026, 06:14 AM UTC

We have applied the security patches to mitigate the vulnerability. The login nodes are now accessible; however, job submission remains disabled while we complete additional validation and testing. We require more time to ensure the cluster is fully stable before restoring normal operation. We appreciate your patience.
identified Jul 10, 2026, 07:04 AM UTC

Due to the same security vulnerability, all SAPU machines are currently unavailable while security patches are being applied. We expect service to be restored later tonight. Thank you for your patience.
identified Jul 10, 2026, 08:42 AM UTC

We are closing access to the login nodes again. The Rocket cluster remains unavailable, and we expect it to stay unavailable for most of the day. We apologize for the inconvenience and will provide updates as they become available.
resolved Jul 10, 2026, 01:02 PM UTC

**HPC services are restored.** Access to the UT Rocket cluster and Galaxy, Open OnDemand, and their services RStudio and Jupyter have been fully restored. The previously identified Linux kernel vulnerabilities have been addressed, and users can once again access the Rocket login nodes as usual and submit new jobs to the compute nodes.

Major July 9, 2026

Maintenance on all SAPU machines due to Linux kernel security vulnerabilities - service inaccessible

Detected by Pingoru: Jul 09, 2026, 12:00 PM UTC
Resolved: Jul 12, 2026, 05:52 PM UTC
Duration: 3d 5h

Timeline · 2 updates

identified Jul 09, 2026, 12:00 PM UTC

Due to the Linux kernel security vulnerabilities that also affected the HPC Rocket cluster (incident on 9 July 2026), all SAPU virtual machines are temporarily unavailable while security updates are being applied. We will provide updates as progress is made. Thank you for understanding.
resolved Jul 12, 2026, 05:52 PM UTC

All SAPU workstations are now patched and available. We apologize for the inconvenience and appreciate your understanding. If you encounter any issues with your workstation, please contact [email protected].

Major June 27, 2026

support.hpc.ut.ee outage

Detected by Pingoru: Jun 27, 2026, 06:43 PM UTC
Resolved: Jun 27, 2026, 06:47 PM UTC
Duration: 3m

Affected: UT HPC webservices (support.hpc.ut.ee)

Timeline · 2 updates

investigating Jun 27, 2026, 06:43 PM UTC

support.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
resolved Jun 27, 2026, 06:47 PM UTC

support.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Minor June 25, 2026

Helpdesk Closed from 22 June – Back on 25 June

Detected by Pingoru: Jun 25, 2026, 05:28 AM UTC
Resolved: Jun 25, 2026, 07:34 AM UTC
Duration: 2h 6m

Minor June 8, 2026

Security Notice: Vulnerability Mitigation

Detected by Pingoru: Jun 08, 2026, 12:19 PM UTC
Resolved: Jun 08, 2026, 12:29 PM UTC
Duration: 9m

Minor June 2, 2026

Rocket cluster file system instability

Detected by Pingoru: Jun 02, 2026, 12:58 PM UTC
Resolved: Jun 03, 2026, 03:00 PM UTC
Duration: 1d 2h

Affected: rocket.hpc.ut.ee, Services (Galaxy), Services (Open OnDemand)

Timeline · 2 updates

investigating Jun 02, 2026, 12:58 PM UTC

We have identified an issue with the Rocket cluster file systems that causes transient errors on nodes. Fewer nodes may be available while we work on fixing the underlying issue.
resolved Jun 03, 2026, 03:00 PM UTC

The underlying issues with the file system have been identified and resolved. All systems are operational again.

Major June 2, 2026

Issues with MyAccessID UT identity verification

Detected by Pingoru: Jun 02, 2026, 07:51 AM UTC
Resolved: Jun 02, 2026, 09:50 AM UTC
Duration: 1h 58m

Affected: Waldur portals (puhuri-portal.neic.no), kubernetes.hpc.ut.ee, Waldur portals (puhuri.metacenter.no), minu.etais.ee, Waldur portals (my.lumi-supercomputer.eu), Waldur portals (lumi.deic.dk), Services (Open OnDemand), Waldur portals (account.lumi.cscs.ch)

Timeline · 3 updates

investigating Jun 02, 2026, 07:51 AM UTC

The University of Tartu authentication system is currently experiencing technical issues. As a result, login using **MyAccessID UT identity verification** is temporarily unavailable. The University of Tartu Information Technology Office (ITO) has been informed and is working on resolving the issue. We apologize for the inconvenience and recommend trying again later.
investigating Jun 02, 2026, 08:24 AM UTC

This issue is also impacting the MyAccessID authentication process for Kubernetes.
resolved Jun 02, 2026, 09:50 AM UTC

This incident has been resolved.

Major April 30, 2026

Possible temporary service interruptions due to Critical Linux Kernel Vulnerability (CVE-2026-31431)

Detected by Pingoru: Apr 30, 2026, 06:45 AM UTC
Resolved: Apr 30, 2026, 01:48 PM UTC
Duration: 7h 3m

Affected: rocket.hpc.ut.ee, UT HPC webservices (hpc.ut.ee), Waldur portals (puhuri-portal.neic.no), UT HPC webservices (docs.hpc.ut.ee), kubernetes.hpc.ut.ee, Waldur portals (puhuri.metacenter.no), Services (Galaxy), minu.etais.ee, UT HPC webservices (support.hpc.ut.ee), Services (RStudio), Waldur portals (my.lumi-supercomputer.eu), Waldur portals (lumi.deic.dk), Services (Open OnDemand), UT HPC webservices (registry.hpc.ut.ee), Waldur portals (account.lumi.cscs.ch)

Timeline · 4 updates

identified Apr 30, 2026, 06:45 AM UTC

Due to the recently disclosed “[Copy Fail](https://copy.fail/)” (CVE-2026-31431) Linux kernel vulnerability, our system administrators are actively applying mitigation measures and updates across all Tartu University HPC Center systems today. As a result, you may experience temporary interruptions or reduced service availability while this work is in progress. All UT HPC services are affected. The work is expected to be completed today; we will provide further updates if it extends beyond today. We are continuing to work on a fix for this incident. **Users running their own Linux virtual machines are advised to apply the recommended patches on their systems by following the instructions in the “Mitigation” section of the [CVE-2026-31431 advisory](https://copy.fail/).** Thank you for your understanding.
identified Apr 30, 2026, 08:41 AM UTC

identified Apr 30, 2026, 09:43 AM UTC

We have created documentation on CVE-2026-31431 mitigation. It is primarily useful for UT Cloud virtual machine managers, and we will keep updating it with the best approaches as we learn more.
resolved Apr 30, 2026, 01:48 PM UTC

We have applied mitigation measures for the Critical Linux Kernel Vulnerability (CVE-2026-31431) across all affected services. The incident is resolved. However, virtual machine managers must still apply the patches by following the published guides. You are welcome to contact support for additional information: [email protected] Best, UT HPC Center

Minor February 26, 2026

Inaccessible services

Detected by Pingoru: Feb 26, 2026, 08:28 AM UTC
Resolved: Feb 26, 2026, 08:28 AM UTC
Duration: —

Timeline · 1 update

resolved Feb 26, 2026, 08:28 AM UTC

Type: Incident
Duration: 38 minutes
Affected components: RStudio, Open OnDemand, kubernetes.hpc.ut.ee, puhuri-portal.neic.no, puhuri.metacenter.no, account.lumi.cscs.ch, my.lumi-supercomputer.eu, lumi.deic.dk, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee
Feb 26, 08:28:43 GMT+0 - Investigating - We are currently investigating this incident.
Feb 26, 08:37:47 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.
Feb 26, 09:06:52 GMT+0 - Resolved - This incident has been resolved.

Minor February 5, 2026

HPC Cluster Software Module Load Interrupted

Detected by Pingoru: Feb 05, 2026, 01:26 PM UTC
Resolved: Feb 05, 2026, 01:26 PM UTC
Duration: —

Timeline · 1 update

resolved Feb 05, 2026, 01:26 PM UTC

Type: Incident
Duration: 3 hours and 5 minutes
Affected components: RStudio, Open OnDemand, rocket.hpc.ut.ee
Feb 5, 13:26:06 GMT+0 - Investigating - On the HPC cluster, loading some software modules may produce an error. We are fixing the issue. Thank you for your patience.
Feb 5, 16:31:08 GMT+0 - Resolved - The cluster software stack has been restored from a prior point-in-time copy. All operations should be normal.

Minor January 26, 2026

Network issues

Detected by Pingoru: Jan 26, 2026, 10:41 AM UTC
Resolved: Jan 26, 2026, 10:41 AM UTC
Duration: —

Timeline · 1 update

resolved Jan 26, 2026, 10:41 AM UTC

Type: Incident
Duration: 39 minutes
Affected components: RStudio, kubernetes.hpc.ut.ee, puhuri.metacenter.no, minu.etais.ee, my.lumi-supercomputer.eu, Open OnDemand, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Jan 26, 10:41:32 GMT+0 - Identified - We have detected a network issue that has caused services to be unavailable.
Jan 26, 11:20:58 GMT+0 - Resolved - The issue has been resolved, and all services are now available again.

Minor January 14, 2026

Kubernetes network connections

Detected by Pingoru: Jan 14, 2026, 02:15 PM UTC
Resolved: Jan 14, 2026, 02:15 PM UTC
Duration: —

Timeline · 1 update

resolved Jan 14, 2026, 02:15 PM UTC

Type: Incident
Duration: 1 hour and 31 minutes
Affected components: kubernetes.hpc.ut.ee, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, registry.hpc.ut.ee
Jan 14, 14:15:00 GMT+0 - Investigating - We are currently seeing elevated request error rates and request latency for applications in Kubernetes, and are working to find the cause.
Jan 14, 15:21:43 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.
Jan 14, 15:46:27 GMT+0 - Resolved - The issue was caused by the number of connections to and from Kubernetes. Our firewalls were configured with a connection-rate limit, and as we add new nodes to the cluster, the baseline connection rate now exceeds that limit. The limits have been revisited, and new monitoring is being added to prevent the same issue in the future.
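
The January 14 resolution describes a capacity effect. The connection rate grows with the number of Kubernetes nodes, so a fixed firewall limit that was once sufficient is eventually exceeded. A minimal sketch of that arithmetic and of a headroom alert follows. It assumes each node contributes a roughly constant connection rate. The per-node rate, firewall limit, and 80% alert threshold are hypothetical values, not UTHPC's configuration.

```python
"""Connection-rate headroom check (hypothetical numbers, not UTHPC's real limits)."""


def baseline_rate(nodes: int, per_node_rate: float) -> float:
    """Steady-state connections per second, assuming each node adds a constant rate."""
    return nodes * per_node_rate


def headroom(nodes: int, per_node_rate: float, limit: float) -> float:
    """Unused fraction of the firewall connection-rate limit (negative means over the limit)."""
    return 1.0 - baseline_rate(nodes, per_node_rate) / limit


def should_alert(nodes: int, per_node_rate: float, limit: float,
                 alert_fraction: float = 0.8) -> bool:
    """Alert once the baseline reaches alert_fraction of the limit, before requests start failing."""
    return baseline_rate(nodes, per_node_rate) >= alert_fraction * limit


if __name__ == "__main__":
    PER_NODE = 50.0  # hypothetical connections/s contributed by each node
    LIMIT = 1000.0   # hypothetical firewall limit in connections/s
    for nodes in (12, 16, 20, 24):
        print(nodes, round(headroom(nodes, PER_NODE, LIMIT), 2),
              should_alert(nodes, PER_NODE, LIMIT))
```

With these illustrative numbers, the headroom reaches zero at 20 nodes, and every node added after that pushes the baseline over the limit. Alerting at a fraction of the limit flags the trend while nodes are still being added, rather than after errors appear.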

Minor January 6, 2026

minu.etais.ee self-service environment issues with loading login page

Detected by Pingoru: Jan 06, 2026, 09:00 AM UTC
Resolved: Jan 06, 2026, 09:00 AM UTC
Duration: —

Timeline · 1 update

resolved Jan 06, 2026, 09:00 AM UTC

Type: Incident
Affected components: minu.etais.ee, puhuri-portal.neic.no, lumi.deic.dk
Jan 6, 09:00:00 GMT+0 - Resolved - The [minu.etais.ee](http://minu.etais.ee) self-service environment had issues loading the login page: page load attempts returned error 429 (Too Many Requests). The problem occurred intermittently today between 8:00 and 11:00 am. The issue is now resolved. Please get in touch with us if you detect any anomalies (email: [email protected]).
Jan 6, 09:36:00 GMT+0 - Resolved - This incident has been resolved.

Minor December 18, 2025

External network issues

Detected by Pingoru: Dec 18, 2025, 09:08 AM UTC
Resolved: Dec 18, 2025, 09:08 AM UTC
Duration: —

Timeline · 1 update

resolved Dec 18, 2025, 09:08 AM UTC

Type: Incident
Duration: 3 hours and 57 minutes
Affected components: Open OnDemand, kubernetes.hpc.ut.ee, RStudio, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Dec 18, 09:08:00 GMT+0 - Identified - We have detected an issue with the external network, which caused services to be unavailable.
Dec 18, 09:20:00 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result while trying to find the initial cause.
Dec 18, 13:04:30 GMT+0 - Resolved - We have identified the issue and are working to prevent it from recurring. During a migration of networks between the central University network and ours, a configuration overlap caused the websites to be unavailable.

Major December 10, 2025

ondemand.hpc.ut.ee is currently unavailable; we are working on a fix

Detected by Pingoru: Dec 10, 2025, 10:01 AM UTC
Resolved: Dec 10, 2025, 10:01 AM UTC
Duration: —

Timeline · 1 update

resolved Dec 10, 2025, 10:01 AM UTC

Type: Incident
Duration: 33 minutes
Affected components: Open OnDemand, RStudio
Dec 10, 10:01:39 GMT+0 - Investigating - [ondemand.hpc.ut.ee](http://ondemand.hpc.ut.ee) is currently unavailable. We are investigating this incident. The Rocket cluster is still available via shell access.
Dec 10, 10:05:10 GMT+0 - Investigating - The SFTP service is affected as well.
Dec 10, 10:34:48 GMT+0 - Resolved - The OnDemand services were affected by a crash of the NFS subsystem. The issue has been resolved, and all services are operational again.

Major December 2, 2025

registry.hpc.ut.ee outage

Detected by Pingoru: Dec 02, 2025, 04:53 AM UTC
Resolved: Dec 02, 2025, 04:53 AM UTC
Duration: —

Timeline · 1 update

resolved Dec 02, 2025, 04:53 AM UTC

Type: Incident
Duration: 2 hours and 6 minutes
Affected components: registry.hpc.ut.ee
Dec 2, 04:53:22 GMT+0 - Investigating - registry.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.
Dec 2, 06:59:23 GMT+0 - Resolved - registry.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Minor November 14, 2025

courses.cs.ut.ee

Detected by Pingoru: Nov 14, 2025, 06:19 AM UTC
Resolved: Nov 14, 2025, 06:19 AM UTC
Duration: —

Timeline · 1 update

resolved Nov 14, 2025, 06:19 AM UTC

Type: Incident
Duration: 24 minutes
Nov 14, 06:19:28 GMT+0 - Investigating - courses.cs.ut.ee is unavailable.
Nov 14, 06:43:17 GMT+0 - Resolved - This incident has been resolved. The issue was related to database access.

Minor November 12, 2025

S3 service at object.hpc.ut.ee inaccessible

Detected by Pingoru: Nov 12, 2025, 09:20 AM UTC
Resolved: Nov 12, 2025, 09:20 AM UTC
Duration: —

Timeline · 1 update

resolved Nov 12, 2025, 09:20 AM UTC

Type: Incident
Duration: 15 minutes
Nov 12, 09:20:57 GMT+0 - Identified - The S3 service hosted at https://object.hpc.ut.ee is currently inaccessible. We have identified the issue and are working to resolve it.
Nov 12, 09:36:15 GMT+0 - Resolved - An internal certificate had expired. This issue has now been resolved. We are implementing safeguards to prevent it from happening again.
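
This outage was caused by an expired internal certificate. One common safeguard is to check how many days remain before a certificate's notAfter date and to alert well in advance. The sketch below uses Python's standard `ssl` module. The 14-day warning window and the example dates are illustrative assumptions. The certificate that actually expired was described as internal, so it may not be the one presented publicly at object.hpc.ut.ee.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time, e.g. 'Nov 12 09:20:57 2025 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def needs_renewal(not_after: str, warn_days: float = 14,
                  now: Optional[float] = None) -> bool:
    """True when fewer than warn_days remain (or the certificate has already expired)."""
    return days_until_expiry(not_after, now) < warn_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate a TLS server presents.

    The default context verifies the chain, so an already expired certificate
    makes the handshake raise ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    # Illustrative dates only; the real certificate's validity period is not known.
    expiry = "Nov 12 09:20:57 2025 GMT"
    two_weeks_before = ssl.cert_time_to_seconds("Oct 29 09:20:57 2025 GMT")
    print(days_until_expiry(expiry, now=two_weeks_before))  # 14.0
    print(needs_renewal(expiry, warn_days=21, now=two_weeks_before))  # True
```

Against a live endpoint, a scheduled job could call `needs_renewal(fetch_not_after("object.hpc.ut.ee"))`. Certificates used only on internal connections would need the same check run against their own files or endpoints.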

Minor August 22, 2025

Galaxy update 24.0 -> 25.0

Detected by Pingoru: Aug 22, 2025, 06:00 AM UTC
Resolved: Aug 22, 2025, 06:00 AM UTC
Duration: —

Timeline · 1 update
