UTHPC Outage History

UTHPC is up right now

UTHPC had 21 outages in the last 2 years totaling 36h 11m of downtime, averaging 0.9 incidents per month.

The outages recorded since June 28, 2025 are summarised below, with incident details, duration, and resolution information for each.
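
Durations in this history use a compact form such as 18m, 1h 58m, or 1d 2h. The following is a minimal sketch for converting such values to minutes and for reproducing the per-month average. The function names are hypothetical, and the 24-month window is an assumption taken from the "last 2 years" wording above.

```python
import re

# Minutes represented by each unit letter used in the duration fields.
_UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def duration_to_minutes(text: str) -> int:
    """Convert a compact duration such as '1d 2h' or '7h 3m' to whole minutes."""
    parts = re.findall(r"(\d+)\s*([dhm])", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_MINUTES[unit] for value, unit in parts)


def incidents_per_month(count: int, months: int) -> float:
    """Average number of incidents per month, rounded to one decimal place."""
    return round(count / months, 1)


if __name__ == "__main__":
    print(duration_to_minutes("1h 58m"))
    print(incidents_per_month(21, 24))  # assumes a 24-month window
```

Summing `duration_to_minutes` over the listed incidents gives a way to cross-check the headline downtime figure against the individual entries.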

Source: https://status.hpc.ut.ee

Major June 5, 2026

Galaxy outage

Detected by Pingoru
Jun 05, 2026, 08:59 AM UTC
Resolved
Jun 05, 2026, 09:17 AM UTC
Duration
18m
Affected: Services (Galaxy)
Timeline · 2 updates
  1. investigating Jun 05, 2026, 08:59 AM UTC

    Galaxy cannot be accessed at the moment. This incident was created by an automated monitoring service.

  2. resolved Jun 05, 2026, 09:17 AM UTC

    Galaxy is now operational! This update was created by an automated monitoring service.

Minor June 2, 2026

Rocket cluster file system instability

Detected by Pingoru
Jun 02, 2026, 12:58 PM UTC
Resolved
Jun 03, 2026, 03:00 PM UTC
Duration
1d 2h
Affected: rocket.hpc.ut.ee, Services (Galaxy), Services (Open OnDemand)
Timeline · 2 updates
  1. investigating Jun 02, 2026, 12:58 PM UTC

    We have identified an issue with the rocket cluster file systems that causes transient errors on nodes. Fewer nodes may be available while we work on fixing the underlying issue.

  2. resolved Jun 03, 2026, 03:00 PM UTC

    The underlying issues with the file system have been identified and resolved. All systems are operational again.

Major June 2, 2026

Issues with MyAccessID UT identity verification

Detected by Pingoru
Jun 02, 2026, 07:51 AM UTC
Resolved
Jun 02, 2026, 09:50 AM UTC
Duration
1h 58m
Affected: Waldur portals (puhuri-portal.neic.no), kubernetes.hpc.ut.ee, Waldur portals (puhuri.metacenter.no), minu.etais.ee, Waldur portals (my.lumi-supercomputer.eu), Waldur portals (lumi.deic.dk), Services (Open OnDemand), Waldur portals (account.lumi.cscs.ch)
Timeline · 3 updates
  1. investigating Jun 02, 2026, 07:51 AM UTC

    The University of Tartu authentication system is currently experiencing technical issues. As a result, login using **MyAccessID UT identity verification** is temporarily unavailable. The University of Tartu Information Technology Office (ITO) has been informed and is working on resolving the issue. We apologize for the inconvenience and recommend trying again later.

  2. investigating Jun 02, 2026, 08:24 AM UTC

    This issue is also impacting the MyAccessID authentication process for Kubernetes.

  3. resolved Jun 02, 2026, 09:50 AM UTC

    This incident has been resolved.

Major May 15, 2026

docs.hpc.ut.ee outage

Detected by Pingoru
May 15, 2026, 02:01 PM UTC
Resolved
May 15, 2026, 02:13 PM UTC
Duration
12m
Affected: UT HPC webservices (docs.hpc.ut.ee)
Timeline · 2 updates
  1. investigating May 15, 2026, 02:01 PM UTC

    docs.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.

  2. resolved May 15, 2026, 02:13 PM UTC

    docs.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Major May 15, 2026

hpc.ut.ee outage

Detected by Pingoru
May 15, 2026, 02:01 PM UTC
Resolved
May 15, 2026, 02:15 PM UTC
Duration
13m
Affected: UT HPC webservices (hpc.ut.ee)
Timeline · 2 updates
  1. investigating May 15, 2026, 02:01 PM UTC

    hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.

  2. resolved May 15, 2026, 02:15 PM UTC

    hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Major May 15, 2026

support.hpc.ut.ee outage

Detected by Pingoru
May 15, 2026, 01:53 AM UTC
Resolved
May 15, 2026, 02:07 AM UTC
Duration
13m
Affected: UT HPC webservices (support.hpc.ut.ee)
Timeline · 2 updates
  1. investigating May 15, 2026, 01:53 AM UTC

    support.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.

  2. resolved May 15, 2026, 02:07 AM UTC

    support.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Major April 30, 2026

Possible temporary service interruptions due to Critical Linux Kernel Vulnerability (CVE-2026-31431)

Detected by Pingoru
Apr 30, 2026, 06:45 AM UTC
Resolved
Apr 30, 2026, 01:48 PM UTC
Duration
7h 3m
Affected: rocket.hpc.ut.ee, UT HPC webservices (hpc.ut.ee), Waldur portals (puhuri-portal.neic.no), UT HPC webservices (docs.hpc.ut.ee), kubernetes.hpc.ut.ee, Waldur portals (puhuri.metacenter.no), Services (Galaxy), minu.etais.ee, UT HPC webservices (support.hpc.ut.ee), Services (RStudio), Waldur portals (my.lumi-supercomputer.eu), Waldur portals (lumi.deic.dk), Services (Open OnDemand), UT HPC webservices (registry.hpc.ut.ee), Waldur portals (account.lumi.cscs.ch)
Timeline · 4 updates
  1. identified Apr 30, 2026, 06:45 AM UTC

    Due to the recently disclosed “[Copy Fail](https://copy.fail/)” (CVE-2026-31431) Linux kernel vulnerability, our system administrators are applying mitigation measures and updates across all Tartu University HPC Center systems today. As a result, you may experience temporary interruptions or reduced service availability while this work is in progress. All UT HPC services are affected. The work is expected to be completed today; we will provide further updates if it extends beyond today. **Users running their own Linux virtual machines are advised to apply the recommended patches on their systems by following the instructions in the “Mitigation” section of the [CVE-2026-31431 advisory](https://copy.fail/).** Thank you for your understanding.

  2. identified Apr 30, 2026, 08:41 AM UTC

  3. identified Apr 30, 2026, 09:43 AM UTC

    We have created documentation for CVE-2026-31431 mitigation. It is primarily useful for UT Cloud virtual machine managers. We'll keep updating the document with the best approaches as we learn more.

  4. resolved Apr 30, 2026, 01:48 PM UTC

    We have applied mitigation measures for the critical Linux kernel vulnerability (CVE-2026-31431) across all affected services. The incident is resolved. However, virtual machine managers are still required to apply the patches by following the published guides. For additional information, contact support: [email protected]. Best, UT HPC Center

Minor February 26, 2026

Inaccessible services

Detected by Pingoru
Feb 26, 2026, 08:28 AM UTC
Resolved
Feb 26, 2026, 09:06 AM UTC
Duration
38m
Affected: RStudio, Open OnDemand, kubernetes.hpc.ut.ee, puhuri-portal.neic.no, puhuri.metacenter.no, account.lumi.cscs.ch, my.lumi-supercomputer.eu, lumi.deic.dk, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee
Timeline · 3 updates
  1. investigating Feb 26, 2026, 08:28 AM UTC

    We are currently investigating this incident.

  2. monitoring Feb 26, 2026, 08:37 AM UTC

    We implemented a fix and are currently monitoring the result.

  3. resolved Feb 26, 2026, 09:06 AM UTC

    This incident has been resolved.

Minor February 5, 2026

HPC Cluster Software Module Load Interrupted

Detected by Pingoru
Feb 05, 2026, 01:26 PM UTC
Resolved
Feb 05, 2026, 04:31 PM UTC
Duration
3h 5m
Affected: RStudio, Open OnDemand, rocket.hpc.ut.ee
Timeline · 2 updates
  1. investigating Feb 05, 2026, 01:26 PM UTC

    Loading some software modules on the HPC cluster may give an error. We are fixing the issue. Thank you for your patience.

  2. resolved Feb 05, 2026, 04:31 PM UTC

    The cluster software stack has been restored from a prior point-in-time copy. All operations should be normal.

Minor January 26, 2026

Network issues

Detected by Pingoru
Jan 26, 2026, 10:41 AM UTC
Resolved
Jan 26, 2026, 11:20 AM UTC
Duration
39m
Affected: RStudio, kubernetes.hpc.ut.ee, puhuri.metacenter.no, minu.etais.ee, my.lumi-supercomputer.eu, Open OnDemand, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Timeline · 2 updates
  1. identified Jan 26, 2026, 10:41 AM UTC

    We have detected a network issue that has caused services to be unavailable.

  2. resolved Jan 26, 2026, 11:20 AM UTC

    The issue has been resolved, and all services are now available again.

Minor January 14, 2026

Kubernetes network connections

Detected by Pingoru
Jan 14, 2026, 02:15 PM UTC
Resolved
Jan 14, 2026, 03:46 PM UTC
Duration
1h 31m
Affected: kubernetes.hpc.ut.ee, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, registry.hpc.ut.ee
Timeline · 3 updates
  1. investigating Jan 14, 2026, 02:15 PM UTC

    We're currently seeing higher request error rates and request latency for applications in Kubernetes. We are working to find the cause.

  2. monitoring Jan 14, 2026, 03:21 PM UTC

    We implemented a fix and are currently monitoring the result.

  3. resolved Jan 14, 2026, 03:46 PM UTC

    The issue was related to the number of connections to and from Kubernetes. Our firewalls were configured not to allow such a high connection rate, and as we add new nodes to the cluster, the base rate now exceeds the limits. The limits have been revisited, and new monitoring is being added to avoid the same issue in the future.

Minor January 6, 2026

minu.etais.ee self-service environment issues with loading login page

Detected by Pingoru
Jan 06, 2026, 09:00 AM UTC
Resolved
Jan 06, 2026, 09:36 AM UTC
Duration
Not reported
Affected: minu.etais.ee, puhuri-portal.neic.no, lumi.deic.dk
Timeline · 2 updates
  1. resolved Jan 06, 2026, 09:00 AM UTC

    The [minu.etais.ee](http://minu.etais.ee) self-service environment encountered issues with loading the login page. Page load attempts resulted in an error 429: Too Many Requests. The problem occurred intermittently today between 8:00 and 11:00 am. The issue is resolved now. Please get in touch with us if you detect any anomalies (email: [email protected]).

  2. resolved Jan 06, 2026, 09:36 AM UTC

    This incident has been resolved.
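
For scripts that call a portal and receive HTTP 429 (Too Many Requests), a common client-side pattern is to wait before retrying: honour the standard `Retry-After` header when it carries a number of seconds, and otherwise back off exponentially. The sketch below only computes the wait time. The function name and default values are hypothetical, and whether minu.etais.ee sends a `Retry-After` header is an assumption, not something stated in this incident.

```python
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    If the server supplied a Retry-After value in whole seconds, use it
    (bounded by `cap`). The HTTP-date form of Retry-After is not handled in
    this sketch. Otherwise fall back to exponential backoff: base * 2**attempt.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    return min(base * 2 ** attempt, cap)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, retry_delay(attempt))
```

In practice a small random jitter is often added to the computed delay so that many clients do not retry at the same moment.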

Minor December 18, 2025

External network issues

Detected by Pingoru
Dec 18, 2025, 09:08 AM UTC
Resolved
Dec 18, 2025, 01:04 PM UTC
Duration
3h 57m
Affected: Open OnDemand, kubernetes.hpc.ut.ee, RStudio, my.lumi-supercomputer.eu, puhuri.metacenter.no, minu.etais.ee, puhuri-portal.neic.no, account.lumi.cscs.ch, docs.hpc.ut.ee, lumi.deic.dk, hpc.ut.ee, Galaxy, support.hpc.ut.ee, registry.hpc.ut.ee, rocket.hpc.ut.ee
Timeline · 3 updates
  1. identified Dec 18, 2025, 09:08 AM UTC

    We have detected an issue with the external network, which caused services to be unavailable.

  2. monitoring Dec 18, 2025, 09:20 AM UTC

    We implemented a fix and are monitoring the result while trying to find the initial cause.

  3. resolved Dec 18, 2025, 01:04 PM UTC

    We have identified the issue and are working to prevent it from recurring. Due to a migration of networks between the central University network and ours, a configuration overlap caused the websites to be unavailable.

Major December 10, 2025

ondemand.hpc.ut.ee is currently unavailable, we are working on fix

Detected by Pingoru
Dec 10, 2025, 10:01 AM UTC
Resolved
Dec 10, 2025, 10:34 AM UTC
Duration
33m
Affected: Open OnDemand, RStudio
Timeline · 3 updates
  1. investigating Dec 10, 2025, 10:01 AM UTC

    [ondemand.hpc.ut.ee](http://ondemand.hpc.ut.ee) is currently unavailable. We are investigating this incident. Rocket cluster is still available via shell access.

  2. investigating Dec 10, 2025, 10:05 AM UTC

    The SFTP service is affected as well.

  3. resolved Dec 10, 2025, 10:34 AM UTC

    Open OnDemand services were affected by a crash of the NFS subsystem. The issue has been resolved and all services are operational again.

Major December 2, 2025

registry.hpc.ut.ee outage

Detected by Pingoru
Dec 02, 2025, 04:53 AM UTC
Resolved
Dec 02, 2025, 06:59 AM UTC
Duration
2h 6m
Affected: registry.hpc.ut.ee
Timeline · 2 updates
  1. investigating Dec 02, 2025, 04:53 AM UTC

    registry.hpc.ut.ee cannot be accessed at the moment. This incident was created by an automated monitoring service.

  2. resolved Dec 02, 2025, 06:59 AM UTC

    registry.hpc.ut.ee is now operational! This update was created by an automated monitoring service.

Minor November 14, 2025

courses.cs.ut.ee

Detected by Pingoru
Nov 14, 2025, 06:19 AM UTC
Resolved
Nov 14, 2025, 06:43 AM UTC
Duration
24m
Timeline · 2 updates
  1. investigating Nov 14, 2025, 06:19 AM UTC

    courses.cs.ut.ee is unavailable.

  2. resolved Nov 14, 2025, 06:43 AM UTC

    This incident has been resolved. The issue was related to database access.

Minor November 12, 2025

S3 service at object.hpc.ut.ee inaccessible

Detected by Pingoru
Nov 12, 2025, 09:20 AM UTC
Resolved
Nov 12, 2025, 09:36 AM UTC
Duration
15m
Timeline · 2 updates
  1. identified Nov 12, 2025, 09:20 AM UTC

    The S3 service hosted at https://object.hpc.ut.ee is currently inaccessible. We have identified the issue and are working to resolve it.

  2. resolved Nov 12, 2025, 09:36 AM UTC

    An internal certificate had expired. This issue has now been resolved. We are implementing safeguards to prevent it from happening again.
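
One possible safeguard against expired certificates is a periodic check of how many days remain before expiry. The sketch below is illustrative only and is not the safeguard UT HPC implemented. The function names are hypothetical, and it inspects the certificate a TLS endpoint presents, which may differ from the internal certificate mentioned in this incident. It uses only the Python standard library.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time.

    `not_after` uses the format returned by getpeercert(), for example
    'Jan  5 09:34:43 2018 GMT'. A negative result means already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate presented by host:port.

    The default context verifies the certificate, so this raises an SSL
    error if the certificate has already expired. It is meant to be run
    ahead of expiry, for example from a daily scheduled job.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A monitoring job could call `days_until_expiry(fetch_not_after(host))` and raise an alert when the result drops below a chosen threshold, such as 14 days.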
