HUIT experienced a notice incident on April 16, 2025 affecting Other Services, lasting 16h 33m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Apr 16, 2025, 09:12 PM UTC
When logging in to https://ood.huit.harvard.edu, users are able to load the Open OnDemand dashboard, but interactive apps will not start properly. The terminal app may load, but the Slurm scheduler is unstable, and compute jobs may or may not run. User data is unaffected and can still be downloaded through the Open OnDemand dashboard. FAS Academic Technology is troubleshooting this issue and working to restore this service.
- identified Apr 16, 2025, 10:45 PM UTC
The service team believes they've identified a path forward. They continue working to investigate and remediate the root cause.
- monitoring Apr 16, 2025, 11:44 PM UTC
Restarting the Slurm controller node and altering its configuration has enabled launching interactive apps in HUIT Open OnDemand once again. HUIT will continue to monitor the service to ensure stability before resolving the Major Incident.
- resolved Apr 17, 2025, 01:46 PM UTC
The outage affecting HUIT Open OnDemand has been resolved, and users are able to log in and access resources. We'll continue to closely monitor the service to ensure stability.