Crusoe incident
Degraded Infiniband Performance in us-southcentral
On January 23, 2025, Crusoe experienced a major incident affecting Infiniband Networks that lasted 6h 7m. The incident has been resolved; the full update timeline is below.
Affected components
Infiniband Networks
Update timeline
- investigating Jan 23, 2025, 09:53 PM UTC
We are currently investigating an issue with Infiniband in the us-southcentral region. This may cause degraded throughput or timeouts.
- identified Jan 24, 2025, 01:00 AM UTC
We have identified the issue and have scheduled emergency maintenance for 5:30 PM PST (Jan 24, 01:30 AM UTC). The maintenance will affect all Infiniband interfaces for the impacted workloads.
- monitoring Jan 24, 2025, 02:35 AM UTC
The maintenance has concluded successfully, and we are now monitoring the systems.
- resolved Jan 24, 2025, 04:01 AM UTC
This incident is now resolved.