Crusoe incident

Degraded Infiniband Performance in us-southcentral

Major · Resolved

Crusoe experienced a major incident on January 23, 2025 affecting Infiniband Networks, lasting 6h 7m. The incident has been resolved; the full update timeline is below.

Started: Jan 23, 2025, 09:53 PM UTC
Resolved: Jan 24, 2025, 04:01 AM UTC
Duration: 6h 7m
Detected by Pingoru: Jan 23, 2025, 09:53 PM UTC

Affected components

Infiniband Networks

Update timeline

  1. investigating Jan 23, 2025, 09:53 PM UTC

    We are currently investigating an issue with Infiniband in the us-southcentral region. This might cause degraded throughput or timeouts.

  2. identified Jan 24, 2025, 01:00 AM UTC

    We have identified the issue and have scheduled emergency maintenance for 5:30 PM PST (01:30 AM UTC on Jan 24). The maintenance will affect all Infiniband interfaces for the impacted workloads.

  3. monitoring Jan 24, 2025, 02:35 AM UTC

    The maintenance has concluded successfully, and we are now monitoring the systems.

  4. resolved Jan 24, 2025, 04:01 AM UTC

    This incident is now resolved.
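
The timeline mixes UTC timestamps with one maintenance time given in PST, so a short sketch of the conversion may help when lining up events. It assumes that "PST" means a fixed UTC-8 offset and that the maintenance fell on Jan 23 local time (the update announcing it was posted at 01:00 AM UTC on Jan 24, which is 5:00 PM PST on Jan 23). The variable names are illustrative only and are not taken from the vendor.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "PST" is a fixed UTC-8 offset (no daylight saving in January).
PST = timezone(timedelta(hours=-8), name="PST")

# Assumption: the maintenance date is Jan 23 in local (PST) time.
maintenance_local = datetime(2025, 1, 23, 17, 30, tzinfo=PST)
maintenance_utc = maintenance_local.astimezone(timezone.utc)

# Start and resolution times exactly as listed in the timeline (UTC).
started = datetime(2025, 1, 23, 21, 53, tzinfo=timezone.utc)
resolved = datetime(2025, 1, 24, 4, 1, tzinfo=timezone.utc)
elapsed = resolved - started

print(maintenance_utc.isoformat())  # 2025-01-24T01:30:00+00:00
print(elapsed)                      # 6:08:00
```

The elapsed time computed from the listed timestamps can differ by a minute from the reported duration, since the timestamps are shown only to the minute.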