Crusoe incident

VM Creation Failure for A100 Infiniband Type VMs in us-east1-a Region

Minor · Resolved

Crusoe experienced a minor incident on October 2, 2025 affecting us-east1, lasting 16h 49m. The incident has been resolved; the full update timeline is below.

Started
Oct 02, 2025, 04:00 AM UTC
Resolved
Oct 02, 2025, 08:50 PM UTC
Duration
16h 49m
Detected by Pingoru
Oct 02, 2025, 04:00 AM UTC
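
As a quick check on the duration, the elapsed time can be recomputed from the Started and Resolved timestamps listed above. The sketch below is illustrative only; the helper names `parse_utc` and `format_duration` are hypothetical and not part of any Crusoe or Pingoru tooling. With the minute-level timestamps shown on this page it yields 16h 50m, one minute more than the reported 16h 49m. One possible explanation, which is an assumption and not confirmed by the source, is that the page computes the duration from timestamps with sub-minute precision.

```python
from datetime import datetime, timezone

# Timestamp layout used on this page, e.g. "Oct 02, 2025, 04:00 AM UTC".
# The trailing " UTC" is stripped before parsing and re-attached as tzinfo.
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a status-page timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(stamp.removesuffix(" UTC"), FMT).replace(
        tzinfo=timezone.utc
    )


def format_duration(start: datetime, end: datetime) -> str:
    """Return the elapsed time between two datetimes as 'Xh Ym'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


started = parse_utc("Oct 02, 2025, 04:00 AM UTC")
resolved = parse_utc("Oct 02, 2025, 08:50 PM UTC")
duration = format_duration(started, resolved)
print(duration)
```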

Affected components

us-east1

Update timeline

  1. identified Oct 02, 2025, 04:00 AM UTC

    We have identified an issue that prevents new or restarted virtual machines (VMs) from booting on our A100 Infiniband hardware fleet. New VM provisioning requests for this hardware type will fail, and any existing VM on an A100 Infiniband node that is stopped and started (or rebooted) will fail to come back online. VMs that are currently running are not affected and will continue to operate normally. We advise customers to avoid rebooting critical workloads on this hardware until a resolution is in place. Our engineering teams are investigating the root cause and working to restore normal provisioning operations as quickly as possible.

  2. monitoring Oct 02, 2025, 08:20 PM UTC

    A fix has been implemented, and we are monitoring the environment.

  3. resolved Oct 02, 2025, 08:50 PM UTC

    This incident is now resolved.