Crusoe incident
VM Creation Failure for A100 Infiniband Type VMs in us-east1-a Region
On October 2, 2025, Crusoe experienced a minor incident affecting us-east1 that lasted 16h 49m. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Oct 02, 2025, 04:00 AM UTC
We have identified an issue that prevents Virtual Machines on our A100 Infiniband hardware fleet from booting. New VM provisioning requests for this hardware type will fail, and any existing VM on an A100 Infiniband node that is stopped and started (or rebooted) will fail to come back online. VMs that are currently running are not affected and will continue to operate normally. We advise customers not to reboot critical workloads on this hardware until a fix is in place. Our engineering teams are actively investigating the root cause and working to restore normal provisioning as quickly as possible.
- monitoring Oct 02, 2025, 08:20 PM UTC
A fix has been implemented, and we are monitoring the environment.
- resolved Oct 02, 2025, 08:50 PM UTC
This incident is now resolved.