Paperspace experienced a major incident on June 16, 2024 affecting US (NY2) and Gradient, lasting 18h 48m. The incident has been resolved; the full update timeline is below.
Affected components: US (NY2), Gradient
Update timeline
- investigating Jun 16, 2024, 11:16 AM UTC
We are currently investigating a network issue that is preventing users from interacting with Gradient notebooks and Core VMs based in the NY2 region.
- investigating Jun 16, 2024, 12:13 PM UTC
We are continuing to investigate this issue.
- investigating Jun 16, 2024, 12:22 PM UTC
We are continuing to investigate this issue.
- investigating Jun 16, 2024, 03:23 PM UTC
We are continuing to investigate this issue.
- investigating Jun 16, 2024, 07:03 PM UTC
We are continuing to investigate this issue.
- identified Jun 16, 2024, 09:55 PM UTC
The issue has been identified; it appears to be a problem with our network in NY2. A fix is being implemented.
- monitoring Jun 17, 2024, 02:11 AM UTC
We've implemented a series of fixes and are monitoring the results. While Core is already fixed, the Gradient platform will continue to have degraded performance since it is still taking time to recover. We will continue working on fixes for it.
- resolved Jun 17, 2024, 06:05 AM UTC
Our engineers have resolved the issue. If you continue to experience issues, please contact our Support Team.
- postmortem Jun 21, 2024, 06:46 PM UTC
# Paperspace NY2 network outage Postmortem

## Incident Summary

On June 16th, a significant number of virtual machines in the NY2 region became unavailable and went into Read-Only (RO) mode, which affected network availability. By 18:20 UTC on the same day, network connectivity was restored to the affected virtual machines. However, affected virtual machines continued to be unavailable until 12:23 UTC on June 17th as they were still in RO mode. The affected virtual machines were restored to Read-Write (RW) mode between 5:58 and 12:23 UTC on June 17th. By 12:24 UTC, affected virtual machines were available and network access to customers was restored.

## Incident Details

### Root Cause

The root cause of this unexpected service unavailability is related to a core switch failure in NY2 that caused a large number of machines to go into RO mode.

### Impact

As a result of the failure of the core switch, there was a spike in traffic that adversely impacted network performance. The virtual machines went into RO mode as a result of the loss of network access to the Network File System on which they resided. Since virtual machines were in RO mode, customers were unable to perform any write operations on them, resulting in service disruptions around 10:30 UTC on June 16th, 2024.

## Remediation Actions

A number of efforts are underway to try to prevent these types of failures from occurring again, including a network redesign and installation of new equipment.

On behalf of Paperspace, we apologize for the disruption to your services and appreciate your understanding. If you have any questions or concerns, please open a ticket with our [Customer Support team](https://docs.digitalocean.com/support/paperspace/#open-a-ticket.).