Adeptcore experienced a major incident on January 8, 2022, affecting ACP - Nodes and lasting 1d 18h. The incident has been resolved; the full update timeline is below.
Affected components
- ACP - Nodes
Update timeline
- investigating Jan 08, 2022, 08:10 PM UTC
We're currently investigating an issue with Node04 at our Chicago Datacenter. Some customers may experience issues logging in if their virtual machines are located on this node. More information to follow as we investigate and resolve this issue.
- investigating Jan 08, 2022, 08:46 PM UTC
We've determined that the cause of the issues on Node04 was a failed RAM module, which in turn caused a purple screen of death (PSOD). We have powered the affected node back on and are in the process of migrating virtual machines to other hosts. This will prepare Node04 for the necessary maintenance and memory replacement by datacenter staff. So far, more than 30 virtual machines have been moved, and services for affected clients/tenants are being restored.
- identified Jan 08, 2022, 08:53 PM UTC
All virtual machines have been migrated to other nodes and powered on. We are currently confirming that all necessary services are running on the affected servers. The failed RAM module on the affected node is scheduled to be replaced at 3 PM CST (21:00 UTC) by datacenter staff. No downtime will occur as a result of this memory replacement, as all virtual machines have been migrated.
- monitoring Jan 08, 2022, 09:37 PM UTC
All services appear to be in working order on the virtual machines that were migrated. We've also just received word that the failed memory module has been replaced by datacenter technicians. We are currently confirming that the affected node is back in full working order and are monitoring the rest of the environment. Once we've confirmed this, we will move some virtual machines back to this node. No downtime will occur as a result of any of these tasks.
- resolved Jan 10, 2022, 03:03 PM UTC
This incident is being marked as resolved. We have not seen any new issues during our monitoring since the failed RAM module was replaced. We will continue to monitor the infrastructure.