Endless Group incident
VM Host Disk Failure (Was: Continued Maintenance)
Endless Group experienced a critical incident on May 27, 2023, affecting DirectAdmin, Homepage/Signups, and 1 more component, lasting 4d 8h. The incident has been resolved; the full update timeline is below.
Update timeline
- monitoring May 25, 2023, 03:02 AM UTC
The previous maintenance tasks have not yet been completed.
- investigating May 25, 2023, 03:20 AM UTC
One of our host machines is failing to reboot following the upgrade. We are working to resolve this issue as quickly as possible.
- identified May 25, 2023, 11:25 PM UTC
We have identified the problem as a failing disk in one of our host machines. We are recovering the machine, but because the recovery may use a large portion of our in-network bandwidth, please expect degraded performance on your sites while it is in progress.
- identified May 27, 2023, 05:38 AM UTC
We will be rebooting the remaining host system in order to finalize the update.
- identified May 27, 2023, 06:06 AM UTC
We are continuing to work on restoring the failed host. Most customer systems should be back online at this time.
- monitoring May 28, 2023, 08:17 AM UTC
All host systems have been successfully restored and confirmed to be operational. New drives were installed in the failing machine. Additionally, our new host system has been joined to the cluster. We are now monitoring to ensure that all components are operating normally. All customer systems should be online at this time. If you are experiencing an issue with your system, please contact us using our support channels.
- resolved May 31, 2023, 03:04 PM UTC
We have been monitoring our host systems and have not observed any further issues. We consider this incident to be resolved. If you are still experiencing problems, please contact our support team.