AlphaVPS incident

Sofia network outage

AlphaVPS experienced a major incident on November 10, 2023 affecting Network Infrastructure in Sofia, lasting 12h 10m. The incident has been resolved; the full update timeline is below.

Started: Nov 10, 2023, 06:09 AM UTC
Resolved: Nov 10, 2023, 06:19 PM UTC
Duration: 12h 10m
Detected by Pingoru: Nov 10, 2023, 06:09 AM UTC

Affected components

Network Infrastructure in Sofia

Update timeline

identified Nov 10, 2023, 06:09 AM UTC

Hello, we've identified an issue affecting our network in our Sofia location and isolated it to our QFX Virtual Chassis core. We've restored connectivity on a single member of the Virtual Chassis, so no redundancy is present at the moment. We're working on bringing the remaining VC members online to restore redundancy.
identified Nov 10, 2023, 09:04 AM UTC

We've identified the main problem, which was related to one of the members of our Juniper QFX Virtual Chassis. At the moment, we believe that this member suffered a catastrophic failure of its internal storage. Unfortunately, this specific Juniper QFX device was the active master in our VC configuration. As per best practices, we run multiple devices that can take over mastership when the current master fails. In this case, however, that did not happen, because Junos was still partially running on the failing device. The master killed the internal routing sessions, and no packets were being transmitted via the secondary devices. Once we determined the root cause, we performed a manual switchover to another VC member and connectivity was restored. We're working on replacing the failing device and bringing it back into our VC to restore redundancy. Until then, service redundancy should still be considered at risk. IPv6 connectivity is not yet fully restored, as our priority is bringing IPv4 redundancy back in place. Further updates to follow.
identified Nov 10, 2023, 09:39 AM UTC

As of now, IPv6 connectivity is also restored. We're proceeding with replacing the virtual chassis member.
monitoring Nov 10, 2023, 12:34 PM UTC

We've installed a new QFX member and, after initial configuration, restored full redundancy. All systems are now fully redundant again. We'll continue monitoring the infrastructure and close this issue later today.
resolved Nov 10, 2023, 06:19 PM UTC

We're closing this incident as resolved.
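
Illustrative note on the failure mode described in the 09:04 AM UTC update: the sketch below is a simplified model of why a partially running master can block automatic failover. It is not Juniper's actual Virtual Chassis election logic. The `Member` fields, the priority values, and both election functions are hypothetical names introduced only for illustration. It assumes a master is kept as long as it answers keepalives, and contrasts that with an election that also checks whether the member is forwarding traffic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str                    # hypothetical member label
    priority: int                # hypothetical: higher value wins an election
    responds_to_keepalive: bool  # control plane still answers its peers
    forwarding_healthy: bool     # data path is actually passing packets


def elect_liveness_only(members: List[Member], current_master: str) -> Optional[str]:
    """Keep the current master for as long as it answers keepalives."""
    alive = [m for m in members if m.responds_to_keepalive]
    if not alive:
        return None
    for m in alive:
        if m.name == current_master:
            return m.name  # a half-dead master is never replaced
    return max(alive, key=lambda m: m.priority).name


def elect_health_aware(members: List[Member], current_master: str) -> Optional[str]:
    """Only members that answer keepalives AND forward traffic are eligible."""
    healthy = [m for m in members if m.responds_to_keepalive and m.forwarding_healthy]
    if not healthy:
        return None
    for m in healthy:
        if m.name == current_master:
            return m.name  # a healthy master keeps its role
    return max(healthy, key=lambda m: m.priority).name


# Scenario loosely modelled on the incident: the master still answers
# keepalives (OS partially running) but no longer forwards packets.
members = [
    Member("member0", priority=255, responds_to_keepalive=True, forwarding_healthy=False),
    Member("member1", priority=200, responds_to_keepalive=True, forwarding_healthy=True),
]

print(elect_liveness_only(members, "member0"))  # member0 stays master
print(elect_health_aware(members, "member0"))   # member1 takes over
```

In this simplified model, the manual switchover mentioned in the update plays the role of the health-aware decision: an operator judged that the master was no longer forwarding and moved mastership by hand.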