Adeptcore incident

PHX Major Outage

Critical Resolved

Adeptcore experienced a critical incident on January 17, 2024 affecting ACP - Network, lasting 23h 14m. The incident has been resolved; the full update timeline is below.

Started
Jan 17, 2024, 10:57 PM UTC
Resolved
Jan 18, 2024, 10:11 PM UTC
Duration
23h 14m
Detected by Pingoru
Jan 17, 2024, 10:57 PM UTC
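
The reported duration can be checked directly from the Started and Resolved timestamps above. The following is a minimal sketch, not part of the vendor's report; the helper names `parse_utc` and `format_duration` are hypothetical, and the parsing assumes an English locale for the month abbreviations.

```python
from datetime import datetime, timezone

# Timestamp layout used on this page, e.g. "Jan 17, 2024, 10:57 PM UTC".
# The literal " UTC" suffix is matched as plain text by strptime.
STAMP_FORMAT = "%b %d, %Y, %I:%M %p UTC"


def parse_utc(stamp: str) -> datetime:
    """Parse a status-page timestamp and mark it explicitly as UTC."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Render the gap between two datetimes as '<hours>h <minutes>m'."""
    total_minutes = int((end - start).total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


started = parse_utc("Jan 17, 2024, 10:57 PM UTC")
resolved = parse_utc("Jan 18, 2024, 10:11 PM UTC")
services_back = parse_utc("Jan 18, 2024, 12:25 AM UTC")

print(format_duration(started, resolved))       # 23h 14m
print(format_duration(started, services_back))  # 1h 28m
```

The second figure uses the 12:25 AM monitoring update, the first one stating that all services were back online. It suggests most of the 23h 14m was a monitoring window rather than downtime.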

Affected components

ACP - Network

Update timeline

  1. investigating Jan 17, 2024, 10:57 PM UTC

    We are aware of reports of servers appearing offline at our PHX datacenter. Our engineers are currently investigating this issue to get it resolved as quickly as possible. We will update this post as more information becomes available.

  2. identified Jan 17, 2024, 11:09 PM UTC

    The cause has been identified as an issue with the datacenter WAN connection. We are currently working with the datacenter engineers to resolve this issue as quickly as possible. We will keep you updated with more information as it becomes available.

  3. identified Jan 17, 2024, 11:26 PM UTC

    We are seeing that the BGP routes to all PHX networks are currently down. This issue is affecting all IP space, not just Adeptcloud, and we are working with the providers to get it resolved. The entire 66.85.128.0/18 block is down at the moment.

  4. identified Jan 17, 2024, 11:56 PM UTC

    UPDATE: We are currently seeing an issue that is affecting all IPv4 routing at the PHX datacenter with about 156k IPs affected. We are still working with our datacenter NOC as this issue is affecting all peering.

  5. identified Jan 18, 2024, 12:05 AM UTC

    UPDATE: We are seeing routes propagating and beginning to work right now. Please hold tight as we believe this issue is about to be resolved.

  6. monitoring Jan 18, 2024, 12:07 AM UTC

    UPDATE: Our status monitors are starting to report that routing is restored and that services are coming back online. We will post our next update as soon as we verify that all routing is restored and all services are back online (ETA 15 minutes).

  7. monitoring Jan 18, 2024, 12:25 AM UTC

    All monitoring shows that all services, including Horizon, are currently back online. MSP tools such as RMM, ScreenConnect and others may take a few minutes to reestablish connectivity.

    - If you are using an uptime monitor, you may see multiple connect/reconnect attempts while the services are actually all still online.
    - We have confirmed that all VPN tunnels and connectivity have been restored.

    We are awaiting an after-incident report to see what caused the bad routing/networking issues in the first place. From our preliminary troubleshooting, the global routing tables lost all routes to PHX IPs. We will post an update within 24 hours specifying when to expect an after-incident report from us and any further steps being taken.

  8. monitoring Jan 18, 2024, 02:33 AM UTC

    Monitoring has just alerted us to another potential disruption. This one seems to have lasted approximately 3 minutes, and all services are back online. This is part of the restoration process while the routes are propagating out. We can confirm this is related to the first outage with the global routes. Currently users are able to work, but monitoring software may trigger some alerts as routes propagate and establish. We are fully operational at this time and are awaiting the after-action report as we continuously monitor the global routes to PHX.

  9. resolved Jan 18, 2024, 10:11 PM UTC

    We are marking the incident as resolved at this time. We are waiting for our peering provider at PHX to provide an after-action report. They estimate that their investigation and a complete report will take between seven and fifteen working days. Once this is done, they will release the information to us. We have, however, requested an escalation to shorten this timeline so we can provide our clients with the exact details of what occurred sooner. A postmortem report will be published here as soon as we receive the after-action report.
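
Update 3 names 66.85.128.0/18 as down. For readers who want to check whether one of their own addresses sits inside that prefix, here is a minimal sketch using Python's standard `ipaddress` module. The function name `in_affected_range` and the sample addresses are hypothetical and chosen only for illustration.

```python
import ipaddress

# Prefix named in update 3 of the timeline.
AFFECTED = ipaddress.ip_network("66.85.128.0/18")


def in_affected_range(ip: str) -> bool:
    """Return True if the given IPv4 address falls inside the named prefix."""
    return ipaddress.ip_address(ip) in AFFECTED


# A /18 leaves 32 - 18 = 14 host bits, so it spans 2**14 addresses.
print(AFFECTED.num_addresses)     # 16384
print(AFFECTED[0], AFFECTED[-1])  # 66.85.128.0 66.85.191.255

# Hypothetical sample addresses, not taken from the incident report.
print(in_affected_range("66.85.150.10"))  # True
print(in_affected_range("66.85.192.1"))   # False
```

The /18 covers 16,384 addresses, well short of the roughly 156k IPs cited in update 4, so that larger figure presumably includes other prefixes announced from the PHX datacenter. The report does not list them.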