Auvik incident

Service Disruption - The US4 cluster is down

Major · Resolved

Auvik experienced a major incident on February 3, 2025, affecting us4.my.auvik.com and lasting 1h 7m. The incident has been resolved; the full update timeline is below.

Started
Feb 03, 2025, 10:23 AM UTC
Resolved
Feb 03, 2025, 11:30 AM UTC
Duration
1h 7m
Detected by Pingoru
Feb 03, 2025, 10:23 AM UTC

Affected components

us4.my.auvik.com

Update timeline

  1. investigating Feb 03, 2025, 10:23 AM UTC

    Affected Services: All clients are currently inaccessible.
    Services not impacted: N/A
    Description: Our team is actively investigating the root cause and working to resolve the issue as quickly as possible.
    Impact: Users cannot access their tenants.
    Next Steps: We will provide updates as more information becomes available, or by 11:00 UTC.
    Thank you for your patience as we work to restore full functionality.

  2. identified Feb 03, 2025, 10:32 AM UTC

    Affected Services: All clients are currently inaccessible.
    Description: Our team has identified the root cause of the outage and is working on a solution to restore normal service levels.
    Impact: While we work on the resolution, users may experience slower load times and intermittent connectivity issues.
    Next Steps: Our team is actively working to resolve the issue and will provide updates as progress is made, or by 11:00 UTC.
    Thank you for your patience as we work to restore full functionality.

  3. monitoring Feb 03, 2025, 11:01 AM UTC

    Affected Services: Clients on the US4 cluster
    Description: Our team has implemented a fix, and tenants are in the process of becoming fully accessible. We are monitoring the situation to ensure stability and confirm that the service remains fully functional.
    Impact: Services should be operating normally, with a few client sites still starting up. We continue to monitor for any irregularities.
    Next Steps: We will provide a final update once we confirm the issue is fully resolved.
    Thank you for your patience, and we apologize for any inconvenience caused.

  4. resolved Feb 03, 2025, 11:30 AM UTC

    Affected Services: Clients on the US4 cluster are now accessible.
    Description: The issue affecting US4 tenants has been resolved. Regular service has been restored, and all systems are operating as expected.
    Impact: Users should no longer experience any issues related to this incident.
    Next Steps: We are preparing a detailed Root Cause Analysis (RCA) report to provide further insight into the incident and the preventive measures.
    Thank you for your patience, and we apologize for any inconvenience caused.

  5. postmortem Feb 18, 2025, 04:34 PM UTC

    # Service Disruption - Clients on US4 are not accessible

    ## Root Cause Analysis

    ### Duration of incident

    Discovered: Feb 03, 2025, 09:27 UTC
    Resolved: Feb 03, 2025, 11:55 UTC

    ### Cause

    Overload of backend resources for services on the US4 cluster.

    ### Effect

    Tenants on the US4 cluster became inaccessible.

    ### Action taken

    _All times in UTC_

    **02/03/2025**

    **09:27** - Engineering receives alerts that tenants on the US4 cluster are not accessible.
    **09:33** - Engineering responds to the outage and begins its investigation.
    **09:53** - Engineering restarts the US4 cluster backends to address their non-responsiveness.
    **09:53-11:55** - The cluster is observed as it restarts and monitored as it returns to full functionality. The incident is declared resolved.

    ### Future consideration(s)

    * Auvik is currently improving backend monitoring and stability within the product and infrastructure. These improvements are intended to help proactively mitigate potential issues in the future.