Auvik incident
Service Disruption - Auvik clients are experiencing a disruption of services - Multiple Clusters
Auvik experienced a notice incident on August 25, 2025 affecting us1.my.auvik.com and us2.my.auvik.com and 1 more component, lasting 1d 19h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 25, 2025, 06:32 PM UTC
Affected Services: Clients are inaccessible Cluster(s): All Clusters Description: We are currently experiencing a service disruption. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible. Impact: It is still being determined. Next Steps: We will update this information as more details become available. We appreciate your patience as we work to restore full functionality.
- investigating Aug 25, 2025, 06:41 PM UTC
Affected Services: Clients are not accessible Cluster(s): All Clusters Description: We are currently experiencing degraded services. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible. Impact: Users may be unable to access their tenants. We are currently performing a cluster restart on the US2 cluster. Downtime is expected to be 1-1.5 hours. We are investigating the other clusters. Next Steps: We will update this information as more details become available. We appreciate your patience as we work to restore full functionality.
- investigating Aug 25, 2025, 07:07 PM UTC
Affected Services: Clients are not accessible Cluster(s): US2 and US5 Description: We are currently experiencing degraded services. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible. Impact: Users may be unable to access their tenants. We are currently performing a cluster restart on the US2 cluster. Downtime is expected to be 1-1.5 hours. We are now also performing a cluster restart on the US5 cluster. Downtime is expected to be 1-1.5 hours. The other clusters do not appear to be affected; monitoring and alerting are working on them. Next Steps: We will update this information as more details become available.
- identified Aug 25, 2025, 08:07 PM UTC
Our team has identified a suspected cause of the service disruption on the US2 and US5 clusters and is taking steps to remediate the issue. Impact: Customers may continue to experience connectivity issues as we bring the US2 and US5 clusters back into production. Please report any possibly related issues to Auvik Support. Next Steps: We are applying mitigation measures and will provide updates on progress.
- identified Aug 25, 2025, 08:43 PM UTC
Our team has identified a suspected cause of the service disruption on the US2 and US5 clusters and is taking steps to remediate the issue. Impact: Customers may continue to experience connectivity issues as we deliberately bring the US2 and US5 clusters back into production. The estimated time for recovery of these clusters has been extended. Please report any possible related issues to Auvik support. Next Steps: We are applying mitigation measures and will provide updates on progress.
- monitoring Aug 25, 2025, 09:37 PM UTC
We have applied changes to address the issue on the US2 and US5 clusters. Site access should be restored. Services appear to still be recovering, and we are monitoring closely for stability. Impact: Data for tenants in the US2 and US5 clusters is still delayed in the product. If you continue to encounter problems, please report them to Auvik Support. Next Steps: A final update will be posted once we confirm resolution.
- monitoring Aug 25, 2025, 09:56 PM UTC
We have applied changes to address the issue on the US2 and US5 clusters. Site access is restored. Services appear to still be recovering, and we are monitoring closely for stability. Impact: Data for tenants in the US2 and US5 clusters is still delayed in the product and will continue to recover. We will be monitoring the services into the evening. If you continue to encounter problems, please report them to Auvik Support. Next Steps: A final update will be posted once we confirm resolution.
- monitoring Aug 26, 2025, 11:16 AM UTC
The US2 and US5 clusters are running and stable. The AU1 cluster had some slowness over the evening, which has been resolved. Clients in the US4 cluster have reported some lag in services, which is currently under investigation. Impact: US4 may be experiencing slow load times and lag when accessing services. If you continue to encounter problems, please report them to Auvik Support.
- monitoring Aug 26, 2025, 02:10 PM UTC
We have implemented changes to address the outstanding issues, and services are currently operating as expected. As a precaution, all clusters are being closely monitored to ensure continued stability. Impact: Services should be functioning normally. If you continue to experience any problems, please contact Auvik Support. Next Steps: We will continue monitoring and will share any further updates if necessary.
- monitoring Aug 26, 2025, 03:47 PM UTC
Several tenants in the EU2 cluster may have experienced an interruption in service, including Collector disconnects. Impact: Services should be returning to normal. If you continue to experience any problems, please contact Auvik Support. Next Steps: We will continue monitoring and will share any further updates if necessary.
- identified Aug 26, 2025, 03:54 PM UTC
Our team has identified a suspected cause of the slowness and permission errors in EU2 and is taking steps to remediate the issue. Impact: Customers may continue to experience slowness and possible problems accessing their sites. Please report any related issues to Auvik Support so we can track and assist further. Next Steps: We are applying mitigation measures and will provide updates on progress.
- identified Aug 26, 2025, 04:34 PM UTC
The sites in the EU2 cluster are recovering. Some sites on the US6 cluster are experiencing missing data in the UI (Maps, Devices, etc). This is being addressed. Impact: Customers may continue to experience slowness and may have limited access to their sites on the affected clusters. Please report any related issues to Auvik Support so we can track and assist further.
- monitoring Aug 26, 2025, 05:02 PM UTC
The sites in the EU2 cluster have recovered. Some sites on the US6 cluster experienced missing data in the UI (e.g., Maps, Devices). This has also been addressed. We will be performing rolling maintenance on sites on the AU1 cluster, during which the site may experience a momentary disconnection. Collectors may need to reconnect to Auvik. Impact: Customers may continue to experience slowness and may have limited access to their sites on the affected clusters. Please report any related issues to Auvik Support so we can track and assist further.
- monitoring Aug 26, 2025, 10:22 PM UTC
We have implemented changes to address the outstanding issues, and services are currently operating as expected. As a precaution, all clusters are being closely monitored to ensure continued stability. This will continue throughout the evening. Impact: Services should be functioning normally. If you continue to experience any issues, please contact Auvik Support. Next Steps: We will continue to monitor and share any further updates as necessary.
- monitoring Aug 27, 2025, 08:49 AM UTC
We have implemented changes to address the outstanding issues, and services are currently operating as expected. As a precaution, all clusters are being closely monitored to ensure continued stability. This will continue throughout the day. Impact: Services should be functioning normally. If you continue to experience any issues, please contact Auvik Support. Next Steps: We will continue to monitor and share any further updates as necessary.
- investigating Aug 27, 2025, 02:09 PM UTC
The incident has been fully resolved, and services are operating as expected. Impact: Customers should no longer experience any related issues. If you continue to experience problems, please report them to Auvik Support. We will be posting an RCA as a follow-up.
- resolved Aug 27, 2025, 02:13 PM UTC
The incident has been fully resolved, and services are operating as expected. Impact: Customers should no longer experience any related issues. If you continue to experience problems, please report them to Auvik Support. We will be posting an RCA as a follow-up.
- postmortem Sep 02, 2025, 04:34 PM UTC
# Service Disruption - Intermittent Availability & Performance Issues Across Multiple Clusters

## Root Cause Analysis

### Duration of the incident

Discovered: Aug 25, 2025 18:00 - UTC
Resolved: Aug 29, 2025 14:00 - UTC

### Cause

A configuration rollout unexpectedly generated a large number of configuration entries, which propagated across tenants. This resulted in excessive background processing and memory pressure in core services. The strain led to degraded performance, instability, and in some cases, brief service crashes across clusters.

### Effect

Customers experienced:

* Intermittent access and sign-in issues in several regions
* Slow page loads and missing/delayed alert notifications
* Errors or gaps in specific dashboard and visualization views
* Temporary unavailability for a small number of tenants

### Action taken

_All times are in UTC_

**08/25/2025**

**18:00** - Rollout halted after error rates increased.
**19:00** - Targeted service restarts restored partial availability.
**22:00** - Added backend capacity and began controlled rollouts.

**08/26–08/28/2025**

Continued staged rollouts with adjusted capacity. Cleaned up configuration entries for affected tenants. Tuned resource allocations for read/permissioning services.

**08/29/2025**

**14:00** - All clusters stabilized; monitoring confirmed normal performance.

### Future consideration(s)

* Enhance autoscaling and resource thresholds for services under heavy background processing.
* Add scale-aware pre-deployment validation for configuration rollouts.
* Refine monitoring to surface customer-visible issues earlier.
* Expand operational runbooks for rollback and tenant recovery.
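
The postmortem does not describe how a scale-aware pre-deployment validation would work. As a purely illustrative sketch, not Auvik's implementation, a check of this kind could project how many configuration entries a rollout would create and block it when the projection exceeds a budget. The names `RolloutEstimate` and `validate_rollout`, and both threshold values, are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutEstimate:
    """Hypothetical projection of what a configuration rollout would create."""
    entries_per_tenant: int  # entries the rollout would generate for one tenant
    tenant_count: int        # tenants the rollout would propagate to

    @property
    def total_entries(self) -> int:
        # Fleet-wide entry count if the rollout reaches every tenant.
        return self.entries_per_tenant * self.tenant_count


def validate_rollout(estimate: RolloutEstimate,
                     max_per_tenant: int = 500,        # assumed budget
                     max_total: int = 1_000_000) -> list[str]:  # assumed budget
    """Return reasons to block the rollout; an empty list means it may proceed."""
    problems = []
    if estimate.entries_per_tenant > max_per_tenant:
        problems.append(
            f"{estimate.entries_per_tenant} entries per tenant exceeds "
            f"the per-tenant budget of {max_per_tenant}"
        )
    if estimate.total_entries > max_total:
        problems.append(
            f"{estimate.total_entries} total entries exceeds "
            f"the fleet-wide budget of {max_total}"
        )
    return problems


if __name__ == "__main__":
    for reason in validate_rollout(RolloutEstimate(10_000, 5_000)):
        print("BLOCKED:", reason)
```

Checking both a per-tenant and a fleet-wide budget matters because a rollout that looks modest for one tenant can still be too large once it propagates to every tenant.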