Auvik experienced a notice incident on September 10, 2025, affecting my.auvik.com, us1.my.auvik.com, and 1 more component, lasting 43m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Sep 10, 2025, 04:43 PM UTC
We are currently investigating reports of sites returning 500 errors that affect site access. Impact: Customers may be unable to connect to their site(s) or to have collectors connect. Next Steps: Our team is working to identify contributing factors. Updates will follow as more information becomes available.
- identified Sep 10, 2025, 05:00 PM UTC
Our team has identified a suspected cause of the connection issues and is taking steps to remediate it. Impact: The 500 errors have been remediated for the customers who experienced them. Connection issues after installing a Windows collector are still being investigated, and users may continue to see the collector fail to connect to the site. The following services are not affected: monitoring and alerting. Please report any related issues to Auvik Support so we can track and assist further. Next Steps: We are applying mitigation measures and will provide updates on progress.
- monitoring Sep 10, 2025, 05:07 PM UTC
We have applied changes to address the issue. Services appear to be operating normally, and we are monitoring closely for stability. Impact: Collector connection services should be operating normally; however, if you continue to encounter problems, please report them to Auvik Support. Next Steps: A final update will be posted once we confirm the resolution.
- resolved Sep 10, 2025, 05:26 PM UTC
The connectivity issues with the site and collectors have been fully resolved, and services are operating as expected. Impact: Customers should no longer experience any related issues. If you continue to experience issues, please report them to Auvik Support.
- postmortem Sep 25, 2025, 12:13 AM UTC
# Service Degraded - Login and Collector Installation Issues on US3 Cluster

## Root Cause Analysis

### Duration of the incident

Discovered: Sep 10, 2025 13:17 UTC

Resolved: Sep 12, 2025 15:08 UTC

### Cause

Two separate but overlapping issues contributed to this incident:

* Collector Installation Failures – Windows collectors were unable to install due to missing service principal credentials on the backend agent server, which prevented successful API calls for subscription data.
* Login and Redirect Failures – Following a restart of the US3 frontend, requests for user and tenant data from secondary clusters intermittently failed. This caused login attempts through Okta to hang and product redirects to fail.

### Effect

* Users attempting to log in via Okta were unable to complete authentication and access tenants.
* Some sites experienced 500 errors when attempting to access dashboards.
* Windows collector installations via GUI and CLI failed, preventing the deployment of new collectors.

### Action taken

_All times are in UTC_

**09/10/2025**

**13:17** – Users report inability to log in to Auvik Production through Okta.

**13:28** – Errors in US3 frontend logs identified relating to user/tenant data queries.

**13:37** – Engineering suspends frontend deployment in the US3 cluster.

**13:39** – Issues confirmed across secondary clusters; logs analyzed for root cause.

**14:54** – Identified that the frontend redirect service could not fetch required tenant data; feature flag disabled to restore functionality.

**15:01** – Engineering confirms that the workaround restores login functionality while monitoring tenant recovery. The incident is resolved on the status page.

**09/12/2025**

**15:08** – Feature flag re-enabled after system recovery; frontend services reconciled successfully. Incident fully resolved.

### Future consideration(s)

* Improve validation of service principal credentials to prevent collector installation failures.
* Enhance monitoring and alerting around login and redirect workflows to detect tenant query failures earlier.
* Review and refine feature flag rollout procedures to minimize dependency risks and ensure optimal deployment.
* Continue improving how customer data is distributed across clusters so that queries run more efficiently and reliably, even during high system load or maintenance events.