Auvik incident
Service Degraded - Clients on the EU1 cluster using V2 alerting are not reviewing device alerts
Auvik experienced a minor incident on August 22, 2025 affecting eu1.my.auvik.com, lasting 1h 4m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Aug 22, 2025, 09:32 AM UTC
Affected Services: Alerting V2
Cluster(s): EU1
Description: We are currently experiencing degraded services. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible.
Impact: Users may experience a delay of up to 12 hours for device-only related V2 alerts. Services monitoring and the UI are not impacted.
Next Steps: We will update this information as more details become available. We appreciate your patience as we work to restore full functionality.
- investigating Aug 22, 2025, 10:09 AM UTC
Affected Services: Alerting V2
Cluster(s): EU1
Description: We are currently experiencing degraded services. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible.
Impact: Users are not currently receiving or clearing device-only related V2 alerts. Services: monitoring and the UI are not impacted. All legacy alerting is working.
Next Steps: We will update this information as more details become available. We appreciate your patience as we work to restore full functionality.
- monitoring Aug 22, 2025, 10:26 AM UTC
Our team has implemented a fix for the disruption, and the service has returned to normal. We continue to monitor the situation to ensure stability and confirm that the service remains fully functional.
Impact: Services should be operating normally; however, we continue monitoring for irregularities. If you are still experiencing issues, please do not hesitate to reach out to the support team and update your ticket or report any problems you haven't reported yet.
Next Steps: We will provide a final update once the issue is resolved. We appreciate your patience as we work through this issue.
- resolved Aug 22, 2025, 10:36 AM UTC
The incident has been fully resolved. Regular service has been restored, and all systems are operating as expected.
Impact: Users should no longer experience any issues related to this incident. If you are still experiencing issues, please reach out to the support team to update your ticket or report any problems you haven't reported yet.
We apologize for the degradation in services and thank you for your understanding. We will post an RCA after an internal investigation.
- postmortem Sep 08, 2025, 01:35 PM UTC
# Service Degraded - Clients on the EU1 cluster using V2 alerting are not reviewing device alerts

## Root Cause Analysis

### Duration of the incident

Discovered: Aug 21, 2025 23:47 UTC
Resolved: Aug 22, 2025 12:00 UTC

### Cause

A change to the alert-processing timing logic introduced a defect where time windows did not close properly. This prevented events from being processed promptly, causing alerts to queue up and delaying their delivery to the user interface.

### Effect

Customers on the EU1 cluster using V2 alerting experienced delays in reviewing device alerts, with some alerts being delayed by up to 12 hours.

### Action taken

_All times are in UTC_

**08/21/2025**

**23:47** – Alert processing began lagging; backlog started building.

**08/22/2025**

**06:00** – Incident declared; engineers engaged to investigate.

**12:00** – Adjustments made to processing pipeline; backlog cleared; all delayed alerts reprocessed; service restored.

### Future consideration(s)

* Add monitoring for time-window stalls and backlog growth.
* Expand testing to cover out-of-order and skewed event scenarios.
* Strengthen rollback plans for all future alert-processing changes.
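
The first future consideration, monitoring for time-window stalls and backlog growth, could look roughly like the sketch below. This is only an illustration and not Auvik's implementation: the `Window` type, the function names, the five-minute grace period, and the three-sample growth rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Window:
    """Hypothetical processing window covering [start, end)."""
    start: datetime
    end: datetime


def stalled_windows(open_windows, now, grace=timedelta(minutes=5)):
    """Return windows that are still open after their end time plus a grace period."""
    return [w for w in open_windows if now > w.end + grace]


def backlog_growing(depth_samples, min_samples=3):
    """Return True if the last `min_samples` queue-depth readings strictly increase."""
    recent = depth_samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


# Illustrative data only: one window that should have closed long ago,
# and one that is still legitimately open at `now`.
utc = timezone.utc
now = datetime(2025, 1, 1, 12, 0, tzinfo=utc)
old = Window(datetime(2025, 1, 1, 11, 0, tzinfo=utc),
             datetime(2025, 1, 1, 11, 5, tzinfo=utc))
current = Window(datetime(2025, 1, 1, 11, 58, tzinfo=utc),
                 datetime(2025, 1, 1, 12, 3, tzinfo=utc))

stalled = stalled_windows([old, current], now)
growing = backlog_growing([10, 40, 90])
```

A check like this would flag `old` as stalled while leaving `current` alone, and would treat three consecutive rising queue-depth readings as backlog growth. Real thresholds would depend on the window size and normal traffic of the pipeline being monitored.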