Postmortem: Mar 19, 2026, 04:19 PM UTC
# Summary

Between 2026‑03‑12 at 06:21 AM EDT and 2026‑03‑13 at 11:51 AM EDT, customers experienced performance degradation in the Web Application, which could manifest as slowness or errors while navigating the product. During this period, managed devices intermittently appeared to switch between online and offline states, triggering status‑change notifications where configured, even though the devices' actual connection status remained unchanged.

# Root Cause

A code change introduced in the 9.25 release contained a flaw in dependency logic that caused the service responsible for collecting and storing Patch History information to spawn multiple unexpected instances. As a result, the service repeatedly triggered full patch‑history resynchronizations across all agents. The resulting surge in request volume gradually saturated the web services, causing progressive performance degradation and ultimately rendering some web service instances unavailable.

The R&D team has deployed an Agent hotfix to resolve the issue and has temporarily disabled the patch error history feature in the Web Application. This feature will be re‑enabled, and all related data restored, in the next release cycle.

# Preventative Measures

To reduce the likelihood and impact of similar incidents in the future, we are taking the following steps:

* **Hardening of Internal Systems and Dependencies:** The R&D team will conduct a comprehensive audit of all dependency logic across the product and verify that injected dependencies perform consistently as expected, both under load and across various operational scenarios.
* **Enhanced Release Management and Control:** The R&D team will implement additional mandatory quality‑assurance measures to ensure that all shared components and injected dependencies are properly validated and scoped to their intended use cases. These measures will help prevent unintended states by confirming that dependencies behave consistently, remain isolated to their defined contexts, and do not introduce side effects across the system.
* **Enhanced Monitoring, Alerting, and Response:** The R&D team will implement enhanced monitoring to identify unexpected database growth in real time across both QA and production environments. These metrics will be integrated with existing anomaly‑detection systems to enable faster detection of, and response to, potentially undesirable states that could impact system stability or degrade the user experience.
* **Enhanced Incident Management and Response:** The teams will continue to maintain and enhance scenario‑specific playbooks so that validated mitigation and resolution procedures are executed quickly and accurately. These improvements are intended to further reduce the time required to detect, diagnose, and resolve issues when they occur.