Addigy incident

Elevated Latency on Addigy Cloud Interface

Critical Resolved

Addigy experienced a critical incident on October 4, 2024, affecting the Addigy Cloud Interface and lasting 4h 37m. The incident has been resolved; the full update timeline is below.

Started: Oct 04, 2024, 05:10 PM UTC
Resolved: Oct 04, 2024, 09:47 PM UTC
Duration: 4h 37m
Detected by Pingoru: Oct 04, 2024, 05:10 PM UTC

Affected components

Addigy Cloud Interface

Update timeline

  1. investigating Oct 04, 2024, 05:10 PM UTC

    We are currently investigating elevated latency on the Addigy Cloud Interface; certain components may be slow to load.

  2. investigating Oct 04, 2024, 06:13 PM UTC

    We are actively investigating elevated latency on the Addigy Cloud Interface; certain components may be slow to load. We will continue to post updates here as our investigation progresses.

  3. monitoring Oct 04, 2024, 06:33 PM UTC

    A fix has been implemented to reduce the elevated latency on the Addigy Cloud Interface. The interface is operational, and we are continuing to monitor and review.

  4. resolved Oct 04, 2024, 09:47 PM UTC

    This incident has been resolved.

  5. postmortem Nov 13, 2024, 05:03 PM UTC

    Incident Overview: The Web Application User Interface (https://app.addigy.com) experienced extreme latency due to a spike in long-running queries, which depleted resources and slowed the system. The impact was visible on the website, where pages took 1-2 minutes to load. No notable impact was observed on running device actions.

    Root Cause: The incident was triggered by long-running queries affecting multiple data sources across different systems. This caused resource contention, which degraded performance. The precise cause of the query spike is still being evaluated to identify further improvements.

    Detection and Response:

    * Alerts were triggered, prompting the Operations and Engineering teams to begin investigating immediately.
    * Remediation efforts included additional optimizations to data sources, tuning of resource performance, and improvements to the offending queries.
    * With these remediation efforts combined, the system recovered and performance stabilized.

    Remediation Actions:

    1. Implemented a process to include new remediation efforts into existing process action steps.
    2. Implemented a new process to prevent database resource contention.
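The postmortem does not say how database resource contention is being prevented. One common guard against long-running queries is a per-query time budget, sketched below as an illustration only. It uses Python's built-in `sqlite3` module rather than Addigy's actual data sources, and the names `run_with_deadline`, `max_seconds`, and `SLOW_QUERY` are hypothetical.

```python
import sqlite3
import time


def run_with_deadline(conn, sql, params=(), max_seconds=1.0):
    """Run a query and abort it once it exceeds max_seconds (hypothetical helper)."""
    deadline = time.monotonic() + max_seconds

    def over_budget():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Check the budget every 1000 SQLite virtual-machine instructions.
    conn.set_progress_handler(over_budget, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Clear the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 0)


# Assumed stand-in for an expensive query: a recursive count to 100 million.
SLOW_QUERY = """
WITH RECURSIVE c(n) AS (
    SELECT 1 UNION ALL SELECT n + 1 FROM c WHERE n < 100000000
)
SELECT count(*) FROM c
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_with_deadline(conn, "SELECT 1 + 1"))
    try:
        run_with_deadline(conn, SLOW_QUERY, max_seconds=0.05)
    except sqlite3.OperationalError as exc:
        print("query aborted:", exc)
```

A budget like this bounds how long any single query can hold resources, so a spike in slow queries fails fast instead of starving the rest of the system. Server databases typically offer an equivalent setting on the server side, which is usually preferable to a client-side check.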