CyberFOX incident

High Latency

CyberFOX experienced a minor incident on December 18, 2023, affecting Cloud Services, Password Boss Portal, and 3 more components, lasting 7h 36m. The incident has been resolved; the full update timeline is below.

Started: Dec 18, 2023, 02:01 PM UTC
Resolved: Dec 18, 2023, 09:37 PM UTC
Duration: 7h 36m
Detected by Pingoru: Dec 18, 2023, 02:01 PM UTC

Affected components

Cloud Services
Password Boss Portal
Password Boss Partner Portal
Mobile Apps
Desktop Clients

Update timeline

investigating Dec 18, 2023, 02:01 PM UTC

We are currently investigating a backend issue causing high latency.

investigating Dec 18, 2023, 03:29 PM UTC

We have confirmed that the high latency originates in our primary database, and we are still investigating further.

monitoring Dec 18, 2023, 03:55 PM UTC

We have identified the issue and applied remediation measures. The fix has been deployed, and we are continuously monitoring to ensure performance and stability.

monitoring Dec 18, 2023, 04:27 PM UTC

While monitoring the fix, we noticed extremely high inbound/outbound traffic. The DevOps team is currently working to correct the situation.

monitoring Dec 18, 2023, 06:50 PM UTC

We've successfully resolved the unexpected high load issue from this morning, and all services are back to normal. Our team continues to monitor for stability. The incident remains open for precautionary monitoring and to keep our customers informed. Thank you for your patience.

resolved Dec 18, 2023, 09:37 PM UTC

At 6:30 AM today, we encountered a surge in traffic that triggered a spike in system load. The main database reached its load limit, which caused the read replicas to fall behind. Despite our efforts to address the issue, a number of challenges ensued:

- Read replicas repeatedly fell out of sync as traffic surged.
- During an attempted reboot, we hit the maximum connection limit.
- Although our monitoring systems reported that everything was OK, the escalating load ultimately overloaded the primary database.

Upon investigation, we identified a solution to streamline read/write operations so that the databases can stay in sync. We are taking immediate action to rectify these issues and fortify our systems against such incidents in the future. We appreciate your patience and understanding as we work to ensure the stability and reliability of our services.

postmortem Dec 18, 2023, 09:37 PM UTC

At 6:30 AM today, we encountered a surge in traffic that triggered a spike in system load. The main database reached its load limit, which caused the read replicas to fall behind. Despite our efforts to address the issue, a number of challenges ensued:

* Read replicas repeatedly fell out of sync as traffic surged.
* During an attempted reboot, we hit the maximum connection limit.
* Although our monitoring systems reported that everything was OK, the escalating load ultimately overloaded the primary database.

Upon investigation, we identified a solution to streamline read/write operations so that the databases can stay in sync. We are taking immediate action to rectify these issues and fortify our systems against such incidents in the future. We appreciate your patience and understanding as we work to ensure the stability and reliability of our services.
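
The postmortem notes that monitoring reported everything as OK while replicas lagged and connections hit their limit. As an illustration only, the sketch below shows one way a health check could look at those signals directly. It is not CyberFOX's monitoring; the `DbSnapshot` type, the `evaluate` function, and every threshold are hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DbSnapshot:
    """Hypothetical point-in-time metrics for one primary and its read replicas."""
    replica_lag_seconds: list[float]
    active_connections: int
    max_connections: int
    primary_cpu_percent: float


def evaluate(
    snapshot: DbSnapshot,
    max_lag_seconds: float = 30.0,      # assumed threshold, not from the report
    max_connection_ratio: float = 0.8,  # assumed threshold, not from the report
    max_cpu_percent: float = 85.0,      # assumed threshold, not from the report
) -> list[str]:
    """Return a list of warnings; an empty list means no threshold was crossed."""
    warnings: list[str] = []

    # Flag each replica whose replication lag exceeds the allowed lag.
    for index, lag in enumerate(snapshot.replica_lag_seconds):
        if lag > max_lag_seconds:
            warnings.append(f"replica {index} lag {lag:.0f}s exceeds {max_lag_seconds:.0f}s")

    # Flag connection usage before the hard limit is actually reached.
    ratio = snapshot.active_connections / snapshot.max_connections
    if ratio > max_connection_ratio:
        warnings.append(f"connections at {ratio:.0%} of limit")

    # Flag sustained load on the primary.
    if snapshot.primary_cpu_percent > max_cpu_percent:
        warnings.append(f"primary CPU at {snapshot.primary_cpu_percent:.0f}%")

    return warnings


if __name__ == "__main__":
    stressed = DbSnapshot(
        replica_lag_seconds=[2.0, 95.0],
        active_connections=950,
        max_connections=1000,
        primary_cpu_percent=50.0,
    )
    for warning in evaluate(stressed):
        print(warning)
```

A check of this shape alerts on early indicators (lag and connection headroom) rather than on a single up/down probe, which is the gap the postmortem describes.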