Proxyclick experienced a critical incident on March 28, 2023 affecting Dashboard and iPad app and 1 more component, lasting 2h 15m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Mar 28, 2023, 10:53 AM UTC
We are currently investigating this issue that no users are able to use Proxyclick
- resolved Mar 28, 2023, 01:08 PM UTC
This incident has been resolved, post-mortem to come
- postmortem Mar 30, 2023, 01:41 PM UTC
**Proxyclick by Eptura Detailed Root Cause Analysis \(RCA\) – S1 Event 2023-03-28** _On March 28, 2023, Proxyclick monitoring detected a degradation in service, which grew into a full API outage. Engineering and DevOps teams identified the issue as related to an update earlier in the day and issued a hotfix, restoring service._ **Type of Event:** Service Disruption **Services Impacted:** Proxyclick Web Application **Remediation**: DevOps and Engineering created and deployed a hotfix, restoring normal operation. **Timeline of Events:** **09:00 UTC –** Scheduled platform update deployed **09:25 UTC –** Automated monitoring detects S2 performance degradation and alerts Engineering and DevOps teams **09:31 UTC –** Engineering and DevOps confirm the monitoring alert and begin incident management **09:34 UTC –** First customer reports received by Support team **09:49 UTC –** Incident impact reclassified as S1 **10:53 UTC –** Status Page updated to track incident **11:30 UTC –** Engineering and DevOps teams identify the true cause of the incident and begin remediation **11:39 UTC** **–** Engineering and DevOps teams complete deployment of hotfix, services begin return to normal operations **13:08 UTC –** All impacts to service are mitigated and the incident is closed **Total Duration:** 4 hours, 8 minutes **Groups Involved in the Event:** Engineering DevOps Support **Root Cause Analysis:** An update to the authentication logic for accessing the Proxyclick Web Application and APIs was mistakenly marked as deployed to production in a previous release but was included for the first time in the scheduled deployment on March 28, 2023. While this update worked without issue in all pre-production environments, a code defect was present which was unable to handle the higher load of the production environment. The investigation into the cause of the outage was delayed by the incorrect deployment record, as the team initially excluded the authentication components of the application. Significant time was spent reviewing incorrect logs and service components as a result. Once this was corrected the teams found the underlying issue, then created and deployed a hotfix patch to bring the outage to a close. **Preventative Action and Analysis:** Engineering and QA teams will implement better load-scaling into pre-production testing to reduce the potential for similar issues in future. Engineering and Product teams will review the deployment process to improve accuracy of release records. Incident management processes will be updated to ensure all logging sources are checked earlier in the investigation.