Squiz incident

Major Incident - Funnelback DXP UK


Squiz experienced a major incident on June 26, 2024 affecting Squiz Funnelback Hosted Instances, lasting 30m. The incident has been resolved; the full update timeline is below.

Started
Jun 26, 2024, 10:59 AM UTC
Resolved
Jun 26, 2024, 11:29 AM UTC
Duration
30m
Detected by Pingoru
Jun 26, 2024, 10:59 AM UTC

Affected components

Squiz Funnelback Hosted Instances

Update timeline

  1. investigating Jun 26, 2024, 10:59 AM UTC

    Squiz monitoring has detected a degradation of service impacting some Funnelback DXP customers in the UK only. Some customers are experiencing slow response times and/or timeouts. We are currently working to find the root cause of this issue. A further update will be provided via https://status.squiz.cloud in 15 minutes, or earlier if the situation or information changes.

  2. monitoring Jun 26, 2024, 11:08 AM UTC

    We have identified the root cause of this issue and have implemented a fix. A further update will be provided via https://status.squiz.cloud in 15 minutes, or earlier if the situation or information changes.

  3. resolved Jun 26, 2024, 11:29 AM UTC

    We are pleased to confirm that the previously reported issue affecting the performance of our Funnelback DXP system has been successfully resolved. Our team closely monitored the situation, and were able to apply a fix for the issue, which led to significant improvements in performance. We will continue to keep a watchful eye on the system to ensure optimal performance and stability. We appreciate your patience and understanding during this time and apologise for any inconvenience caused. A post mortem will be made available on https://status.squiz.cloud/ in the coming days.

  4. postmortem Jul 01, 2024, 08:57 AM UTC

    ### Summary

    Squiz identified operational issues with the query processing resources in the UK DXP, leading to queuing and slower response times on searches being processed in the UK region.

    ### Customer impact

    A subset of UK customers may have experienced delays in search results and timeouts when attempting to use the search function.

    ### Issue and Resolution

    Squiz engineers identified an increase in query volumes on the processing compute layer. Whilst clients have independent query processing capabilities, the overall compute layer powering this has a finite ceiling. The increased traffic was identified as a potential DDoS masquerading as valid search traffic. At no stage was there any breach of our systems.

    Once this was identified, Squiz Cloud Engineering was able to pinpoint the pattern of traffic, and our web application firewall (WAF) was reconfigured to stop the negative effect on our compute layer. This had an immediate impact on search performance and resulted in restoration of service. The degraded service was restored at 11:08 GMT.

    ### Mitigation

    In light of this incident, we are reviewing our alerting thresholds to detect and prevent these issues sooner.
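
The postmortem attributes the slowdown to query volume exceeding a finite compute ceiling and names alerting thresholds as the follow-up. As a rough illustration only, a volume threshold of that kind could be sketched as a sliding-window counter. This is not Squiz's implementation: the class name `QueryRateMonitor`, its parameters, and the example threshold values are all hypothetical.

```python
from collections import deque


class QueryRateMonitor:
    """Hypothetical sliding-window counter that flags excessive query volume."""

    def __init__(self, window_seconds: float, max_queries: int) -> None:
        self.window_seconds = window_seconds  # length of the look-back window
        self.max_queries = max_queries        # assumed alert threshold
        self._timestamps = deque()            # arrival times still inside the window

    def record(self, timestamp: float) -> bool:
        """Record one query; return True if the window now exceeds the threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop arrivals that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_queries


if __name__ == "__main__":
    # Illustrative values: alert when more than 3 queries arrive within 10 seconds.
    monitor = QueryRateMonitor(window_seconds=10, max_queries=3)
    for t in (0.0, 1.0, 2.0, 3.0, 20.0):
        print(t, monitor.record(t))
```

In practice the threshold would be derived from the measured capacity of the compute layer rather than a fixed constant. A counter like this only detects the surge; blocking the traffic pattern would still fall to the WAF.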