MindTouch incident

CXone Knowledge Management - Fix Deployed. Status = initiating monitoring

Minor · Resolved

MindTouch experienced a minor incident on May 11, 2026, affecting Application (General Service), Search, and 1 more component, lasting 1d 6h. The incident has been resolved; the full update timeline is below.

Started
May 11, 2026, 08:38 AM UTC
Resolved
May 12, 2026, 02:50 PM UTC
Duration
1d 6h
Detected by Pingoru
May 11, 2026, 08:38 AM UTC

Affected components

Application (General Service)
Search
In-Product Contextual Help
Email Services
MindTouch Success Center
Analytics
Geoblocking for Russia

Update timeline

  1. investigating May 11, 2026, 08:38 AM UTC

    CXone Expert Service Degradation. The CXone Expert Engineering team is investigating reports of site unavailability.

  2. investigating May 11, 2026, 08:52 AM UTC

    CXone Expert Service Degradation. The CXone Expert Engineering team is investigating reports of site unavailability.

  3. monitoring May 11, 2026, 09:25 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. identified May 11, 2026, 07:08 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  5. identified May 11, 2026, 07:47 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  6. identified May 11, 2026, 08:04 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  7. identified May 11, 2026, 08:24 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  8. identified May 11, 2026, 08:45 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  9. identified May 11, 2026, 09:19 PM UTC

    CXone Knowledge Management - Fix Deployed - All Services Running Normally. The CXone Knowledge Management Engineering team has deployed a fix and all services are running normally. We are currently monitoring sites for deployment stability.

  10. resolved May 12, 2026, 02:50 PM UTC

    This incident has been resolved.

  11. postmortem May 27, 2026, 03:58 PM UTC

    **Major Incident# 02796435**
    **Impact Start Time (UTC) 05/12/2026 02:50 PM UTC**
    **Impact End Time (UTC) 05/12/2026 03:36 PM UTC**

    ### Incident Summary

    Updated on 05/22/2026 - On 05/12/2026, some NiCE CXone Knowledge customers experienced slowness when accessing sites, while others were unable to access the platform entirely with a “504 Gateway Timeout” error within the CXone Knowledge portal. The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. The impact was mitigated after restarting the affected pods, which restored platform stability.

    ### Root Cause

    The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. Under elevated traffic conditions, including automated crawler activity, some requests followed less optimized processing paths, increasing system load. This was further amplified by legacy or complex page content requiring more intensive processing. While scaling actions helped restore capacity, they also introduced temporary overhead that contributed to intermittent performance degradation. Additionally, although autoscaling functioned as designed, it reached its limits and was insufficient to address constraints related to per-pod Central Processing Unit (CPU) capacity.

    ### Corrective Actions

    **Detection:** Internal support teams detected partial service failures through proactive alerting and monitoring. Subsequently, engineers received customer reports of slowness when accessing sites within the CXone Knowledge portal.

    **Remediation:** The impact was mitigated after restarting the affected pods, which restored platform stability. Completed on 05/12/2026.

    **Prevention:** The Engineering team implemented interim mitigation measures to maintain system stability and ensure consistent performance while permanent improvements are finalized and deployed. Completed on 05/12/2026.

    Baseline system capacity was increased by raising the minimum number of pods and allocating higher CPU resources. This ensures sufficient resources are consistently available, reduces reliance on dynamic scaling, and improves overall system stability during periods of increased demand. Completed on 05/12/2026.

    The Engineering team will implement targeted software optimizations, including improvements to a specific endpoint that previously introduced cascading effects during scaling events. These enhancements are designed to reduce resource contention and improve system efficiency under high load conditions. The changes are currently undergoing validation and will be deployed following comprehensive testing. Completed on 05/21/2026.

    The Engineering team will enhance alert notification delivery to ensure alarms are reliably triggered and routed to the appropriate response teams as expected, enabling faster detection and more timely corrective action when issues arise. Completed on 05/22/2026.

    ### Incident Timeline (UTC)

    5/12/2026 02:50 PM (UTC) - Engineers identified partial service failures through proactive alerting and monitoring. A corresponding service disruption notification was promptly published on the Status Health page while engineers initiated their investigation and remediation efforts to prevent customer impact.

    5/12/2026 03:09 PM (UTC) - The first customer case was opened, and Tech Support (TS) engineers began their initial validation and troubleshooting investigation.

    5/12/2026 03:27 PM (UTC) - TS engineers notified the Network Operations Center (NOC) engineers about the reported customer impact; a major incident was proposed and confirmed.

    5/12/2026 03:36 PM (UTC) - The impact was resolved after engineers restarted the affected pods and validated recovery through internal testing, marking the end of customer impact. Following confirmation of sustained stability and no further customer impact, the incident was subsequently marked as resolved.
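
The postmortem's Root Cause and Prevention notes say that autoscaling "reached its limits" and that the minimum number of pods was raised. The sketch below illustrates that idea in general terms only. It assumes a proportional scaling rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler; the postmortem does not name the autoscaler in use, and the function name, pod counts, and CPU percentages are all hypothetical.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas):
    """Hypothetical helper: proportional scaling clamped to [min, max].

    Assumption: replicas scale with the ratio of observed to target CPU
    utilisation, as in the documented Kubernetes HPA rule. The vendor's
    actual autoscaler and settings are not stated in the postmortem.
    """
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# Illustrative numbers only (not taken from the incident).
# Heavy load: the rule asks for 30 pods, but the ceiling caps it at 12.
at_ceiling = desired_replicas(10, 180, 60, min_replicas=4, max_replicas=12)

# Quiet period: the rule asks for 2 pods; the floor decides the result.
low_floor = desired_replicas(10, 12, 60, min_replicas=4, max_replicas=12)
raised_floor = desired_replicas(10, 12, 60, min_replicas=8, max_replicas=12)

print(at_ceiling, low_floor, raised_floor)
```

Under these assumptions, a higher floor keeps more pods warm before a traffic spike arrives, which is one way to read "reduces reliance on dynamic scaling". It does not change how much CPU a single pod can give one request, which is why the postmortem lists higher per-pod CPU allocation as a separate measure.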