MindTouch incident

CXone Expert - Service Degradation

MindTouch experienced a minor incident on May 11, 2026, affecting Application (General Service), Search, and the other components listed below, lasting 48m. The incident has been resolved; the full update timeline is below.

Started: May 11, 2026, 06:31 AM UTC
Resolved: May 11, 2026, 07:19 AM UTC
Duration: 48m
Detected by Pingoru: May 11, 2026, 06:31 AM UTC

Affected components

Application (General Service)
Search
In-Product Contextual Help
Email Services
MindTouch Success Center
Analytics
Geoblocking for Russia

Update timeline

investigating May 11, 2026, 06:31 AM UTC

CXone Expert Service Degradation. The CXone Expert Engineering team is investigating reports of Service Degradation.

investigating May 11, 2026, 06:53 AM UTC

CXone Expert Service Degradation. The CXone Expert Engineering team is investigating reports of Service Degradation.

monitoring May 11, 2026, 07:05 AM UTC

A fix has been implemented and we are monitoring the results.

resolved May 11, 2026, 07:19 AM UTC

This incident has been resolved.

postmortem May 27, 2026, 03:53 PM UTC

**Major Incident# 02795089**

**Impact Start Time (UTC) 05/11/2026 06:02 AM UTC**

**Impact End Time (UTC) 05/11/2026 06:25 AM UTC**

### Incident Summary

Updated on 05/22/2026 - On 05/11/2026, one NiCE CXone Mpower customer experienced slowness when accessing sites, while others were unable to access the platform at all and received a “504 Gateway Timeout” error within the CXone Mpower Expert knowledge portal. The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. The impact was mitigated after restarting the affected pods, which restored platform stability.

### Root Cause

The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. Under elevated traffic conditions, including automated crawler activity, some requests followed less optimized processing paths, increasing system load. This was further amplified by legacy or complex page content requiring more intensive processing. While scaling actions helped restore capacity, they also introduced temporary overhead that contributed to intermittent performance degradation. Additionally, although autoscaling functioned as designed, it reached its limits and was insufficient to address constraints related to per-pod Central Processing Unit (CPU) capacity.

### Corrective Actions

**Detection:** Although built-in alerting mechanisms were in place to detect this type of condition, alerts did not consistently reach the responsible teams as expected. In some cases, alerts were grouped or suppressed, delaying timely visibility of the issue. As a result, internal teams became aware of the impact primarily through a customer report of slowness when accessing sites within the CXone Mpower Expert knowledge portal.

**Remediation:** The impact was mitigated after restarting the affected pods, which restored platform stability. Completed on 05/11/2026.

**Prevention:** The Engineering team implemented interim mitigation measures to maintain system stability and ensure consistent performance while permanent improvements are finalized and deployed. Completed on 05/12/2026.

Baseline system capacity was increased by raising the minimum number of pods and allocating higher CPU resources. This ensures sufficient resources are consistently available, reduces reliance on dynamic scaling, and improves overall system stability during periods of increased demand. Completed on 05/12/2026.

The Engineering team will implement targeted software optimizations, including improvements to a specific endpoint that previously introduced cascading effects during scaling events. These enhancements are designed to reduce resource contention and improve system efficiency under high load conditions. The changes are currently undergoing validation and will be deployed following comprehensive testing. Completed on 05/21/2026.

The Engineering team will enhance alert notification delivery to ensure alarms are reliably triggered and routed to the appropriate response teams as expected, enabling faster detection and more timely corrective action when issues arise. Completed on 05/22/2026.

### Incident Timeline (UTC)

05/11/2026 06:02 AM (UTC) - A customer case was opened, and Tech Support (TS) engineers began their initial validation and troubleshooting investigation.

05/11/2026 06:21 AM (UTC) - TS engineers notified the Network Operations Center (NOC) of the reported customer impact, and internal validation confirmed that site load times had degraded to approximately 15 seconds.

05/11/2026 06:22 AM (UTC) - Engineers identified a suspected cause and began remediation steps.

05/11/2026 06:25 AM (UTC) - The impact was resolved after engineers restarted the affected pods and validated recovery through internal testing, marking the end of customer impact.

05/11/2026 07:41 AM (UTC) - After extended validation and monitoring confirmed that the platform had stabilized, a major incident was formally declared to document the service impact. Following confirmation of sustained stability and no further customer impact, the incident was subsequently marked as resolved.
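
As a rough illustration of the Root Cause and Prevention notes above, the sketch below shows how a horizontal autoscaler that has reached its replica ceiling stops adding capacity, and how a raised minimum keeps a larger baseline in place during quiet periods. It assumes a Kubernetes-style scaling rule (desired = ceil(current × observed / target), clamped between a minimum and a maximum). The postmortem does not name the autoscaler in use, and the function name and every number here are hypothetical. Tolerance bands and stabilization windows are ignored for simplicity.

```python
import math


def desired_replicas(current, observed_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas):
    """Return (replicas, capped) under an assumed Kubernetes-style rule.

    `capped` is True when demand asks for more replicas than the
    configured maximum allows, i.e. the autoscaler has hit its limit.
    """
    # Scale the current replica count by the ratio of observed to target load.
    raw = math.ceil(current * observed_cpu_pct / target_cpu_pct)
    # Clamp the result to the configured floor and ceiling.
    clamped = max(min_replicas, min(max_replicas, raw))
    return clamped, raw > max_replicas


if __name__ == "__main__":
    # Hypothetical traffic spike: demand would need 30 pods, ceiling is 20.
    print(desired_replicas(10, 180, 60, min_replicas=4, max_replicas=20))
    # Hypothetical quiet period: a raised floor of 8 keeps baseline capacity.
    print(desired_replicas(10, 30, 60, min_replicas=8, max_replicas=20))
```

Under these assumed numbers, the first call is capped at the ceiling and the second is held up by the floor. A replica count alone says nothing about per-pod CPU, which is consistent with the postmortem's point that autoscaling could not address per-pod constraints and that higher CPU allocations were added alongside the raised pod minimum.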