MindTouch incident

CXone Knowledge Management – Monitoring complete. Status = All Services Running Normally

Severity: Major. Status: Resolved.

MindTouch experienced a major incident on May 7, 2026 affecting Application (General Service), Search, and other components, lasting 6h 29m. The incident has been resolved; the full update timeline is below.

Started
May 07, 2026, 11:52 AM UTC
Resolved
May 07, 2026, 06:21 PM UTC
Duration
6h 29m
Detected by Pingoru
May 07, 2026, 11:52 AM UTC
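
The duration figure above can be checked directly from the two timestamps. The snippet below is a minimal sketch of that arithmetic; the helper name `duration` and the format string are illustrative choices, not part of the status page, and month-name parsing assumes an English locale.

```python
from datetime import datetime

# Timestamp layout used on this page, e.g. "May 07, 2026, 11:52 AM" (all times UTC).
FMT = "%b %d, %Y, %I:%M %p"


def duration(start: str, end: str) -> str:
    """Return the elapsed time between two timestamps as 'Hh Mm'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60}m"


# Started -> Resolved, as listed above.
print(duration("May 07, 2026, 11:52 AM", "May 07, 2026, 06:21 PM"))  # 6h 29m

# Impact start -> impact end, as stated in the postmortem.
print(duration("May 07, 2026, 11:43 AM", "May 07, 2026, 02:07 PM"))  # 2h 24m
```

The two results differ because the status page measures from the first posted update to the final resolution notice, while the postmortem's impact window covers only the first period of degradation.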

Affected components

Application (General Service), Search, In-Product Contextual Help, Email Services, MindTouch Success Center, Analytics, Geoblocking for Russia

Update timeline

  1. investigating May 07, 2026, 11:52 AM UTC

    CXone Mpower Expert Service Degradation. The CXone Mpower Expert Engineering team is investigating reports of Service Degradation.

  2. investigating May 07, 2026, 12:08 PM UTC

    We are continuing to investigate this issue.

  3. investigating May 07, 2026, 12:23 PM UTC

    We are continuing to investigate this issue.

  4. investigating May 07, 2026, 12:40 PM UTC

    We are continuing to investigate this issue.

  5. investigating May 07, 2026, 12:54 PM UTC

    We are continuing to investigate this issue.

  6. investigating May 07, 2026, 01:18 PM UTC

    We are continuing to investigate this issue.

  7. investigating May 07, 2026, 01:42 PM UTC

    We are continuing to investigate this issue.

  8. monitoring May 07, 2026, 01:52 PM UTC

    A fix has been implemented and we are monitoring the results.

  9. investigating May 07, 2026, 04:21 PM UTC

    CXone Mpower Expert Service Degradation: Sites Service Degradation. The CXone Mpower Expert Engineering team is investigating reports of Site Service Degradation.

  10. identified May 07, 2026, 05:40 PM UTC

    CXone Knowledge Management Service Degradation: Sites unavailable. The issue has been identified and a fix is being worked on for deployment.

  11. monitoring May 07, 2026, 05:56 PM UTC

    CXone Knowledge Management - Fix Deployed - All Services Running Normally. The CXone Knowledge Management Engineering team has deployed a fix and all services are running normally. We are currently monitoring sites for deployment stability.

  12. resolved May 07, 2026, 06:21 PM UTC

    CXone Knowledge Management - Service Disruption Resolved - All Services Running Normally. The NiCE Knowledge Management Engineering team has deployed a fix and monitored the deployment to make sure sites are stable. The issue is now resolved. Event duration - 5hr 48m

  13. postmortem May 14, 2026, 03:01 PM UTC

    **Major Incident# 02793015**

    **Impact Start Time (UTC) 05/07/2026 11:43 AM UTC**

    **Impact End Time (UTC) 05/07/2026 02:07 PM UTC**

    ### Incident Summary

    On 05/07/2026, some NiCE CXone Mpower customers experienced slowness when accessing sites, while others were unable to access the platform entirely with a “504 Gateway Timeout” error within the CXone Mpower Expert knowledge portal. The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. The impact was resolved after scaling up pod resources and restarting the proxy pods, which restored platform stability.

    ### Root Cause

    The recurring service degradation incidents were caused by increased traffic volumes combined with performance limitations in certain backend processes, primarily impacting the US regional platform due to higher demand. Under elevated traffic conditions, including automated crawler activity, some requests followed less optimized processing paths, increasing system load. This was further amplified by legacy or complex page content requiring more intensive processing.

    While scaling actions helped restore capacity, they also introduced temporary overhead that contributed to intermittent performance degradation. Additionally, although autoscaling functioned as designed, it reached its limits and was insufficient to address constraints related to per-pod Central Processing Unit (CPU) capacity. Overall, evolving traffic patterns exposed underlying performance limitations, highlighting the need for targeted code optimizations and increased per-service capacity.

    ### Corrective Actions

    **Detection:** Although built-in alerting mechanisms were in place to detect this type of condition, alerts did not consistently reach the responsible teams as expected. In some cases, alerts were grouped or suppressed, delaying timely visibility of the issue. As a result, internal teams became aware of the impact primarily through customer reports of slowness when accessing sites within the CXone Mpower Expert knowledge portal.

    **Remediation:** The impact was resolved after scaling up pod resources and restarting the proxy pods, which restored platform stability. Completed on 05/07/2026.

    **Prevention:** The Engineering team enhanced traffic filtering rules at the Web Application Firewall (WAF) layer to identify and block a significant portion of automated bot traffic contributing to elevated system load. These actions reduced unnecessary requests and improved overall platform stability. Completed on 05/07/2026.

    The Engineering team implemented interim mitigation measures to maintain system stability and ensure consistent performance while permanent improvements are finalized and deployed. Completed on 05/12/2026.

    Baseline system capacity was increased by raising the minimum number of pods and allocating higher CPU resources. This ensures sufficient resources are consistently available, reduces reliance on dynamic scaling, and improves overall system stability during periods of increased demand. Completed on 05/12/2026.

    The Engineering team will implement targeted software optimizations, including improvements to a specific endpoint that previously introduced cascading effects during scaling events. These enhancements are designed to reduce resource contention and improve system efficiency under high load conditions. The changes are currently undergoing validation and will be deployed following comprehensive testing. An update will be provided by End of Day (EOD) MT on 05/22/2026.

    The Engineering team will enhance alert notification delivery to ensure alarms are reliably triggered and routed to the appropriate response teams as expected, enabling faster detection and more timely corrective action when issues arise. An update will be provided by EOD MT on 05/22/2026.

    ### Incident Timeline (UTC)

    05/07/2026 11:43 AM (UTC) - The first customer case opened, and Tech Support (TS) engineers began the troubleshooting investigation.
    05/07/2026 11:44 AM (UTC) - TS engineers notified the Network Operations Center (NOC) engineers about the reported customer impact; a major incident was proposed and confirmed.
    05/07/2026 12:09 PM (UTC) - Engineers identified a suspected cause and increased the resources of the web pods to improve system performance.
    05/07/2026 12:18 PM (UTC) - Engineers also scaled up resources for the Application Programming Interface (API) pods to further stabilize performance.
    05/07/2026 01:30 PM (UTC) - The platform continued to catch up, and engineers were already seeing improvements in system performance.
    05/07/2026 01:52 PM (UTC) - Engineers restarted proxy pods, resulting in continued performance improvements while monitoring system stability.
    05/07/2026 02:00 PM (UTC) - Platform performance returned to normal levels, with continued validation and monitoring underway.
    05/07/2026 02:07 PM (UTC) - The platform stabilized fully. The impact was resolved following resource scaling, and after successful validation, the major incident was marked as resolved.
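
The postmortem says autoscaling "functioned as designed" yet "reached its limits", and that the minimum pod count was later raised. The sketch below illustrates how a replica-count autoscaler behaves at both bounds. It assumes the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler; the postmortem does not name the autoscaler in use, and every number here (replica counts, utilization percentages, bounds) is hypothetical.

```python
import math


def desired_replicas(current: int, cpu_pct: int, target_pct: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Scale the replica count in proportion to CPU load, then clamp to the bounds.

    Assumed rule (Kubernetes HPA style): ceil(current * observed / target).
    """
    raw = math.ceil(current * cpu_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


# Under heavy load the rule asks for 25 pods, but the configured ceiling is 12,
# so the autoscaler works as designed and still cannot add enough capacity.
print(desired_replicas(current=10, cpu_pct=150, target_pct=60,
                       min_replicas=2, max_replicas=12))  # 12

# With a raised minimum, quiet periods no longer shrink the pool below 6 pods,
# so less capacity has to be added dynamically when traffic spikes.
print(desired_replicas(current=4, cpu_pct=20, target_pct=60,
                       min_replicas=6, max_replicas=12))  # 6
```

Adding replicas only spreads requests across more pods. It does not speed up a single request that is limited by the CPU available to one pod, which may be why the corrective actions pair the higher minimum pod count with larger per-pod CPU allocations.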