MindTouch incident
CXone Mpower Expert - Service Degradation: Search unavailable
MindTouch experienced a major incident on January 27, 2026 affecting Search and Geoblocking for Russia, lasting 2h 36m. The incident has been resolved; the full update timeline is below.
Affected components
- Search
- Geoblocking for Russia
Update timeline
- investigating Jan 27, 2026, 03:32 PM UTC
CXone Mpower Expert Service Degradation: Search unavailable. The CXone Mpower Expert Engineering team is investigating reports of search unavailability.
- investigating Jan 27, 2026, 04:03 PM UTC
CXone Mpower Expert Service Degradation: Search unavailable. The CXone Mpower Expert Engineering team is investigating reports of search unavailability.
- investigating Jan 27, 2026, 04:27 PM UTC
CXone Mpower Expert Service Degradation: Search unavailable. The CXone Mpower Expert Engineering team is investigating reports of search unavailability.
- monitoring Jan 27, 2026, 04:38 PM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jan 27, 2026, 06:09 PM UTC
This incident has been resolved.
- postmortem Mar 16, 2026, 11:05 PM UTC
**Impact Start Time (UTC):** 01/27/2026 03:16 PM UTC

**Impact End Time (UTC):** 01/27/2026 04:20 PM UTC

**Incident Summary:** Updated on 02/02/2026 - On 1/27/2026, some NiCE CXone Mpower customers reported significant functional and performance degradation within the CXone Mpower Expert application. Affected users experienced pronounced latency, recurring Hypertext Transfer Protocol (HTTP) 503 Service Unavailable errors, and search-related faults. These issues collectively resulted in Knowledge Base (KB) pages failing to load, along with intermittent authentication issues. The impact stemmed from severe socket exhaustion between the Application Programming Interface (API) service and the proxy tier. The issue was fully mitigated after engineers executed a rolling update of the API backend services, which re-established socket connections and restored normal functionality.

**Root Cause:** The incident was traced to socket exhaustion between the API service and the proxy layer, driven by a combination of prolonged network timeouts and the application's connection retry behavior. Intermittent network latency caused existing socket connections to remain in a hung or half-open state, preventing them from closing and being returned to the connection pool. When the application initiated retry attempts, it failed to reuse these open sockets and instead created additional outbound connections. This pattern progressively consumed all socket resources allocated by the proxy tier. Once socket capacity was saturated, the API service could no longer establish new connections to downstream backend services, significantly degrading search operations. Under these constrained conditions, only a small subset of search requests completed successfully, while the majority failed with service-level errors due to connection unavailability.
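The retry-amplification pattern described above can be sketched as a minimal simulation. This is an illustration only; the pool capacity, retry count, and class names are assumptions, not the actual production values:

```python
class SocketPool:
    """Stand-in for the proxy tier's fixed budget of outbound sockets."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_use = 0  # includes hung/half-open sockets that were never released

    def acquire(self):
        if self.in_use >= self.capacity:
            raise RuntimeError("503: socket pool exhausted")
        self.in_use += 1

    def release(self):
        self.in_use -= 1


def request_with_naive_retry(pool: SocketPool, timed_out: bool, retries: int = 3) -> str:
    """Each timed-out attempt leaves its socket half-open (never released),
    and each retry opens a NEW socket instead of reusing an existing one."""
    for _ in range(1 + retries):
        pool.acquire()
        if not timed_out:
            pool.release()  # a healthy request returns its socket to the pool
            return "ok"
        # on timeout: no release() -- the hung socket stays counted against the pool
    return "failed"


pool = SocketPool(capacity=10)

# Three slow requests, each retried 3 times, would need (1 + 3) * 3 = 12
# sockets -- more than the pool holds, so the third request hits saturation.
results = []
for _ in range(3):
    try:
        results.append(request_with_naive_retry(pool, timed_out=True))
    except RuntimeError as exc:
        results.append(str(exc))

print(results)  # ['failed', 'failed', '503: socket pool exhausted']

# Once saturated, healthy traffic fails too, since no sockets remain:
try:
    request_with_naive_retry(pool, timed_out=False)
except RuntimeError as exc:
    print(exc)  # 503: socket pool exhausted
```

Note that releasing the hung sockets (or reusing pooled connections on retry) would have kept utilization flat; the rolling restart of the API backends had exactly that effect, which is why it resolved the impact.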
Key contributing factors included insufficient observability into socket consumption and proxy connection thresholds, which delayed detection and mitigation. In addition, the proxy's connection limits were not provisioned to accommodate the increased load generated by retry amplification during periods of network instability.

**Corrective Actions:**

_Detection_: Internal support teams received a customer-reported issue and confirmed an ongoing functional and performance degradation within the CXone Mpower Expert application.

_Remediation_: Engineers executed a rolling update of the API backend services, which re-established socket connections and restored normal functionality. Completed on 01/27/2026.

_Prevention_: The Engineering team is implementing monitoring enhancements and alerting to proactively detect abnormal connection usage and capacity saturation before customer impact occurs. Internal teams are increasing connection and service capacity to better handle peak traffic and retry scenarios. Additionally, capacity planning is being improved to account for traffic spikes and cascading retry behavior. An update will be provided by EOD MT of 03/02/2026.

**Incident Timeline (UTC):**

- 01/27/2026 03:16 PM - First customer case opened, and Tech Support (TS) engineers began the troubleshooting investigation.
- 01/27/2026 03:41 PM - TS engineers notified the Network Operations Center (NOC) engineers about the reported customer impact; a major incident was proposed and confirmed.
- 01/27/2026 04:20 PM - Impact was resolved after engineers restarted services and internal tests were successful. Impact and major incident resolved.