MindTouch incident

CXOne MPower Expert - All services running normally

Notice Resolved

MindTouch experienced a notice-level incident on March 4, 2025 that affected Application (General Service), Search, and other components and lasted 44 minutes. The incident has been resolved; the full update timeline is below.

Started
Mar 04, 2025, 04:49 PM UTC
Resolved
Mar 04, 2025, 05:34 PM UTC
Duration
44m
Detected by Pingoru
Mar 04, 2025, 04:49 PM UTC

Affected components

Application (General Service)
Search
In-Product Contextual Help
Email Services
MindTouch Success Center
Analytics

Update timeline

  1. investigating Mar 04, 2025, 04:02 PM UTC

    MindTouch Service Degradation: Sites unavailable. The MindTouch Engineering team is investigating reports of site unavailability.

  2. monitoring Mar 04, 2025, 04:30 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Mar 04, 2025, 04:49 PM UTC

    All services running normally

  4. monitoring Mar 04, 2025, 05:05 PM UTC

    All services running normally

  5. monitoring Mar 04, 2025, 05:20 PM UTC

    All services running normally

  6. resolved Mar 04, 2025, 05:34 PM UTC

    This incident has been resolved.

  7. postmortem Mar 14, 2025, 03:39 PM UTC

    **Impact Start Time** (UTC) 03/04/2025 03:40 PM UTC

    **Impact End Time** (UTC) 03/04/2025 04:05 PM UTC

    ### Incident Summary

    On 3/4/2025, some CXone Mpower customers reported being unable to access CXone Mpower Expert application sites; users intermittently received error messages when attempting to log in or access the affected sites. Internal teams observed potentially customer-impacting issues via proactive monitoring systems, and customer reports later confirmed them. The impact stemmed from an application capacity issue that impaired the application's processing capabilities. The impact was resolved when engineers increased the number of application nodes and restarted the application load balancer instances.

    ### Root Cause

    The application capacity issue was caused by a failover activity performed by our Cloud Service Provider (CSP). This activity moved application instances to another host; however, some of the nodes did not return to a healthy state. As a result, the load balancer instances became unstable because they did not have enough instances to process incoming requests.

    ## Corrective Actions

    ### Detection:

    * Internal teams observed potentially customer-impacting issues via proactive monitoring systems, which were later confirmed through customer reports of accessibility issues with the CXone Mpower Expert application.

    ### Remediation:

    * Engineers increased the number of application nodes and restarted the application load balancer instances. Completed on 03/04/2025.

    ### Prevention:

    * The Engineering team created additional Auto-Scaling Groups (ASGs) to segregate workloads and prevent negative impact on other workloads when similar issues occur. Completed on 03/14/2025.
    * Engineers are developing a custom health check, which the network load balancer will use to monitor and detect the status of the application load balancers. An update will be provided by End-of-Day (EOD) MT of 03/28/2025.

    ### Risk of Reoccurrence of Impact: Low

    ### Incident Timeline (UTC)

    * 03/04/2025 03:52 PM (UTC) - First customer case opened, and Tech Support (TS) engineers began the troubleshooting investigation.
    * 03/04/2025 04:07 PM (UTC) - TS engineers notified the Network Operations Center (NOC) engineers about the reported customer impact; a major incident was proposed and confirmed.
    * 03/04/2025 04:24 PM (UTC) - Engineers identified a suspected cause and began remediation steps.
    * 03/04/2025 04:33 PM (UTC) - Engineers implemented a fix, and internal tests were successful. Impact and major incident resolved.
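
The postmortem says a custom health check is in development but does not describe how it works. The sketch below is only an illustration of the general idea from the root cause: a pool is treated as unhealthy when too few of its nodes can process requests. The `Node` type, the function names, and the 50% threshold are all hypothetical and are not taken from the vendor's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical application node; not a vendor data structure."""
    name: str
    healthy: bool


def pool_is_healthy(nodes: List[Node], min_healthy_fraction: float = 0.5) -> bool:
    """Return True when enough nodes are healthy to keep serving traffic.

    The 0.5 default is an assumed threshold chosen for illustration only.
    """
    if not nodes:
        # An empty pool cannot process any requests.
        return False
    healthy_count = sum(1 for node in nodes if node.healthy)
    return healthy_count / len(nodes) >= min_healthy_fraction


def nodes_to_replace(nodes: List[Node]) -> List[str]:
    """List the names of nodes that did not return to a healthy state."""
    return [node.name for node in nodes if not node.healthy]


if __name__ == "__main__":
    # Simulated state after a failover: most nodes did not recover.
    pool = [
        Node("app-1", True),
        Node("app-2", False),
        Node("app-3", False),
        Node("app-4", False),
    ]
    print(pool_is_healthy(pool))
    print(nodes_to_replace(pool))
```

A check of this shape would let an upstream load balancer stop routing to a pool that lacks capacity, rather than sending requests to instances that cannot process them.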