MindTouch incident

Service Degradation

Notice Resolved

MindTouch experienced a notice-level incident on October 10, 2024, affecting Application (General Service), Search, and four other components. It lasted 6h 14m. The incident has been resolved; the full update timeline is below.

Started
Oct 10, 2024, 03:15 PM UTC
Resolved
Oct 10, 2024, 09:29 PM UTC
Duration
6h 14m
Detected by Pingoru
Oct 10, 2024, 03:15 PM UTC

Affected components

Application (General Service), Search, In-Product Contextual Help, Email Services, MindTouch Success Center, Analytics

Update timeline

  1. investigating Oct 10, 2024, 03:15 PM UTC

    Service Degradation: The MindTouch Engineering team is investigating reports of site unavailability.

  2. monitoring Oct 10, 2024, 05:46 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Oct 10, 2024, 09:29 PM UTC

    This incident has been resolved.

  4. postmortem Oct 18, 2024, 06:23 PM UTC

    ## **Summary**

    Updated on 10/21/2024.

    On 10/10/2024, a NICE CXone customer reported that their knowledge portal sites intermittently failed to load within the CXone Expert knowledge platform. The impact stemmed from an incorrect queue service access policy setting, which was caused by a syntax error in the templating language component. The issue was resolved when engineers corrected the queue service access policy setting, restoring services to normal operation.

    ## **Root Cause**

    The impact stemmed from an incorrect queue service access policy setting, which was caused by a syntax error in the templating language component. Because of this error, the necessary template failed to render and the policy setting reverted to its defaults. Engineers found no recent changes in the system, and no logs or audit trails explained what triggered the syntax error. They also raised a ticket with our cloud service provider (CSP) to check their platform but found no evidence to trace the change of the policy setting. Despite the incident, the system has sufficient guard rails and alarm mechanisms in place to detect this type of error for immediate correction.

    ## **Corrective Actions**

    **Detection**

    * Internal support teams detected a potentially customer-impacting issue through proactive alarm and monitoring mechanisms. This was later confirmed by a customer report about their knowledge portal sites intermittently failing to load within the CXone Expert knowledge platform.

    **Remediation**

    * The issue was resolved when engineers corrected the impacted policy setting, restoring services to normal operation. Completed on 10/10/2024.

    **Prevention**

    * Engineers manually created a queue to add the policy into the infrastructure as code, ensuring that all functionalities were restored to their expected operations. Completed on 10/10/2024.
    * Engineers audited all access policies in the system to ensure they were correctly configured, preventing similar impacts in the future. Completed on 10/10/2024.

    ## **External Timeline**

    * 10/10/2024 03:15 PM (UTC) - Internal support engineers detected alarms and observed spikes in system errors. Engineers began the troubleshooting investigation and posted a service degradation notice on the status page portal.
    * 10/10/2024 03:44 PM (UTC) - Engineers notified the Network Operations Center (NOC) engineers about the detected alarms; a proactive major incident was proposed and confirmed.
    * 10/10/2024 03:51 PM (UTC) - The customer case was opened, and Tech Support (TS) engineers began the troubleshooting investigation. The case was later confirmed to be related to the service degradation issue.
    * 10/10/2024 05:46 PM (UTC) - Engineers identified a suspected cause and began the remediation efforts. They had to conduct various test validations and system monitoring to ensure full restoration.
    * 10/10/2024 09:29 PM (UTC) - Engineers concluded that the issue had been resolved. The impact was remediated after they corrected the policy settings, and successful internal tests confirmed the resolution. The major incident was marked as resolved.
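
The root cause in the postmortem, a template syntax error that let the queue access policy fall back to its defaults, points to a render step that refuses to continue when rendering fails. The following is only a minimal sketch of that idea, not the vendor's implementation. The policy shape, the `render_policy` function, and the template contents are hypothetical, and Python's standard `string.Template` stands in for whatever templating language the platform actually uses.

```python
import json
from string import Template


class PolicyRenderError(Exception):
    """Raised when a policy template cannot be rendered into a usable policy."""


# Hypothetical policy shape; the real policy format is not given in the postmortem.
REQUIRED_KEYS = {"version", "statements"}


def render_policy(template_text: str, params: dict) -> dict:
    """Render a JSON policy template, raising instead of falling back to a default."""
    try:
        # substitute() raises KeyError for a missing parameter and
        # ValueError for a malformed placeholder (a template syntax error).
        rendered = Template(template_text).substitute(params)
    except (KeyError, ValueError) as exc:
        raise PolicyRenderError(f"template did not render: {exc!r}") from exc

    try:
        policy = json.loads(rendered)
    except json.JSONDecodeError as exc:
        raise PolicyRenderError(f"rendered policy is not valid JSON: {exc}") from exc

    if not isinstance(policy, dict):
        raise PolicyRenderError("rendered policy is not a JSON object")
    missing = REQUIRED_KEYS - set(policy)
    if missing:
        raise PolicyRenderError(f"rendered policy is missing keys: {sorted(missing)}")
    if not policy["statements"]:
        raise PolicyRenderError("rendered policy grants no access")
    return policy


# Hypothetical template for illustration only.
GOOD_TEMPLATE = """
{
  "version": 1,
  "statements": [
    {"effect": "allow", "principal": "$principal", "action": "send", "resource": "$queue"}
  ]
}
"""

# Same template with an unterminated placeholder: a syntax error in the template itself.
BROKEN_TEMPLATE = GOOD_TEMPLATE.replace("$queue", "${queue")

if __name__ == "__main__":
    params = {"principal": "svc-indexer", "queue": "events"}
    print(render_policy(GOOD_TEMPLATE, params)["statements"][0]["resource"])
    try:
        render_policy(BROKEN_TEMPLATE, params)
    except PolicyRenderError as exc:
        print("refusing to apply policy:", exc)
```

Raising an error rather than returning a default keeps a bad template from silently replacing a working policy. A deployment step built this way would stop and alert instead of applying the fallback.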