Healthwise incident

Performance Issues

Critical · Resolved

Healthwise experienced a critical incident on October 11, 2023, affecting six components, including Coach and Communicate, and lasting 1h 51m. The incident has been resolved; the full update timeline is below.

Started: Oct 11, 2023, 11:01 AM UTC
Resolved: Oct 11, 2023, 12:52 PM UTC
Duration: 1h 51m
Detected by Pingoru: Oct 11, 2023, 11:01 AM UTC

Affected components

Coach, Communicate, Content Browser, Custom Content Manager (CCM), EMR Modules, Knowledgebase

Update timeline

  1. investigating Oct 11, 2023, 11:01 AM UTC

    Healthwise-hosted solutions are experiencing performance issues. Our Network Administrators and Engineers are working to fix the problem. We will post updates as we learn more.

  2. investigating Oct 11, 2023, 12:26 PM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 11, 2023, 12:28 PM UTC

    We are continuing to investigate this issue.

  4. monitoring Oct 11, 2023, 12:43 PM UTC

    A fix has been implemented and we are monitoring the results.

  5. resolved Oct 11, 2023, 12:52 PM UTC

    All performance issues have been resolved. We will post a root cause analysis once we have completed our full investigation. If the investigation has not been completed within 1 week we will post an interim RCA with the information that we currently have available.

  6. postmortem Oct 17, 2023, 03:56 PM UTC

    ## Introduction

    The purpose of this Root Cause Analysis (RCA) is to determine the causes that contributed to the missing content for the Healthwise-hosted solution on October 11th, 2023.

    ## Event Description

    At 3:23 AM MST, on Wednesday, October 11, 2023, Healthwise was alerted to an issue with a service in its Healthwise-hosted solutions. Healthwise administrators began investigating and found all applications were working as expected. Further investigation found that some content was missing from an index used to access content. At 4:47 AM MST, administrators escalated support to content engineering. Troubleshooting by the response team led to a disruption of service for Healthwise applications starting at 6:06 AM MST. Healthwise engineers completed a successful rebuild of the index at 6:40 AM MST, and service was restored to all applications and services at 6:47 AM MST. The total time of the incident was 3 hours and 24 minutes.

    ## Findings and Root Cause

    Based on the investigation conducted, the team determined the following findings regarding this event:

    - Healthwise transitioned to a scalable service for managing its content index to improve performance.
    - The index was built by running multiple requests to copy content from an existing index.
    - A bug in the copy process caused some records not to be copied from the existing index to the new index, so the missing content was unavailable when the applications began using the new index.
    - Access to all content was restored when Healthwise engineers rebuilt the index using the original indexing process.

    ## Corrective Action

    Access to all content was restored when Healthwise engineers rebuilt the index using the original indexing process. We are actively working to improve our testing so that missing content can be identified before the index is available in the production environment, improve the time it takes to roll back to a previous index, and ensure that an updated index is first used during the Healthwise maintenance window.
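
The corrective action mentions testing that would catch missing content before a new index reaches production. As a rough illustration only, one way such a check could look is a comparison of record IDs between the existing index and the freshly copied one. This is a minimal sketch and not Healthwise's actual tooling: the function names `missing_ids` and `safe_to_promote`, and the idea that each index can be listed as a sequence of string IDs, are assumptions made for the example.

```python
from typing import Iterable, Set


def missing_ids(source_ids: Iterable[str], target_ids: Iterable[str]) -> Set[str]:
    """Return IDs present in the source index but absent from the target.

    Hypothetical helper: assumes each index can be enumerated as string IDs.
    """
    return set(source_ids) - set(target_ids)


def safe_to_promote(source_ids: Iterable[str], target_ids: Iterable[str]) -> bool:
    """Hypothetical gate: allow promotion only if no source record is missing."""
    return not missing_ids(source_ids, target_ids)


# Simulated data: the copy dropped one record.
old_index = ["doc-1", "doc-2", "doc-3", "doc-4"]
new_index = ["doc-1", "doc-2", "doc-4"]

print(sorted(missing_ids(old_index, new_index)))  # ['doc-3']
print(safe_to_promote(old_index, new_index))      # False
```

A check along these lines only confirms that every record ID arrived. It says nothing about whether the copied content itself is intact, which would need a separate comparison such as per-record checksums.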