Healthwise experienced a major incident on August 4, 2021, affecting Coach, Custom Content Manager (CCM), and 1 more component, lasting 2h 39m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 04, 2021, 02:27 PM UTC
Several Healthwise-hosted solutions are currently experiencing issues. Our team is working to fix the problem. We will post updates as we learn more.
- investigating Aug 04, 2021, 02:28 PM UTC
We are continuing to investigate this issue.
- investigating Aug 04, 2021, 02:46 PM UTC
We are continuing to investigate this issue.
- monitoring Aug 04, 2021, 04:02 PM UTC
Degraded performance with search indexing.
- resolved Aug 04, 2021, 05:07 PM UTC
The incident has been resolved and all services are now operational.
- postmortem Aug 11, 2021, 02:08 PM UTC
The purpose of this Root Cause Analysis (RCA) is to determine the causes that contributed to the performance degradation and intermittent outages of the Healthwise Coach, EMR Module, Knowledgebase, Media Service API and Custom Content Manager applications on August 04, 2021.

# Event Description

Beginning at 07:49 AM MST on Wednesday, August 04, 2021, Healthwise administrators received notifications from their monitoring systems that the Healthwise Coach, EMR Module, Knowledgebase, Media Service API and Custom Content Manager applications were experiencing degradation and latency significant enough to impact the client experience. Healthwise identified a resource issue on one of the web servers hosting the Healthwise applications. Healthwise removed the affected server from the load balancer and re-ran the previous night's deployment process on the affected server. After these steps were completed, the server was validated and restored to the load balancer. Degradation and latency issues were resolved at 09:48 AM MST. The approximate length of the degradation was 119 minutes.

# Findings and Root Cause

Based on the investigation conducted, the team determined the following findings regarding this event:

A deployment process failed to update one of the web server nodes and left the affected server unresponsive. This caused application degradation until the server was removed from the load balancer and the server's configuration was updated.

# Corrective Action

Healthwise is reviewing its automated deployment processes and will implement additional checks in order to ensure deployments complete as expected.
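
As an illustration only, a post-deployment check of the kind mentioned under Corrective Action might look something like the sketch below. It is not Healthwise's actual tooling: the `NodeStatus` shape, the `find_failed_nodes` function, the node names, and the version string are all hypothetical. The idea is that every node is probed after a deployment, and any node that does not respond, reports itself unhealthy, or is not running the expected version is flagged so it can be kept out of the load balancer until it is fixed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class NodeStatus:
    """Result of probing one web server node (hypothetical shape)."""
    healthy: bool
    deployed_version: Optional[str]


def find_failed_nodes(
    nodes: List[str],
    expected_version: str,
    probe: Callable[[str], NodeStatus],
) -> List[str]:
    """Return the nodes that did not complete the deployment as expected."""
    failed = []
    for node in nodes:
        try:
            status = probe(node)
        except Exception:
            # An unresponsive node counts as a failed deployment.
            failed.append(node)
            continue
        if not status.healthy or status.deployed_version != expected_version:
            failed.append(node)
    return failed


# Stand-in probe for demonstration; a real one would query each node.
def fake_probe(node: str) -> NodeStatus:
    if node == "web-02":
        raise TimeoutError("no response")
    return NodeStatus(healthy=True, deployed_version="2021.08.03")


failed = find_failed_nodes(["web-01", "web-02", "web-03"], "2021.08.03", fake_probe)
print(failed)  # ['web-02']
```

Passing the probe in as a function keeps the check independent of how node health is actually measured, so the same logic could sit behind an HTTP health endpoint, an agent query, or a deployment log lookup.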