Healthwise incident

Performance Issues

Healthwise experienced a minor incident on August 29, 2022, affecting Coach, Communicate, and EMR Modules, lasting 3h 32m. The incident has been resolved; the full update timeline is below.

Started: Aug 29, 2022, 10:54 PM UTC
Resolved: Aug 30, 2022, 02:27 AM UTC
Duration: 3h 32m
Detected by Pingoru: Aug 29, 2022, 10:54 PM UTC

Affected components

Coach, Communicate, EMR Modules

Update timeline

monitoring Aug 29, 2022, 10:54 PM UTC

Healthwise-hosted solutions are experiencing performance issues. Our Network Administrators and Engineers are working to fix the problem. We will post updates as we learn more.

monitoring Aug 29, 2022, 11:14 PM UTC

We are continuing to monitor for any further issues.

resolved Aug 30, 2022, 02:27 AM UTC

All performance issues have been resolved. We will post a root cause analysis once we have completed our full investigation. If the investigation has not been completed within one week, we will post an interim RCA with the information currently available.

postmortem Sep 15, 2022, 02:59 PM UTC

# **Introduction**

The purpose of this Root Cause Analysis (RCA) is to determine the causes that contributed to the performance degradation of the Healthwise Coach, Communicate, Consumer API, and EMR Module applications on August 29, 2022.

# **Event Description**

At 3:30 PM MST on Monday, August 29, 2022, Healthwise administrators were alerted to degraded performance for Healthwise-hosted applications. Healthwise identified that database index builds were slowing down response times. Healthwise stopped the builds and increased computing resources so that backed-up processes could complete. Once the servers were stable, the indexes were rebuilt using the additional resources. The processes completed at 8:12 PM MST. The total time of degradation was 4 hours and 42 minutes.

# **Findings and Root Cause**

Based on its investigation, the team determined the following findings regarding this event:

Index builds were started to reduce space and improve efficiency. Testing indicated the builds would allow processes to complete successfully. However, the load estimates used for testing did not account for load during peak product usage. The additional load resulted in timeout errors that caused processes to back up, which degraded application performance.

# **Corrective Action**

Healthwise administrators allocated more resources to the server and stopped the index builds until the server was stable. Using the additional resources, the index builds were restarted and completed with minimal disruption.
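
The degradation window quoted in the Event Description (3:30 PM to 8:12 PM MST, stated as 4 hours and 42 minutes) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `degradation_window` is hypothetical, and it assumes both clock times fall on the same day and in the same time zone, as the RCA reports them.

```python
from datetime import datetime


def degradation_window(start: str, end: str, day: str = "2022-08-29") -> tuple[int, int]:
    """Return (hours, minutes) between two clock times.

    Assumption: both times are on the same calendar day and in the
    same time zone, as reported in the RCA.
    """
    fmt = "%Y-%m-%d %I:%M %p"
    started = datetime.strptime(f"{day} {start}", fmt)
    ended = datetime.strptime(f"{day} {end}", fmt)
    total_minutes = int((ended - started).total_seconds()) // 60
    # Split whole minutes into hours and leftover minutes.
    return divmod(total_minutes, 60)


hours, minutes = degradation_window("3:30 PM", "8:12 PM")
print(f"{hours} hours and {minutes} minutes")  # prints "4 hours and 42 minutes"
```

The result agrees with the total degradation time given in the RCA.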