Healthwise experienced a minor incident on November 1, 2022 affecting Coach and Communicate and 1 more component, lasting 2h 21m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Nov 01, 2022, 03:46 PM UTC
Healthwise-hosted solutions are experiencing performance issues. Our Network Administrators and Engineers are working to fix the problem. We will post updates as we learn more.
- investigating Nov 01, 2022, 03:47 PM UTC
We are continuing to investigate this issue.
- resolved Nov 01, 2022, 06:07 PM UTC
All performance issues have been resolved. We will continue to monitor the situation to ensure product stability. We will post a root cause analysis once we have completed our full investigation. If the investigation has not been completed within 1 week we will post an interim RCA with the information that we currently have available.
- postmortem Nov 10, 2022, 11:04 PM UTC
## Introduction

The purpose of this Root Cause Analysis (RCA) is to determine the causes that contributed to the performance issues of the Healthwise-hosted solutions on November 01, 2022.

## Event Description

At 7:31 AM MST on Tuesday, November 1, 2022, Healthwise administrators were alerted to intermittent performance degradation of Healthwise-hosted applications. Healthwise found that the search index was in a bad state due to heavy network traffic. At 10:26 AM MST, Healthwise restored service by reducing the requests and rebuilding the index. Total time of the incident was 2 hours and 55 minutes; however, degradation was intermittent during that time.

## Findings and Root Cause

Based on the investigation conducted, the team determined the following findings regarding this event:

The rate limiting solution directed a single session of heavy network traffic to a backend service that couldn’t handle the load. Infrastructure engineers were able to mitigate the initial service degradation, but increased network traffic caused additional incidents that prompted further investigation. This resulted in corrective action that isolated the disruptive traffic and stabilized the environment.

## Corrective Action

Healthwise has adjusted its rate limiting solution to account for the unique traffic and has committed to making improvements to the backend systems to increase their ability to handle large increases in traffic.
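
For readers unfamiliar with the term, the following is a minimal sketch of per-session rate limiting, the general technique the corrective action refers to. It is an illustration only and does not describe Healthwise's actual rate limiting solution, which this report does not detail. The class names `TokenBucket` and `SessionRateLimiter`, the `rate` and `capacity` parameters, and the session identifiers are all hypothetical.

```python
import time


class TokenBucket:
    """Token bucket for a single session (hypothetical illustration)."""

    def __init__(self, rate, capacity, now):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.updated = now        # time of the last refill

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        # Spend one token per request if one is available.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SessionRateLimiter:
    """Keeps one bucket per session so a single heavy session is
    throttled without affecting other sessions (assumed design)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, session_id):
        now = self.clock()
        bucket = self.buckets.get(session_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[session_id] = bucket
        return bucket.allow(now)
```

In this sketch, a session that exceeds its burst allowance is rejected until its bucket refills, while requests from other sessions continue to be admitted. A real deployment would also need to expire idle buckets and choose limits based on what the backend service can sustain.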