Healthise incident

Videos were unavailable to core applications.

Minor Resolved

Healthise experienced a minor incident on February 21, 2024. The incident has been resolved; the full update timeline is below.

Started
Feb 21, 2024, 06:15 PM UTC
Resolved
Feb 21, 2024, 04:00 AM UTC
Duration
Detected by Pingoru
Feb 21, 2024, 06:15 PM UTC

Update timeline

  1. resolved Feb 21, 2024, 06:15 PM UTC

    All performance issues have been resolved. We will post a root cause analysis once we have completed our full investigation. If the investigation has not been completed within 1 week, we will post an interim RCA with the information that we currently have available.

  2. postmortem Feb 29, 2024, 04:43 PM UTC

    ## Event Description

    At 8:24 PM MST on Tuesday, February 20, 2024, Healthwise administrators noticed that the Integration API was experiencing degraded performance. They rolled back to the previous known stable version at 9:04 PM MST, and product performance improved. Monitors reported stable performance at 9:17 PM MST. In total, this incident caused 53 minutes of degraded performance.

    At 9:39 PM MST on Tuesday, February 20, 2024, Healthwise administrators noticed that the Media Service was experiencing degraded performance and rolled the service back to a previous version. However, testing after the rollback did not detect that videos were not playing. At 7:33 AM MST on Wednesday, February 21, 2024, Healthwise administrators began another rollback of the Media Service. At 7:53 AM MST, the rollback was complete and requested videos started playing. The total duration of this incident was 10 hours and 14 minutes.

    ## Findings and Root Cause

    Based on its investigation, the team determined the following findings regarding these events:

    The Tuesday night deployment of the Integration API used a faulty resource with incomplete or corrupted files, which caused errors and led to product instability. After the rollback, the Integration API used a resource that was not faulty, and performance improved.

    The Tuesday night deployment of the Media Service used a faulty resource with incomplete or corrupted files, which caused errors and led to product instability. After the rollback, the Media Service used a resource that was not faulty, but a bug prevented videos from playing. Video requests returned a false-positive response, so no errors were logged to alert on-call support. Videos started playing once the Media Service was rolled back to a version that did not include the bug.
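The false-positive finding above suggests a check that inspects the response content instead of trusting a success status alone. The following is a minimal sketch of that idea, not Healthwise's actual monitoring. `VideoResponse`, `is_playable`, the `video/` content-type rule, and the `min_bytes` threshold are all hypothetical names and assumptions introduced for illustration. A service that streams via playlists would need different rules.

```python
from dataclasses import dataclass


@dataclass
class VideoResponse:
    """Hypothetical, simplified view of an HTTP response to a video request."""
    status: int
    content_type: str
    body: bytes


def is_playable(resp: VideoResponse, min_bytes: int = 1024) -> bool:
    """Return True only if the response plausibly contains video data.

    Assumption: a real video response has a 200 or 206 status, a
    `video/*` content type, and at least `min_bytes` of payload.
    A success status by itself is not treated as proof of playback.
    """
    if resp.status not in (200, 206):
        return False
    if not resp.content_type.lower().startswith("video/"):
        return False
    return len(resp.body) >= min_bytes
```

Under these assumptions, a response with status 200 and an empty or non-video body fails the check and can raise an alert, even though the status alone reports success.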
    ## Corrective Action

    Performance for the Integration API was restored when it was rolled back to the previous known stable version. Healthwise teams are monitoring the faulty resource to ensure it does not have incomplete or corrupted files.

    Performance for the Media Service was restored when it was rolled back to a previous known stable version. Healthwise teams are monitoring the faulty resource to ensure it does not have incomplete or corrupted files, and are reviewing their deployment and rollback procedures to ensure that testing validates that videos are playing.
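The postmortem does not say how the resource is monitored for incomplete or corrupted files. One common approach, sketched here purely as an assumption, is to compare each file's SHA-256 digest against a manifest of expected digests before a deployment uses it. The names `sha256_of` and `find_bad_files` and the manifest format (relative path mapped to hex digest) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest paths that are missing or whose digest differs.

    `manifest` maps a path relative to `root` to its expected SHA-256
    hex digest (a hypothetical format chosen for this sketch).
    """
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad
```

A deployment step could refuse to proceed whenever `find_bad_files` returns a non-empty list, which would catch truncated or altered files before they reach a running service.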