Healthie incident

Slow response times due to AWS incident

Minor Resolved

Healthie experienced a minor incident on October 20, 2025 affecting Healthie (Production), lasting 7h 4m. The incident has been resolved; the full update timeline is below.

Started
Oct 20, 2025, 01:10 PM UTC
Resolved
Oct 20, 2025, 08:14 PM UTC
Duration
7h 4m
Detected by Pingoru
Oct 20, 2025, 01:10 PM UTC

Affected components

Healthie (Production)

Update timeline

  1. investigating Oct 20, 2025, 01:10 PM UTC

    AWS continues to have issues with EC2 instances, which are preventing us from automatically scaling to our normal server capacity (https://health.aws.amazon.com/). This is causing slower-than-normal response times. Our team is monitoring and working with our host to scale up to the needed capacity.

  2. investigating Oct 20, 2025, 01:26 PM UTC

    We've confirmed we're seeing slower response times and are continuing to work to provision the needed capacity. We have paused sending webhooks to reduce server traffic while we work to scale capacity. All paused webhooks will be sent once we are able to scale successfully.

  3. investigating Oct 20, 2025, 03:10 PM UTC

    AWS continues to experience severe networking issues (see their updates at https://health.aws.amazon.com/health/status). We continue to monitor and to retry provisioning additional capacity.

  4. investigating Oct 20, 2025, 03:49 PM UTC

    AWS has "narrowed down the source of the network connectivity issues" and identified the root cause. They are actively working on mitigations but are still "throttling requests for new EC2 instances," which blocks us from provisioning the needed capacity. We continue to monitor the situation closely.

  5. investigating Oct 20, 2025, 04:17 PM UTC

    AWS reports that they "have identified and are applying next steps to mitigate throttling of new EC2 instance launches." New instance launches are what Healthie needs restored for response times to return to normal, so this is a positive step. We continue to monitor.

  6. investigating Oct 20, 2025, 05:05 PM UTC

    AWS is "in the process of validating a fix and will deploy to the first AZ as soon as they have confidence they can do so safely."

  7. investigating Oct 20, 2025, 05:44 PM UTC

    AWS has shared a positive update: "the internal subsystems of EC2 are now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region. We are applying mitigations to the remaining AZs at which point we expect launch errors and network connectivity issues to subside." Once we are able to launch new EC2 instances, we expect response times to return to normal and this incident to be resolved. We're continuing to monitor.

  8. investigating Oct 20, 2025, 06:27 PM UTC

    Some of our scaling attempts are now succeeding amid continued failures, and we have been able to add more capacity, which should help response times. Response times are still well above normal and capacity is still well below normal; we continue to work to restore both.

  9. investigating Oct 20, 2025, 07:18 PM UTC

    We have been able to scale up to and beyond our typical mid-day capacity. Response times should be back to normal, and background processes (specifically webhooks) have begun to catch up. We continue to monitor.

  10. resolved Oct 20, 2025, 08:14 PM UTC

    Webhooks are fully caught up, and response times are back to normal. This incident is resolved.