DrChrono incident

Disruption in Service

Severity: Critical · Status: Resolved

DrChrono experienced a critical incident on July 16, 2025, affecting drchrono.com and lasting 1h 55m. The incident has been resolved; the full update timeline is below.

Started: Jul 16, 2025, 03:24 PM UTC
Resolved: Jul 16, 2025, 05:20 PM UTC
Duration: 1h 55m
Detected by Pingoru: Jul 16, 2025, 03:24 PM UTC

Affected components

drchrono.com

Update timeline

  1. identified Jul 16, 2025, 03:24 PM UTC

    Our team has identified reports of system slowness across various areas of the DrChrono application. We are working to resolve this issue.

  2. identified Jul 16, 2025, 03:47 PM UTC

    We are continuing to work on a fix for this issue.

  3. identified Jul 16, 2025, 04:17 PM UTC

    We are continuing to work on a fix for this issue. The next update will be in 20 minutes.

  4. identified Jul 16, 2025, 04:44 PM UTC

    We are continuing to work on a fix for this issue. The next update will be in 20 minutes.

  5. monitoring Jul 16, 2025, 05:03 PM UTC

    A fix has been implemented and we are monitoring the results.

  6. resolved Jul 16, 2025, 05:20 PM UTC

    This incident has been resolved.

  7. postmortem Jul 18, 2025, 09:11 PM UTC

    **Incident Overview**

    On July 16th, 2025, the DrChrono application experienced a temporary service outage following a scheduled release the evening prior. The release itself was successful; however, during post-release monitoring on the morning of July 16th, we observed slightly elevated memory pressure across application servers. This memory pressure was not impacting the end-user experience and was identified only because of the increased observability following the deployment. In response, a proactive configuration change was made to improve memory usage. Unfortunately, this adjustment unintentionally restricted the system's ability to allocate sufficient resources for application processes, resulting in a temporary outage. Because this occurred during core business hours, it took some time to restore enough resources to support application traffic, but services returned to normal operation once resources were restored.

    **How We Responded**

    The configuration change was reverted, and traffic was temporarily paused to allow the system to recover. Once the application was confirmed healthy, traffic was resumed and the system became fully available.

    **Corrective and Preventative Actions**

    To prevent recurrence, we are taking the following steps:

    * **Standard Operating Procedure (SOP) Enhancements:** We are updating and reinforcing our internal SOPs to emphasize slow rollout and verification testing when applying infrastructure setting changes, even those thought to be safe or simple, before rolling them out to all traffic.
    * **Warm Resources on Standby:** We have created, and will continue to maintain, a pool of warm standby servers so that we can restore previous configurations more quickly and spin up additional resources faster during periods of high traffic.

    We know that many of you rely on DrChrono every day to support your operations. We sincerely apologize for this disruption and are committed to strengthening our systems to prevent it from happening again.

    Thank you for your patience and continued trust.
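The "slow rollout with verification" practice described in the corrective actions can be sketched in a few lines. This is a minimal, self-contained illustration under assumed names (`staged_rollout`, `healthy`, the fleet/config shapes) and is not DrChrono's actual tooling: a setting change is applied to a small canary slice first, verified, and abandoned before it reaches the whole fleet if verification fails.

```python
# Hypothetical sketch of canary-first rollout with verification.
# All names and numbers here are illustrative, not DrChrono's real systems.

def healthy(server, config):
    # Stand-in health check: a memory ceiling below what the workload
    # needs (the failure mode in this incident) fails verification.
    return config["memory_limit_mb"] >= server["workload_mb"]

def staged_rollout(servers, new_config, canary_count=1):
    """Apply new_config to a canary slice first; abort the fleet-wide
    rollout and keep the previous configuration if verification fails."""
    canary = servers[:canary_count]
    if not all(healthy(s, new_config) for s in canary):
        # Canary verification failed: no other servers are touched.
        return {"applied_to": 0, "rolled_back": True}
    return {"applied_to": len(servers), "rolled_back": False}

fleet = [{"name": f"app-{i}", "workload_mb": 900} for i in range(4)]
too_tight = {"memory_limit_mb": 512}   # the kind of change behind this outage
safe = {"memory_limit_mb": 2048}

print(staged_rollout(fleet, too_tight))  # → {'applied_to': 0, 'rolled_back': True}
print(staged_rollout(fleet, safe))       # → {'applied_to': 4, 'rolled_back': False}
```

The point of the canary slice is that a change like `too_tight` fails on one server during verification instead of starving every application process at once during business hours.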