CommCare HQ incident

Outage on www.commcarehq.com (some users unable to log in, submit forms, or sync to the server)

Critical Resolved

CommCare HQ experienced a critical incident on May 21, 2024. The incident has been resolved; the full update timeline is below.

Started: May 21, 2024, 10:30 AM UTC
Resolved: May 21, 2024, 10:30 AM UTC
Duration: not given
Detected by Pingoru: May 21, 2024, 10:30 AM UTC

Update timeline

  1. resolved May 21, 2024, 11:37 AM UTC

    Dear users, we experienced an outage on www.commcarehq.org from 5:20 AM UTC to 11:10 AM UTC. Users received an error when trying to submit forms or sync with the server on mobile and Web Apps. Our developers have resolved the issue, and we are watching closely for any recurrence. We regret any inconvenience this may have caused. Thank you for your collaboration and patience during this time.

  2. postmortem Jul 04, 2024, 12:19 PM UTC

    ### Incident Summary

    On May 21st, 2024, mobile workers experienced errors on the CommCare mobile application while trying to sync. Web users on CommCare HQ also encountered errors while logging into the web-based applications.

    ### What Happened

    The handling of requests from mobile devices and web apps, such as restores, case searches, and form submissions, was blocked at times during the incident. Restore requests, especially large ones, can be resource-intensive, increasing CPU and memory usage. In this case, the issue was caused by mobile reports included in some restores. One such restore caused one of our machines to run out of memory, which prevented many CommCare users from syncing to the server and logging into their applications. Fortunately, no data was lost: the restores completed later the same day, after we deployed a fix.

    ### Looking Forward

    Our engineering team has conducted a retrospective to understand the root cause of the issue. The team's main focus is now a long-term fix that improves the restore logic and prevents this incident from recurring.