Hummingbird incident

Hummingbird App Currently Experiencing Case Loading Issues

Hummingbird experienced a major incident on November 13, 2025, lasting 1d 4h. The incident has been resolved; the full update timeline is below.

Started: Nov 13, 2025, 09:41 PM UTC
Resolved: Nov 15, 2025, 01:59 AM UTC
Duration: 1d 4h
Detected by Pingoru: Nov 13, 2025, 09:41 PM UTC

Update timeline

resolved Nov 15, 2025, 02:11 PM UTC

# Hummingbird Incident Report: Search and Dashboard Service Disruption

# Summary

From the afternoon of November 13 through November 14, Hummingbird experienced a failure in the OpenSearch cluster that powers search and dashboard experiences. The failure disrupted multiple workflows, including the cases dashboard, profiles dashboard, search functionality, and the ability to add CRM profiles to cases. Engineering stabilized the underlying infrastructure and initiated a multi-phase reindexing effort across all major data domains. Core features are now restored for all customers.

---

## Glossary

- **OpenSearch Domain** – The managed search cluster powering Hummingbird’s search and dashboard functionality.
- **Index** – A collection of searchable records (e.g. cases, reviews, CRM data, transactions), separated by customer. Missing or misconfigured indices can cause data to appear incomplete in dashboards or search.
- **Reindexing** – The process of populating indices from source-of-truth data, which resides in the database. Newest data is typically available first during the reindexing process.

---

## Customer Impact

- **Search, dashboard views, and case-related operations** were intermittently unavailable or showed incomplete data across Cases, Reviews, CRM Profiles, and Transactions.
- **Intermittent server (502) errors** occurred during the initial recovery window due to elevated backend load.
- **Dashboards and Add-to-Case functionality** remained partially degraded until their search indices were repopulated.
- **Recovery was slow but targeted** as we were able to repopulate indexes starting with the most recently created data. This meant customers could resume work on open cases while recovery continued.
- **Data integrity was maintained throughout the incident**. At no time was any data lost. The impacted search infrastructure is a replicated index of data stored in our primary Postgres data store.
- **SAR Filing functionality was not directly impacted** and filings continued throughout the incident.

---

## Timeline (ET)

- **Nov 13 4:45 PM** – OpenSearch cluster failure detected; engineers begin incident response for search/dashboard service disruption.
- **Nov 13 5:00 PM** – OpenSearch cluster recreated and index reprovisioning begins; search queue paused to avoid persistent indexing failures while reprovisioning is underway.
- **Nov 13 6:30 PM** – Reindex plan formalized with priority placed on restoring recent data to minimize disruption to timely work; reviews reindexing initiated for all customers.
- **Nov 13 7:30 PM** – CRM reindexing initiated for all customers.
- **Nov 13 9:50 PM** – Performance improvement to speed up CRM reindexing is deployed.
- **Nov 14 1:54 AM** – Healthy progress on reviews and CRM reindexing is confirmed; transaction reindexing is initiated.
- **Nov 14 5:43 AM** – Team observes that the reindexing scheduler is suboptimal for transactions specifically and manually schedules jobs to maximize throughput.
- **Nov 14 9:13 AM** – Team implements changes to further improve throughput of transaction reindexing.
- **Nov 14 9:20 AM** – Secondary incident identified: index settings supporting the profiles index have not been restored correctly, leading to errors on the CRM profiles dashboard and Add-to-Case functionality despite healthy underlying data; investigation begins.
- **Nov 14 10:19 AM** – Team begins implementing a change to expedite reindexing of time-sensitive filing data.
- **Nov 14 2:10 PM** – A change to search provisioning is deployed that lets us finalize indices for CRM profiles, fixing the Profiles dashboard and Add-to-Case functionality.
- **Nov 14 2:15 PM** – Transactions are ~50% reindexed.
- **Nov 14 2:34 PM** – Team completes expedited reindexing of time-sensitive filing data.
- **Nov 14 10:21 PM** – Team identifies 34 organizations whose transaction data was consistently timing out during reindexing and begins investigating.
- **Nov 14 11:54 PM** – Team traces the slowdown to an inefficient query used during the transaction reindexing process.
- **Nov 15 2:10 AM to 3:51 AM** – Team deploys several improvements to optimize performance of the query.
- **Nov 15 4:19 AM** – Transactions are 100% reindexed.

---

## Root Cause

An unintended infrastructure change associated with a previously executed upgrade to our search infrastructure caused Terraform to delete and recreate the primary OpenSearch domain. This destructive operation removed all existing indices and their associated data, leading to widespread search and dashboard failures across the platform.

During recovery, a secondary issue created additional downstream failures. When initially rebuilding the search infrastructure, some customer indices were created with incorrect settings. The misalignment in settings caused our search provisioning functions to halt before completely configuring all customer search indices. Because we were not able to complete configuration, there were errors when searching across CRM profiles, preventing access to the Profiles dashboard and Add-to-Case features.

As a result, while we were able to repopulate the majority of our search infrastructure during the evening of November 13th, the incident continued to have significant impact due to the inability to search across CRM profiles until the afternoon of November 14th.
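
The primary root cause above, Terraform deleting and recreating a resource, is visible in a plan before it is applied. In the JSON form of a plan (`terraform show -json`), each entry in `resource_changes` carries a `change.actions` list, and a replacement appears as `["delete", "create"]` or `["create", "delete"]`. The following is a minimal sketch of reading that list, not a description of Hummingbird's tooling. It assumes the domain is managed as an `aws_opensearch_domain` resource; the `destructive_changes` helper, the `PROTECTED_TYPES` set, and the sample plan are hypothetical and written only for illustration.

```python
# Resource types treated as stateful. This list is an illustrative assumption.
PROTECTED_TYPES = {"aws_opensearch_domain"}


def destructive_changes(plan: dict, protected_types=PROTECTED_TYPES) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as ["delete", "create"] or ["create", "delete"],
        # so the presence of "delete" covers both deletion and replacement.
        if rc.get("type") in protected_types and "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# Hand-written sample in the shape of `terraform show -json` output (not real data).
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_opensearch_domain.search",
            "type": "aws_opensearch_domain",
            "change": {"actions": ["delete", "create"]},
        },
        {
            "address": "aws_iam_role.indexer",
            "type": "aws_iam_role",
            "change": {"actions": ["update"]},
        },
    ]
}

print(destructive_changes(sample_plan))  # ['aws_opensearch_domain.search']
```

A check like this would typically run between `terraform plan` and `terraform apply`. Terraform's own `lifecycle { prevent_destroy = true }` setting is a complementary guard that makes the plan itself fail when it would destroy the resource.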