Baremetrics incident

Customer Search

Minor Resolved View vendor source →

Baremetrics experienced a minor incident on October 31, 2017, lasting 3d 6h. The incident has been resolved; the full update timeline is below.

Started
Oct 31, 2017, 11:50 AM UTC
Resolved
Nov 03, 2017, 05:58 PM UTC
Duration
3d 6h
Detected by Pingoru
Oct 31, 2017, 11:50 AM UTC

Update timeline

  1. investigating Oct 31, 2017, 11:50 AM UTC

    The customer search section of the dashboard and API are currently unavailable

  2. identified Oct 31, 2017, 11:50 AM UTC

    We have identified the issue, but it may take a while for this to recover. We will keep you updated on the progress.

  3. identified Nov 01, 2017, 02:31 PM UTC

    Just an update on what is happening: We use Elasticsearch for our customer search and profile pages. This allows us to scale search out to millions of records, and still deliver very fast search results. Yesterday elasticsearch crashed, due to hitting the JVM 32gb heap limit. However, we did not notice this error as it was buried in the logs around other errors. We attempted to restore the cluster, which hit the limit again a few hours later. Right now, we are scaling up more machines, and rebalancing the elasticsearch cluster to resolve this issue. Sadly the restore time is slow, but we are making progress.

  4. identified Nov 02, 2017, 07:33 PM UTC

    Customer search is back for most accounts, with a few remaining.

  5. monitoring Nov 02, 2017, 10:09 PM UTC

    All accounts should now be back online. We are continuing to monitor this.

  6. resolved Nov 03, 2017, 05:58 PM UTC

    This incident has been resolved.