Bettermode incident

We have identified an issue around our community service

Critical Resolved

Bettermode experienced a critical incident on November 15, 2020, lasting 31m. The incident has been resolved; the full update timeline is below.

Started
Nov 15, 2020, 08:55 PM UTC
Resolved
Nov 15, 2020, 09:27 PM UTC
Duration
31m
Detected by Pingoru
Nov 15, 2020, 08:55 PM UTC

Update timeline

  1. investigating Nov 15, 2020, 09:06 PM UTC

    The community service is experiencing technical difficulties. Our tech team has been notified.

  2. resolved Nov 15, 2020, 09:27 PM UTC

    We've resolved the issue with the community service. Everything is back up.

  3. postmortem Nov 17, 2020, 04:54 PM UTC

    On November 15th from 15:55 ET to 16:27 ET we experienced 15 minutes of downtime and 17 minutes of extreme slowness (7-10 seconds P95 response time). The cause was a hardware fault in one of our load balancers, which are supplied by our infrastructure provider. All of our load balancers have a fail-safe mechanism that fails over from a faulty load balancer to a backup one. After the incident was fixed, we spent the past day working closely with our infrastructure provider's engineers to determine why the load balancer did not fall back to the backup LB. We can confirm that our infrastructure provider has identified the issue, patched their infrastructure, and added tests to prevent similar issues from happening.