Bettermode incident
We have identified an issue around our community service
Bettermode experienced a critical incident on November 15, 2020, lasting 31m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Nov 15, 2020, 09:06 PM UTC
The community service is experiencing technical difficulties. Our tech team has been notified.
- resolved Nov 15, 2020, 09:27 PM UTC
We've resolved the issue with the community service. Everything is back up.
- postmortem Nov 17, 2020, 04:54 PM UTC
On November 15th from 15:55 ET to 16:27 ET we experienced 15 minutes of downtime and 17 minutes of extreme slowness (7-10 seconds P95 response time). The cause was a hardware fault in one of our load balancers, which are supplied by our infrastructure provider. All our load balancers have a fail-safe mechanism that fails over from a faulty load balancer to a backup one, but in this case the failover did not happen. After the incident was fixed, we spent the past day working closely with our infrastructure provider's engineers to find out why the load balancer did not fall back to the backup LB. We can confirm that our infrastructure provider has identified the issue, patched their infrastructure, and added tests to prevent similar issues from happening.