Avochato incident

Platform Latency

Major · Resolved

Avochato experienced a major incident on November 24, 2020, affecting avochato.com, the API, and Mobile. It lasted 3h 35m. The incident has been resolved; the full update timeline is below.

Started
Nov 24, 2020, 10:41 PM UTC
Resolved
Nov 25, 2020, 02:16 AM UTC
Duration
3h 35m
Detected by Pingoru
Nov 24, 2020, 10:41 PM UTC

Affected components

avochato.com, API, Mobile

Update timeline

  1. investigating Nov 24, 2020, 10:41 PM UTC

    We are currently investigating this issue.

  2. identified Nov 24, 2020, 11:14 PM UTC

    We are working to deploy an update to resolve issues impacting clients.

  3. monitoring Nov 25, 2020, 12:01 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Nov 25, 2020, 02:16 AM UTC

    This incident has been resolved.

  5. postmortem Nov 25, 2020, 08:45 PM UTC

    ## What happened

    High concurrent outbound message volume caused our production write database to run out of connections. As a result, most queued processes took an extremely long time to finish, and page loads timed out for many users who tried to access the platform during the impact period.

    ## Impact

    Pending messages, inbound messages, and broadcasts during this period may have remained queued but were not dropped. Inbound calls to Avochato numbers during this period were often unable to connect or be forwarded properly. Upon resolution, inbound messages and queued work were retried automatically and, in most identifiable cases, were received properly.

    ## Resolution

    Our database automatically failed over to a read replica and was able to resume serving requests. However, we are investigating ways for this failover to happen sooner to prevent longer periods of inaccessibility.

    Our engineers have identified the root cause, which relates to message callback method prioritization. We patched our production application servers with both a fix for the root cause and new safeguards to prevent excess resource consumption during periods of extreme load.

    We are evaluating solutions to make our infrastructure more resilient while continuing to offer a best-in-class live inbox experience for customers of all sizes. As a team, we have committed to aggressively monitoring our platform’s health and proactively deploying fixes for bottlenecks detected in our current application.

    We appreciate the trust you place in our platform for communicating with those who matter most to you, and thank you for your patience during this busy time of the year.

    Thank you for choosing Avochato,

    Christopher Neale, CTO and Co-founder