Avochato incident

Issues creating new tags

Minor · Resolved

Avochato experienced a minor incident beginning August 8, 2022 that affected avochato.com, the API, and Mobile and lasted 1d 5h. The incident has been resolved; the full update timeline is below.

Started
Aug 08, 2022, 11:14 PM UTC
Resolved
Aug 10, 2022, 04:36 AM UTC
Duration
1d 5h
Detected by Pingoru
Aug 08, 2022, 11:14 PM UTC

Affected components

avochato.com, API, Mobile

Update timeline

  1. identified Aug 08, 2022, 11:14 PM UTC

    We're currently experiencing issues tagging objects in our database and are working on a solution. Broadcasts and campaigns are disabled and will not function until we find a resolution.

  2. identified Aug 08, 2022, 11:34 PM UTC

    In order to expedite resolving issues with Broadcasts and Tag creation, we will be performing emergency database maintenance. Inboxes may not be accessible at this time while we complete maintenance. We appreciate your patience.

  3. identified Aug 09, 2022, 03:02 PM UTC

    Resolution is still in progress, and some app functionality, including broadcasting and tagging, is still impacted. We appreciate your patience as we work to resolve this issue.

  4. identified Aug 09, 2022, 05:01 PM UTC

    We are continuing to work on resolving issues tagging contacts and broadcasting. We appreciate your patience.

  5. identified Aug 09, 2022, 06:49 PM UTC

    In order to complete the required maintenance, inboxes will temporarily stop serving conversations. We are trying to minimize this window as much as possible and apologize for the inconvenience.

  6. identified Aug 09, 2022, 06:55 PM UTC

    Inbox functionality has been restored while we proceed to the next phase of maintenance. Thank you for your patience. We are continuing to work on resolving issues with Broadcasts and Tags.

  7. identified Aug 09, 2022, 07:16 PM UTC

    Our maintenance has successfully resolved the primary issue affecting tags and broadcasts. Our system is continuing to work through the backlog of tasks queued since the start of the incident. Attempts to upload contacts to broadcasts and to tag contacts manually that were made since the start of the incident will be retried gradually. We apologize for the inconvenience and will continue to post updates as we perform a rolling restart of the system.

  8. monitoring Aug 09, 2022, 07:47 PM UTC

    Broadcasting, tagging, and related Avochato functionality has been restored. We are continuing to monitor our systems post-maintenance. Over the next 24 hours, our system will work through the backlog of business logic relating to contact uploads, tags, and broadcasts.

  9. resolved Aug 10, 2022, 04:36 AM UTC

    This incident has been resolved.

  10. postmortem Aug 10, 2022, 04:43 PM UTC

    ## What Happened

    Due to continued growth of activity on the platform, a fast-growing and critical database table exhausted the range of its 32-bit integer primary key column (a signed 32-bit integer can hold at most 2,147,483,647), so new rows could no longer be inserted and no additional tags could be created. Being unable to create new rows in this table broke a number of features, including adding contacts to broadcasts and adding tags to contacts, and other business logic within the Avochato application was affected in turn. These features remained impacted until the column type could be upgraded from Integer to BigInteger. Unfortunately, the upgrade took significant time because of the size of the table, which prolonged the incident for the affected features.

    To attempt a fix, the engineering team placed the Avochato platform in maintenance mode from 4:15 to 5 PM, which directly disrupted access to conversations in the inbox during that window. A second process then ran overnight and into the following business day to finish migrating the database without losing data: a new primary key column of type BigInteger was populated, reindexed, and then swapped in for the old column.

    ## Resolution

    The team performed an in-place migration to change the column type from Integer to BigInteger. This took considerable time on a table with over 2 billion rows, but once the migration was complete, platform functionality immediately returned to normal. Meanwhile, the design of the Avochato platform allowed failed attempts at tagging, broadcasting, and contact uploading to be queued for retry while the platform was impacted. Those tasks were then completed successfully, in the order they were received, as soon as the incident was resolved.
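The retry behavior described under Resolution, where failed tagging, broadcasting, and upload attempts were queued and later completed in the order received, can be illustrated with a small in-memory sketch. This is not Avochato's implementation: `RetryQueue`, `TransientError`, and `make_tag_task` are hypothetical names, and the "table cannot accept new rows" condition is simulated with a flag.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class TransientError(Exception):
    """Hypothetical error: a task cannot run yet, e.g. its table rejects new rows."""


class RetryQueue:
    """Hypothetical FIFO retry queue: failed tasks are parked and replayed in arrival order."""

    def __init__(self) -> None:
        self.pending: Deque[Callable[[], None]] = deque()

    def submit(self, task: Callable[[], None]) -> bool:
        # If earlier tasks are still waiting, queue behind them so order is kept.
        if self.pending:
            self.pending.append(task)
            return False
        try:
            task()
            return True
        except TransientError:
            self.pending.append(task)
            return False

    def drain(self) -> int:
        # Replay oldest-first; stop at the first task that still fails so later
        # tasks never overtake it. Returns how many tasks completed.
        completed = 0
        while self.pending:
            try:
                self.pending[0]()
            except TransientError:
                break
            self.pending.popleft()
            completed += 1
        return completed


# Usage: simulate a tags table that cannot accept new rows, then gets fixed.
state: Dict[str, bool] = {"table_full": True}
tags: List[str] = []


def make_tag_task(name: str) -> Callable[[], None]:
    def run() -> None:
        if state["table_full"]:
            raise TransientError("cannot insert a new tag row")
        tags.append(name)
    return run


q = RetryQueue()
for name in ["vip", "renewal", "follow-up"]:
    q.submit(make_tag_task(name))  # all three are queued while the table is full

state["table_full"] = False        # the column migration completes
print(q.drain())                   # 3
print(tags)                        # ['vip', 'renewal', 'follow-up']
```

Stopping the drain at the first task that still fails preserves arrival order at the cost of head-of-line blocking; a production job queue would typically also add backoff between attempts and a cap on retries.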
Moving forward, we are auditing our legacy database tables to ensure that primary or foreign keys across our data warehouse will scale decades into the future as we continue to experience growth on the platform. We know how critical Avochato is for communicating with your teams, and appreciate your patience during this period.
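The key audit described above comes down to simple arithmetic: a signed 32-bit key tops out at 2,147,483,647, while a signed 64-bit key allows up to 9,223,372,036,854,775,807. Below is a minimal sketch of a headroom check in Python, assuming the highest issued ID per table can be read from the database (for example with `SELECT MAX(id)`). `KeyColumn`, `audit`, the 50% threshold, and the table names and numbers in the usage are hypothetical illustrations, not part of Avochato's actual process or data.

```python
from dataclasses import dataclass
from typing import List


def max_signed(bits: int) -> int:
    """Largest value a signed integer of the given width can hold."""
    return 2 ** (bits - 1) - 1


@dataclass
class KeyColumn:
    table: str           # hypothetical table name
    bits: int            # 32 for an Integer key, 64 for a BigInteger key
    current_max_id: int  # highest key issued so far; can exceed the row count,
                         # since deleted rows and gaps in the key sequence still use up IDs

    def used_fraction(self) -> float:
        return self.current_max_id / max_signed(self.bits)

    def days_remaining(self, ids_per_day: int) -> float:
        # Linear estimate; assumes a constant insert rate.
        return (max_signed(self.bits) - self.current_max_id) / ids_per_day


def audit(columns: List[KeyColumn], threshold: float = 0.5) -> List[str]:
    """Return tables whose key has consumed more than `threshold` of its range."""
    return [c.table for c in columns if c.used_fraction() > threshold]


# Usage with made-up numbers (not Avochato's actual figures):
columns = [
    KeyColumn("example_tags", bits=32, current_max_id=2_000_000_000),
    KeyColumn("example_contacts", bits=64, current_max_id=2_000_000_000),
]
print(max_signed(32))                                              # 2147483647
print(audit(columns))                                              # ['example_tags']
print(round(columns[0].days_remaining(ids_per_day=5_000_000), 1))  # 29.5
```

The same two billion IDs that nearly fill a 32-bit key use a negligible fraction of a 64-bit one, which is why widening the column removes the limit for practical purposes.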