Kalix EMR incident

Delays for certain actions such as sending messages

Major, Resolved

Kalix EMR experienced a major incident on November 8, 2021 affecting Kalix Platform and Messaging, lasting 7d 4h. The incident has been resolved; the full update timeline is below.

Started
Nov 08, 2021, 05:18 PM UTC
Resolved
Nov 15, 2021, 10:11 PM UTC
Duration
7d 4h
Detected by Pingoru
Nov 08, 2021, 05:18 PM UTC

Affected components

Kalix Platform, Messaging

Update timeline

  1. identified Nov 08, 2021, 05:18 PM UTC

    Messages and a set of other actions are currently delayed in Kalix. Because of a bug in message status updates, we had to push an update to Kalix that triggered a re-index of current data. The re-index added a large amount of work, which is currently being processed and may delay other actions. Right now messages may stay 'Pending' longer than expected, and may show 'Sending' when they have actually been sent. Other actions, such as creating batches in billing, may also be delayed. We are adding extra servers to work through the backlog more quickly.

  2. monitoring Nov 08, 2021, 06:14 PM UTC

    We have tripled the capacity of our queues, so most actions that rely on the queue should no longer be delayed. There is a backlog of messages (email, SMS, etc.) which may take a few minutes to clear. Messages will transition to the 'Sending' status quickly, which indicates that the message has been sent. However, there will still be a long delay before messages change to the 'Success' status. We are monitoring this and will post an update when all queues have caught up.

  3. monitoring Nov 09, 2021, 09:26 AM UTC

    Unfortunately, the indexing of all messages is still ongoing and will probably continue for a good part of the day on Tuesday. The main impact is that messages will show as 'Sending' instead of the expected 'Success'. However, any message showing as 'Sending' can be assumed to have been sent and is just waiting for its status to sync. We will continue to monitor today and post an update when things should be returning to normal.

  4. monitoring Nov 10, 2021, 10:14 AM UTC

    To restore normal functionality in Kalix, we have flipped the order in which queued items are processed. Existing messages have been pushed into a secondary queue, which means new messages will be processed first. As a result, new messages (emails, SMS, fax, etc.) should move from 'Sending' -> 'Success' as expected. Older messages from this week will still show as 'Sending' but will eventually settle on 'Success' as the week goes on. We are monitoring their progress and will resolve this issue once all messages are synced. At the current rate of progress this may take a few days.

  5. resolved Nov 15, 2021, 10:11 PM UTC

    Indexing has finally completed, so older messages should now be up to date as well. We are not aware of any ongoing issues with messaging at this time; if you encounter any, please let us know at [email protected].
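
The queue reordering described in update 4 can be illustrated with a minimal sketch. This is not Kalix's actual implementation, which the updates do not describe; the names `Message`, `TwoTierQueue`, `submit`, and `sync_next` are hypothetical and chosen only for illustration. It assumes a simple two-queue design in which the backlog is drained only when no new work is waiting.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message record; 'Sending' means sent but not yet synced."""
    id: str
    status: str = "Sending"


class TwoTierQueue:
    """Hypothetical two-queue scheduler: new items first, backlog second."""

    def __init__(self, backlog=()):
        self.primary = deque()            # messages created after the change
        self.secondary = deque(backlog)   # older messages awaiting status sync

    def submit(self, message):
        """Enqueue a newly created message on the primary queue."""
        self.primary.append(message)

    def sync_next(self):
        """Sync one message's status, preferring the primary queue."""
        source = self.primary or self.secondary  # an empty deque is falsy
        if not source:
            return None
        message = source.popleft()
        message.status = "Success"
        return message


# Two older messages are already stuck in the backlog when a new one arrives.
backlog = [Message("old-1"), Message("old-2")]
queue = TwoTierQueue(backlog)
queue.submit(Message("new-1"))

first = queue.sync_next()
print(first.id, first.status)         # new-1 Success
print([m.status for m in backlog])    # ['Sending', 'Sending']
```

In this sketch the new message reaches 'Success' immediately, while the older ones stay at 'Sending' until the primary queue is empty, which matches the behavior the update describes for older messages during the week.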