Kalix EMR incident
Delays for certain actions such as sending messages
Kalix EMR experienced a major incident on November 8, 2021 affecting Kalix Platform and Messaging, lasting 7d 4h. The incident has been resolved; the full update timeline is below.
Update timeline
- identified Nov 08, 2021, 05:18 PM UTC
Messages, along with some other actions, are currently delayed in Kalix. To fix a bug affecting message status updates, we had to push an update to Kalix, which triggered a re-index of existing data. This added a large amount of work that is still being processed and may delay other actions. Right now, messages may stay 'Pending' longer than expected, and may show 'Sending' when they have actually been sent. Other actions, such as creating batches in billing, may also be delayed. We are adding extra servers to clear the backlog more quickly.
- monitoring Nov 08, 2021, 06:14 PM UTC
We have tripled the capacity of our queues, so most actions that rely on the queue should no longer be delayed. There is still a backlog of messages (i.e., email, SMS, etc.) that may take a few minutes to clear. Messages will move to the 'Sending' status quickly, and this status indicates that the message has been sent. However, there will still be a significant delay before messages change to the 'Success' status. We are monitoring this and will post an update when all queues have caught up.
- monitoring Nov 09, 2021, 09:26 AM UTC
Unfortunately, the indexing of all messages is still ongoing and will probably continue for a good part of Tuesday. The main impact is that messages will show 'Sending' instead of the expected 'Success' status. However, any message showing 'Sending' has in fact been sent and is only waiting for its status to sync. We will continue to monitor today and send out an update when things are back to normal.
- monitoring Nov 10, 2021, 10:14 AM UTC
To restore normal functionality in Kalix, we have reversed the order in which queued items are processed. Existing messages have been moved into a secondary queue, so new messages will be processed first. As a result, new messages (email, SMS, fax, etc.) should move from 'Sending' to 'Success' as expected. Older messages from this week will still show 'Sending' but will eventually settle on 'Success' over the course of the week. We are monitoring their progress and will resolve this incident once all messages are synced. At the current rate, this may take a few days.
- resolved Nov 15, 2021, 10:11 PM UTC
Indexing has finally completed, so older messages should now also be up to date. We are not aware of any ongoing issues with messaging at this time; if you experience any, please let us know at [email protected].