TrekkSoft incident

TrekkSoft message queue emptied

Notice Resolved

TrekkSoft experienced a notice-level incident on April 8, 2019 affecting TrekkSoft Backoffice, TrekkSoft API, and other components, lasting 2d 16h. The incident has been resolved; the full update timeline is below.

Started
Apr 08, 2019, 08:24 PM UTC
Resolved
Apr 11, 2019, 12:43 PM UTC
Duration
2d 16h
Detected by Pingoru
Apr 08, 2019, 08:24 PM UTC

Affected components

TrekkSoft Backoffice
TrekkSoft API
TrekkSoft Mobile App (mPOS)
POS Desk
TrekkSoft Website Builder
ExperienceBank - Channel Manager

Update timeline

  1. identified Apr 08, 2019, 08:24 PM UTC

    Dear users, Today at 8:04pm our message queues (software that manages "messages" in order to process them asynchronously) were emptied. We have already identified the cause and are working on a fix. We will provide a more detailed overview and status update as soon as possible, and we apologise for the inconvenience caused.

  2. identified Apr 09, 2019, 11:06 AM UTC

    Dear users, As we wrote yesterday, our message queues were emptied yesterday at 8PM CET. This affected the following:

    * availabilities across all channels such as POS desk, OTAs and the frontend (TrekkSoft websites and widgets)
    * basket and guest caching
    * triggered communication messages such as Zapier tasks and emails

    We've recached all baskets and guests for this time period. The booking creation messages were also regenerated. In the meantime, the availabilities were updated as they regularly are, so no manual intervention was necessary from our side.

    As the next step, we will recache all additional messages for the queue and implement storage for these messages, so that we avoid losing them in the future and they won't have to be regenerated. Moreover, we are in touch with our service provider to avoid running out of memory again and losing all our message queues.

    As soon as all these remaining steps are complete, we will share another update. Once again, we apologise for the inconvenience caused by this and thank you for your understanding.

  3. identified Apr 09, 2019, 01:53 PM UTC

    We are continuing to work on a fix for this issue.

  4. monitoring Apr 09, 2019, 02:04 PM UTC

    Dear users, We have recached and regenerated all missing messages and returned them to our queue. These are currently being delivered to the accounts, correcting the last of the negative consequences of losing our message queue. The remaining actions planned on our side are preventative measures rather than corrective steps; they will take longer to implement and won't have immediate consequences.

  5. monitoring Apr 10, 2019, 07:18 AM UTC

    Dear users, Yesterday at 8:44PM CET we lost the messages from our queues again, but this time the impact will be much smaller because the number of messages was low. We are already working on recaching these. In the meantime, we have implemented temporary storage of messages, so they will not be lost again in case of a reboot. We are also in touch with our service provider to raise the priority of their investigation into the memory capacity issues we've experienced recently. We sincerely apologise for the inconvenience caused.

  6. resolved Apr 11, 2019, 12:43 PM UTC

    Dear users, We have been closely monitoring this issue after implementing some changes, and everything seems to be back to normal. These changes were:

    * temporary storage of messages in case of losing queues again
    * refactoring how messages are added to the queue to limit unnecessary messages

    During the maintenance next week at 6:00-6:30AM CET we will increase our memory capacity. This means that we have fewer messages to process, we have safety measures in place, and we will have more capacity to process these messages. We will keep working on this to further improve performance over the course of the next few months. Thank you for your patience so far, and we apologise for any inconvenience caused.
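
The updates above do not describe how the two changes were implemented. As a purely illustrative sketch, assuming a single-process queue written in Python, the following shows one way messages could be journaled to disk so that they survive a reboot, and how an identical message that is already pending could be skipped. The class `DurableQueue`, the journal file format, and the message keys are all hypothetical and do not represent TrekkSoft's actual implementation.

```python
import json
import os
from collections import OrderedDict


class DurableQueue:
    """In-memory queue backed by an append-only journal file (hypothetical design)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.pending = OrderedDict()  # key -> payload, oldest first
        self._replay()

    def _replay(self):
        """Rebuild the pending messages from the journal after a restart."""
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                record = json.loads(line)
                if record["op"] == "put":
                    self.pending[record["key"]] = record["payload"]
                else:  # "ack": the message was processed, so drop it
                    self.pending.pop(record["key"], None)

    def _append(self, record):
        """Write one journal record and force it to disk before returning."""
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")
            journal.flush()
            os.fsync(journal.fileno())

    def put(self, key, payload):
        """Enqueue a message; skip it if an identical one is already pending."""
        if key in self.pending and self.pending[key] == payload:
            return False  # unnecessary duplicate, nothing is written
        self._append({"op": "put", "key": key, "payload": payload})
        self.pending[key] = payload
        return True

    def get(self):
        """Return the oldest pending (key, payload) without removing it, or None."""
        for key, payload in self.pending.items():
            return key, payload
        return None

    def ack(self, key):
        """Mark a message as processed so it is not replayed after a restart."""
        if key in self.pending:
            self._append({"op": "ack", "key": key})
            del self.pending[key]
```

In this sketch, creating a new `DurableQueue` on the same journal path after a crash restores every message that was put but not yet acknowledged, so nothing has to be regenerated by hand. Journaling every operation with `os.fsync` trades throughput for durability; a production broker would normally provide its own persistence options instead.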