Voyado incident

[Engage] Delay in Processing of Email Sendouts

Minor · Resolved

Voyado experienced a minor incident on August 1, 2025, affecting Messaging and lasting 2h 11m. The incident has been resolved; the full update timeline is below.

Started
Aug 01, 2025, 01:26 PM UTC
Resolved
Aug 01, 2025, 03:37 PM UTC
Duration
2h 11m
Detected by Pingoru
Aug 01, 2025, 01:26 PM UTC

Affected components

Messaging

Update timeline

  1. investigating Aug 01, 2025, 01:26 PM UTC

    We are investigating indications of a potential issue affecting the processing of email sendouts.

  2. investigating Aug 01, 2025, 02:51 PM UTC

    We have identified that the current delays are only affecting emails sent through automations. Our team is actively working on a solution to resolve this as quickly as possible. We will continue to share updates here as soon as new information becomes available.

  3. monitoring Aug 01, 2025, 03:19 PM UTC

    We have implemented a fix that is showing very promising results. The queue of delayed messages is rapidly decreasing. We will continue to monitor the progress until the processing times are back to normal.

  4. resolved Aug 01, 2025, 03:37 PM UTC

    The delay affecting email sendouts in automations has been resolved, and processing speeds have returned to normal. We will continue to monitor the email queues to ensure ongoing stability.

  5. postmortem Aug 26, 2025, 06:22 AM UTC

    ## Summary

    On August 1st, between approximately 14:25 and 17:30 CEST, Voyado Engage experienced issues where email send-outs triggered by Automation workflows were delayed. The issue was caused by a combination of factors that led to a processing failure in the internal message handler, which halted workflow execution and delayed email send-outs.

    ## Customer Impact

    Tenants with Automation workflows triggering email send-outs during the incident window were affected. While no messages were lost, all send-outs were delayed until the issue was resolved and normal operations resumed.

    ## Root Cause

    Our investigation leads us to conclude that the issue was caused by a combination of factors in which the services responsible for processing internal messaging became overloaded and unable to process their commands. This caused a full stop in the execution of Automations. A record-high number of messages was queued, putting immediate stress on the system and exhausting its capacity. These services had not been fully restarted for a period longer than usual, which may have contributed to degraded performance.

    ## Mitigation

    The issue was resolved by incrementally restarting the affected services. As they were brought back online, message processing resumed and the queues cleared automatically. No manual reprocessing was required, and all delayed send-outs were successfully delivered.

    ## Next Steps

    We have reviewed and improved monitoring of the internal messaging queues, added monitoring that gives us better insight into infrastructure health during similar events, and are continuously working on performance improvements to ensure stability and quality.

    We apologize for the inconvenience this may have caused and appreciate your patience as we worked to restore normal operations.
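
The first next step, improved monitoring of the internal messaging queues, amounts to alerting when queue depth or message age crosses a threshold before processing stalls. The sketch below is purely illustrative and is not Voyado's implementation; the `get_queue_health` function, queue name, and threshold values are hypothetical placeholders for whatever metrics source and limits a given system uses.

```python
from dataclasses import dataclass


@dataclass
class QueueHealth:
    name: str
    depth: int                 # messages currently waiting in the queue
    oldest_age_seconds: float  # how long the oldest unprocessed message has waited


# Hypothetical thresholds; real values depend on normal traffic for each queue.
DEPTH_THRESHOLD = 50_000
AGE_THRESHOLD_SECONDS = 300  # alert if messages wait longer than 5 minutes


def get_queue_health(queue_name: str) -> QueueHealth:
    """Placeholder: in practice this would query the message broker's
    management API or a metrics backend rather than return a stub."""
    return QueueHealth(name=queue_name, depth=0, oldest_age_seconds=0.0)


def check_queues(queue_names: list[str]) -> list[str]:
    """Return one human-readable alert per threshold breach."""
    alerts: list[str] = []
    for name in queue_names:
        health = get_queue_health(name)
        if health.depth > DEPTH_THRESHOLD:
            alerts.append(f"{name}: depth {health.depth} exceeds {DEPTH_THRESHOLD}")
        if health.oldest_age_seconds > AGE_THRESHOLD_SECONDS:
            alerts.append(
                f"{name}: oldest message waiting {health.oldest_age_seconds:.0f}s "
                f"(limit {AGE_THRESHOLD_SECONDS}s)"
            )
    return alerts


if __name__ == "__main__":
    # A real setup would run this on a schedule and page on-call or feed a
    # dashboard; here we just print any alerts for a single illustrative queue.
    for alert in check_queues(["internal-messaging"]):
        print(alert)
```

Alerting on message age as well as raw depth is what catches the failure mode described above: when the handlers stop processing entirely, depth may grow slowly while the age of the oldest message climbs immediately.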