Voyado incident

[Engage] Delayed sendout status updates

Voyado experienced a minor incident on October 8, 2025 affecting Messaging, lasting 3h 21m. The incident has been resolved; the full update timeline is below.

Started: Oct 08, 2025, 10:02 AM UTC
Resolved: Oct 08, 2025, 01:24 PM UTC
Duration: 3h 21m
Detected by Pingoru: Oct 08, 2025, 10:02 AM UTC

Affected components

Messaging

Update timeline

identified Oct 08, 2025, 10:02 AM UTC

We are currently experiencing a delay in how sendout statuses are updated in Engage. Some sendouts may appear to be stuck in Scheduled, even though they have actually been sent as expected. This issue does not affect message delivery, only how the status is shown in the interface. We are working to resolve the issue and will post an update here once it has been fully resolved.

resolved Oct 08, 2025, 01:24 PM UTC

This incident has been resolved.

postmortem Nov 06, 2025, 09:57 PM UTC

### Summary

Between October 7th and October 8th, users may have experienced delays in the display of message send-out status updates in the Engage interface. While the messages were sent as expected, the visual feedback in the interface did not reflect their actual status.

### Customer Impact

During the incident, message send-out status and related statistics were delayed in Engage. No messages were lost or failed to send; this was a display issue only.

### Root Cause

The delay was caused by an unusually large batch of events from our data processing pipeline that overwhelmed an internal message handler. The large batch occurred because a configuration change made during refactoring unintentionally re-sent all historic events. While the initial ingestion layer could handle the load, the downstream system could not process events fast enough, resulting in a backlog. A key process responsible for moving events downstream stopped functioning correctly, and the issue went unnoticed for several hours, adding to the backlog and the ultimate delay.

### Mitigation

* Redundant subscriptions were removed and recreated to clear the backlog.
* Processing power was increased for the affected message handler.
* The rate of downstream event processing was reduced to alleviate pressure.
* Event processing was monitored until the backlog was fully cleared.

By the afternoon of October 8th, the system had fully caught up and normal status updates were visible again in Engage.

### Next Steps

To reduce the risk of similar issues occurring in the future, we are implementing the following improvements:

* Improving autoscaling for the internal message handler system
* Improving safeguards in our data platform to avoid overwhelming downstream systems
* Adding delivery time monitoring to our internal dashboards to detect delays earlier

We apologize for the inconvenience this may have caused and appreciate your patience while we worked to resolve the issue.
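
As an illustration of the delivery time monitoring listed under Next Steps, the sketch below flags a delay when the oldest sendout still waiting for a status update exceeds a threshold. It is not Voyado's implementation: the function names, the input shape (a list of send timestamps for sendouts whose status has not yet been updated), and the 15-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def oldest_pending_age(pending_sent_at, now):
    """Age of the oldest sendout still waiting for a status update.

    pending_sent_at: hypothetical list of timezone-aware send timestamps
    for sendouts whose status has not been updated yet.
    """
    if not pending_sent_at:
        return timedelta(0)
    return now - min(pending_sent_at)


def should_alert(pending_sent_at, now, threshold=timedelta(minutes=15)):
    """True when the oldest pending status update is older than the threshold.

    The 15-minute default is an assumed value, not one taken from the report.
    """
    return oldest_pending_age(pending_sent_at, now) > threshold


if __name__ == "__main__":
    now = datetime(2025, 10, 8, 10, 2, tzinfo=timezone.utc)
    pending = [now - timedelta(minutes=5), now - timedelta(hours=2)]
    print(should_alert(pending, now))
```

Tracking the age of the oldest pending item, rather than the queue length, is one way to surface a stalled downstream process even when the number of waiting events is small.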