Voyado experienced a major incident on May 27, 2025 affecting Email Recommendations, lasting 2h 5m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating May 27, 2025, 08:54 PM UTC
We are currently seeing degraded service in the Email Recommendations service.
- resolved May 27, 2025, 09:30 PM UTC
Service is back to normal.
- postmortem May 28, 2025, 08:47 AM UTC
## **Description and Impact**

A recent update to the Email Recommendations service introduced a change intended to simplify configuration and improve caching. However, this inadvertently caused image files to be stored locally on individual servers rather than in shared storage. As a result, image requests frequently failed, triggering a surge in background jobs attempting to recreate missing images. These jobs launched in an uncontrolled manner, consuming excessive CPU resources. Even with full auto-scaling in effect, all available server capacity was quickly saturated, which led to degraded performance and service outages.

Most requests during this period failed with error responses, and any successful responses were noticeably delayed. We understand the inconvenience this caused and acted swiftly to resolve the situation.

## **Affected Area**

Email Recommendations

## **Timeline**

* **2025-05-27 12:00 UTC** – A new version of Email Recommendations that included the bug was deployed
* **2025-05-27 18:50 UTC** – Service degradation began
* **2025-05-27 19:00 UTC** – Issue detected and investigation started
* **2025-05-27 21:00 UTC** – Service fully restored

## **Actions Going Forward**

* Configuration has been corrected to ensure proper handling of image storage
* New alerts have been added to detect high CPU usage in fully scaled environments at an earlier stage
* Additional automated testing will be introduced to better catch similar issues before deployment
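
For illustration only, the alert on high CPU usage in a fully scaled environment could be expressed as a condition like the one below. This is a minimal sketch and not Voyado's actual monitoring setup: the `ScalingSnapshot` shape, the `saturated_at_max_scale` function, the 85% threshold, and the three-sample window are all assumptions chosen for the example. The idea is that high CPU alone is expected while auto-scaling still has headroom, so the alert fires only when the fleet is already at its scaling ceiling and CPU stays high across several consecutive samples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingSnapshot:
    """One monitoring sample (hypothetical shape, assumed for this sketch)."""
    running_instances: int
    max_instances: int
    avg_cpu_percent: float


def saturated_at_max_scale(
    samples: Sequence[ScalingSnapshot],
    cpu_threshold: float = 85.0,   # assumed threshold, tune per service
    min_consecutive: int = 3,      # assumed window to avoid alerting on brief spikes
) -> bool:
    """Return True when the latest `min_consecutive` samples all show the
    fleet at its scaling ceiling with average CPU at or above the threshold."""
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(
        s.running_instances >= s.max_instances
        and s.avg_cpu_percent >= cpu_threshold
        for s in recent
    )


# Usage: the first sample is busy but can still scale out, so only the
# last three samples (at the ceiling, CPU high) satisfy the condition.
history = [
    ScalingSnapshot(running_instances=4, max_instances=10, avg_cpu_percent=92.0),
    ScalingSnapshot(running_instances=10, max_instances=10, avg_cpu_percent=95.0),
    ScalingSnapshot(running_instances=10, max_instances=10, avg_cpu_percent=97.0),
    ScalingSnapshot(running_instances=10, max_instances=10, avg_cpu_percent=99.0),
]
print(saturated_at_max_scale(history))  # True
```

Requiring both conditions keeps the alert quiet during ordinary load spikes that auto-scaling can absorb, while flagging the situation described above, where added capacity no longer helps.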