Memsource incident

Degraded Performance of Phrase Orchestrator (EU & US) Workflow Engine Component between October 10, 10:10 AM CEST and October 10, 01:00 PM CEST

Major Resolved

Memsource experienced a major incident on October 10, 2025 affecting the Legacy Workflow Engine component, lasting 2h 41m. The incident has been resolved; the full update timeline is below.

Started
Oct 10, 2025, 10:52 AM UTC
Resolved
Oct 10, 2025, 01:34 PM UTC
Duration
2h 41m
Detected by Pingoru
Oct 10, 2025, 10:52 AM UTC

Affected components

Legacy Workflow Engine

Update timeline

  1. investigating Oct 10, 2025, 10:52 AM UTC

    Phrase Orchestrator (EU & US) has been experiencing degraded performance. The engineering team is investigating the issue. We apologize for any inconvenience caused.

  2. monitoring Oct 10, 2025, 11:21 AM UTC

    Recovery is in progress across both EU and US regions, and the queues are currently being processed. Customers might still experience delayed workflow executions while the backlog is cleared. We are closely monitoring the situation to ensure full stability.

  3. resolved Oct 10, 2025, 01:34 PM UTC

    This incident has been resolved.

  4. postmortem Oct 14, 2025, 11:50 AM UTC

    # **Root Cause Analysis**

    October 10, 2025

    ### **Introduction**

    We would like to share more details about the events that occurred with Phrase Orchestrator (EU & US) between 10:05 AM CEST and 01:00 PM CEST on October 10, 2025, which led to delayed workflow executions for non-scheduled workflows, and what Phrase engineers are doing to prevent these issues from recurring.

    ### **Timeline**

    10:05 AM CEST: Our logging system showed repeated errors after a change was introduced to Orchestrator’s system dependencies.

    11:54 AM CEST: Through our monitoring system, Orchestrator engineers observed that workflows were not being processed by the Orchestrator workflow engine component. They began investigating and quickly identified the root cause: a recent system dependency change.

    12:09 PM CEST: Our engineers prepared a rollback of the problematic change.

    01:00 PM CEST: The rollback was successful. The workflow engine component began processing queued workflows again.

    03:15 PM CEST: Orchestrator engineers confirmed the queue had cleared.

    ### **Root Cause**

    A dependency that checks whether the Workflow Engine component is under undue load was updated. Due to a change in the behavior of this dependency, the Workflow Builder component was unable to receive accurate load data from the Workflow Engine component. As a consequence, Orchestrator’s fail-safe mechanism activated to prevent data loss, which stopped any further workflows from executing in the workflow engine.

    ### **Actions to Prevent Recurrence**

    To prevent similar incidents in the future, we will improve our alerting around error logs. Additionally, we have identified further improvements to our alerting system that will notify us earlier if workflows are not being processed. We will also speed up the process of rolling back releases so that changes can be reverted faster.
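The postmortem mentions alerting that fires earlier when workflows are not being processed. As a rough illustration only, such a check might compare queue depth against completed work over a time window. Everything in this sketch is an assumption made for illustration: the `QueueSample` type, the `is_stalled` function, and the 10-minute window are hypothetical and do not describe Phrase's actual monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueueSample:
    """Hypothetical metrics snapshot; field names are illustrative."""
    taken_at: datetime
    queued: int      # workflows waiting in the queue at this moment
    completed: int   # cumulative count of workflows completed so far


def is_stalled(samples, window=timedelta(minutes=10)):
    """Return True if work stayed queued for the whole window with no completions.

    `samples` is assumed to be sorted by `taken_at`, oldest first.
    """
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Without at least one full window of history, do not alert.
    if latest.taken_at - samples[0].taken_at < window:
        return False
    # Keep only the samples that fall inside the window ending at `latest`.
    recent = [s for s in samples if latest.taken_at - s.taken_at <= window]
    if len(recent) < 2:
        return False
    backlog_present = all(s.queued > 0 for s in recent)
    no_progress = recent[-1].completed == recent[0].completed
    return backlog_present and no_progress
```

A check of this shape keys on the symptom (a backlog with no completions) rather than on a specific error message, so it could fire regardless of which dependency caused the stall. The window length trades detection speed against false alarms during normal quiet periods.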