Memsource incident
Performance Disruption of All Phrase TMS (EU) Components between March 27, 2025 9:01 AM CET and March 27, 2025 9:17 AM CET
Memsource experienced a critical incident on March 27, 2025, affecting Analytics, API, and 1 more component, lasting 18m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Mar 27, 2025, 08:18 AM UTC
We are currently experiencing a performance disruption of all Phrase TMS (EU) components. Our engineering team is investigating the issue.
- identified Mar 27, 2025, 08:26 AM UTC
The issue has been identified and a fix is being implemented.
- monitoring Mar 27, 2025, 08:27 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Mar 27, 2025, 08:36 AM UTC
This incident has been resolved.
- postmortem Apr 07, 2025, 12:47 PM UTC
### **Introduction**

We would like to share more details about the events that occurred with Phrase between 9:01 AM CET and 9:17 AM CET on March 27, 2025, which led to an outage of all Phrase TMS (EU) components, and what Phrase engineers are doing to prevent these issues from recurring.

### **Timeline**

- 09:01 AM CET: The number of processed requests began to slowly decline.
- 09:07 AM CET: Monitoring systems alerted our on-call teams about the instability of the environment.
- 09:12 AM CET: Support tickets started to appear. Our engineers began investigating the situation.
- 09:17 AM CET: A faulty component was identified and restarted, which stabilized the environment.

### **Root Cause**

One of the caching component's nodes became overloaded with incoming traffic. This caused dependent services to experience significant delays while waiting for responses from the cache middleware.

### **Actions to Prevent Recurrence**

We are upgrading the caching system to a newer version and increasing the number of cache nodes to better spread the traffic and prevent network congestion.
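
The effect of adding cache nodes can be illustrated with a minimal sketch. It assumes, purely for illustration, that cache keys are assigned to nodes by a stable hash of the key; the postmortem does not describe how Phrase actually distributes traffic, and the names `node_for_key`, `load_per_node`, and `peak_load`, as well as the synthetic key set, are hypothetical.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a cache key to a node index with a stable hash.

    Assumption: simple hash-modulo placement, chosen only to illustrate
    load spreading. It is not taken from the incident report.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def load_per_node(keys, num_nodes: int) -> Counter:
    """Count how many requests each node would receive for the given keys."""
    return Counter(node_for_key(k, num_nodes) for k in keys)


def peak_load(keys, num_nodes: int) -> int:
    """Return the request count on the busiest node."""
    return max(load_per_node(keys, num_nodes).values())


# Synthetic workload: 10,000 distinct keys, one request each (hypothetical).
keys = [f"project:{i}" for i in range(10_000)]

if __name__ == "__main__":
    for n in (3, 6):
        print(f"{n} nodes: busiest node handles {peak_load(keys, n)} requests")
```

With evenly distributed keys, doubling the node count roughly halves the load on the busiest node. A real workload with a few very hot keys would not spread this evenly, so the sketch shows the general idea rather than a guaranteed outcome.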