Phrase incident

Performance Disruption of All Phrase TMS (EU) Components between March 27, 2025 9:01 AM CET and March 27, 2025 9:17 AM CET

Critical Resolved

Phrase experienced a critical incident on March 27, 2025 affecting Analytics, API, and 7 more components, lasting 18m. The incident has been resolved; the full update timeline is below.

Started
Mar 27, 2025, 08:18 AM UTC
Resolved
Mar 27, 2025, 08:36 AM UTC
Duration
18m
Detected by Pingoru
Mar 27, 2025, 08:18 AM UTC

Affected components

Analytics, API, CAT web editor, Connectors, File processing, Machine translation, Project management, Term base, Translation memory

Update timeline

  1. investigating Mar 27, 2025, 08:18 AM UTC

    We are currently experiencing a performance disruption of all Phrase TMS (EU) components. Our engineering team is investigating the issue.

  2. identified Mar 27, 2025, 08:26 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Mar 27, 2025, 08:27 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Mar 27, 2025, 08:36 AM UTC

    This incident has been resolved.

  5. postmortem Apr 07, 2025, 12:47 PM UTC

    ### **Introduction**

    We would like to share more details about the events that occurred with Phrase between 9:01 AM CET and 9:17 AM CET on March 27, 2025, which led to an outage of all Phrase TMS (EU) components, and what Phrase engineers are doing to prevent these issues from recurring.

    ### **Timeline**

    - 09:01 AM CET: The number of processed requests began to slowly decline.
    - 09:07 AM CET: Monitoring systems alerted our on-call teams about the instability of the environment.
    - 09:12 AM CET: Support tickets started to appear. Our engineers began investigating the situation.
    - 09:17 AM CET: A faulty component was identified and restarted, which stabilized the environment.

    ### **Root Cause**

    One of the caching component's nodes became overloaded with incoming traffic. This caused dependent services to experience significant delays while waiting for responses from the cache middleware.

    ### **Actions to Prevent Recurrence**

    We are upgrading the caching system to a newer version and increasing the number of cache nodes to better spread the traffic and prevent network congestion.
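
As a rough illustration of why adding cache nodes can reduce the load on any single node, here is a minimal sketch. It assumes keys are assigned to nodes by simple hash-modulo sharding. The postmortem does not say how Phrase's caching system distributes traffic, so the function names, key format, and request counts below are all hypothetical.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, node_count: int) -> int:
    """Map a cache key to a node index with a stable hash (hypothetical modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def requests_per_node(keys, node_count: int) -> Counter:
    """Count how many requests each node would receive for the given keys."""
    load = Counter({node: 0 for node in range(node_count)})
    for key in keys:
        load[node_for_key(key, node_count)] += 1
    return load


# Hypothetical workload: 12,000 requests, each for a distinct key.
keys = [f"project:{i}" for i in range(12_000)]

for node_count in (3, 6):
    load = requests_per_node(keys, node_count)
    print(node_count, "nodes -> busiest node handles", max(load.values()), "requests")
```

With evenly spread keys, doubling the node count roughly halves the traffic seen by the busiest node. This sketch does not model a single very hot key, which would still land on one node regardless of how many nodes exist. Production systems also often use consistent hashing rather than plain modulo, so that adding nodes moves fewer keys.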