Phrase incident

Performance Disruption of Phrase Portal (EU) between January 15, 2026, 04:15 PM UTC and 06:03 PM UTC

Phrase experienced a critical incident on January 15, 2026 affecting Phrase Portal (EU), lasting 1h 12m. The incident has been resolved; the full update timeline is below.

Started: Jan 15, 2026, 05:38 PM UTC
Resolved: Jan 15, 2026, 06:50 PM UTC
Duration: 1h 12m
Detected by Pingoru: Jan 15, 2026, 05:38 PM UTC

Affected components

Phrase Portal (EU)

Update timeline

investigating Jan 15, 2026, 05:38 PM UTC

Phrase Portal is currently not accessible. Our engineering team is investigating the issue.
identified Jan 15, 2026, 06:04 PM UTC

Engineers identified the cause of the issue and will deploy a fix.
identified Jan 15, 2026, 06:15 PM UTC

A fix has been deployed, and Phrase Portal is now accessible. We are continuing to monitor performance.
monitoring Jan 15, 2026, 06:15 PM UTC

A fix has been implemented and we are monitoring the results.
monitoring Jan 15, 2026, 06:22 PM UTC

We are continuing to monitor for any further issues.
resolved Jan 15, 2026, 06:50 PM UTC

This incident has been resolved.
postmortem Jan 23, 2026, 04:07 PM UTC

### Introduction

We would like to share details about an incident that affected the availability of the RawMT UI portal between **4:15 PM and 6:03 PM UTC** on January 15, 2026. During this time, users were unable to access the portal. The disruption was caused by a capacity limit being exceeded in our routing infrastructure. This post-mortem outlines what occurred and the steps we are taking to prevent similar issues in the future.

### Timeline

**Jan 15, 2026 @ 4:15 PM UTC** – A high-severity alert was triggered as the RawMT UI portal became inaccessible to users.

**Jan 15, 2026 @ 4:20–5:00 PM UTC** – Engineering teams identified that the issue was related to the routing layer reaching a fixed platform limit, which prevented service traffic from being properly directed.

**Jan 15, 2026 @ 5:03 PM UTC** – A mitigation was applied, rerouting affected services through alternate infrastructure.

**Jan 15, 2026 @ 6:03 PM UTC** – Service was confirmed fully restored, and the incident was marked as stable.

### Root Cause

The issue occurred when the platform's traffic routing infrastructure reached a fixed capacity limit on the number of active service associations it could manage. Once this limit was exceeded, configuration updates failed, preventing traffic from being correctly routed to certain services. Although the application services remained operational, they became unreachable to users because no valid routing paths were available. This resulted in a complete outage of the RawMT UI portal.

### Actions to Prevent Recurrence

1. **Improved Alerting:** Alert thresholds for routing capacity have been adjusted to provide earlier visibility and to elevate urgency before critical limits are reached.
2. **Service Distribution Across Infrastructure:** Services have been actively rebalanced across multiple routing layers to reduce pressure on any single entry point.
3. **Planned Architecture Improvements:** Work is underway to evaluate longer-term changes to the routing architecture to better support scale and reduce the risk of reaching fixed limits in the future.
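The failure mode and the first two corrective actions can be illustrated with a small model. The Python sketch below is hypothetical: `RoutingLayer`, `place_service`, `alert_level`, the association limit, and the 70% / 85% alert thresholds are illustrative assumptions, not Phrase's actual infrastructure, limits, or alerting configuration. It shows how a fixed association limit causes new configuration updates to be rejected, how tiered thresholds raise urgency before that limit is reached, and how spreading services across several routing layers keeps each one further from it.

```python
from dataclasses import dataclass, field


class CapacityExceeded(Exception):
    """Raised when a routing layer cannot accept another service association."""


@dataclass
class RoutingLayer:
    # Hypothetical model of one routing entry point with a fixed association limit.
    name: str
    limit: int
    services: list[str] = field(default_factory=list)

    @property
    def utilization(self) -> float:
        # Fraction of the fixed association limit currently in use.
        return len(self.services) / self.limit

    def associate(self, service: str) -> None:
        # Mirrors the failure mode described in the root cause: once the fixed
        # limit is reached, the configuration update is rejected, not applied.
        if len(self.services) >= self.limit:
            raise CapacityExceeded(f"{self.name}: limit of {self.limit} reached")
        self.services.append(service)


# Assumed thresholds, illustrative only: warn early, escalate well before 100%.
WARNING_AT = 0.70
CRITICAL_AT = 0.85


def alert_level(layer: RoutingLayer) -> str:
    """Return the alert severity for a layer's current utilization."""
    u = layer.utilization
    if u >= CRITICAL_AT:
        return "critical"
    if u >= WARNING_AT:
        return "warning"
    return "ok"


def place_service(layers: list[RoutingLayer], service: str) -> RoutingLayer:
    """Associate a service with the least-utilized layer that still has headroom."""
    candidates = [layer for layer in layers if len(layer.services) < layer.limit]
    if not candidates:
        raise CapacityExceeded("no routing layer has free capacity")
    target = min(candidates, key=lambda layer: layer.utilization)
    target.associate(service)
    return target
```

For example, placing 12 services onto two layers with a limit of 10 each leaves 6 on each layer, both below the warning threshold, whereas the same 12 services on a single layer would fail on the 11th association.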