Memsource incident

Performance Disruption of all Phrase TMS (EU) components between September 22, 2024 02:25 AM CEST and September 22, 2024 05:01 AM CEST

Critical Resolved

Memsource experienced a critical incident on September 22, 2024, lasting about 2 hours 36 minutes (02:25 AM to 05:01 AM CEST). The incident has been resolved; the full update timeline is below.

Started
Sep 22, 2024, 12:25 AM UTC
Resolved
Sep 22, 2024, 03:01 AM UTC
Duration
2 h 36 min
Detected by Pingoru
Sep 23, 2024, 06:28 AM UTC

Update timeline

  1. resolved Sep 23, 2024, 06:28 AM UTC

    This incident has been resolved.

  2. postmortem Oct 11, 2024, 01:03 PM UTC

    ### **Introduction**

    We would like to share more details about the events that occurred with Phrase between 02:25 AM CEST and 05:01 AM CEST on September 22, 2024, which led to a gradual outage of all Phrase components, and about what Phrase engineers are doing to prevent these issues from recurring.

    ### **Timeline**

    * 02:25 AM CEST: Storage space started filling rapidly.
    * 02:38 AM CEST: The first set of alerts was triggered.
    * 03:11 AM CEST: The first logged connectivity issue occurred (some customers may have started experiencing connectivity issues at this point).
    * 03:30 AM CEST: Storage space was cleaned in the main partition on all nodes.
    * 04:19 AM CEST: Storage space was cleaned in the second partition on all nodes.
    * 04:58 AM CEST: Explicit restarts of the web nodes began.
    * 05:01 AM CEST: Connectivity was fully recovered after the load was balanced across additional servers.

    ### **Root Cause**

    During penetration testing, specific requests sent to the CORS processor triggered an infinite loop within a web component. The loop generated excessive logs, which eventually consumed all available storage space across all nodes.

    ### **Actions to Prevent Recurrence**

    * Implement safeguards to prevent infinite loops in the CORS processor.
    * Update the bug bounty scope to limit testing to specific servers, reducing the likelihood of widespread impact in the future.
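The postmortem does not describe where the loop arose or what the planned safeguards look like. As a minimal sketch, assuming the loop sat in a routine that parses a CORS request header such as `Access-Control-Request-Headers`, two common loop safeguards are shown below: require the parser to make forward progress on every pass, and cap the total number of passes. The function `parse_request_headers`, the `CorsParseError` exception, and the 1000-pass cap are hypothetical illustrations, not Phrase's code. Either guard turns a would-be infinite loop into a single exception, so the request fails once instead of spinning and writing logs until the disk is full.

```python
class CorsParseError(ValueError):
    """Hypothetical error raised when a CORS header cannot be parsed safely."""


# Hypothetical cap; a real value would be sized to the longest legitimate header.
MAX_ITERATIONS = 1000


def parse_request_headers(value: str, max_iterations: int = MAX_ITERATIONS) -> list[str]:
    """Split a comma-separated header value into lowercase header names.

    Sketch only: guards against infinite loops by (1) checking that the
    cursor advances on every pass and (2) aborting after max_iterations passes.
    """
    names: list[str] = []
    pos = 0
    iterations = 0
    while pos < len(value):
        iterations += 1
        if iterations > max_iterations:
            # Hard cap: stop after a bounded amount of work, whatever the input.
            raise CorsParseError("iteration budget exceeded")
        start = pos
        # Skip separators and optional whitespace.
        while pos < len(value) and value[pos] in ", \t":
            pos += 1
        # Read one token up to the next separator.
        token_start = pos
        while pos < len(value) and value[pos] not in ", \t":
            pos += 1
        token = value[token_start:pos]
        if token:
            names.append(token.lower())
        if pos == start:
            # No progress on this pass indicates a bug; fail fast instead of looping.
            raise CorsParseError(f"parser stalled at offset {pos}")
    return names
```

For example, `parse_request_headers("Content-Type, X-Custom")` returns `["content-type", "x-custom"]`, while a call whose input needs more passes than `max_iterations` allows raises `CorsParseError` after a bounded amount of work.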