Front experienced a critical incident on December 19, 2025 affecting App, lasting 3h 53m. The incident has been resolved; the full update timeline is below.
Affected components
- App
Update timeline
- investigating Dec 19, 2025, 03:13 PM UTC
We are currently investigating the issue. [us-west-1], [us-west-2]
- investigating Dec 19, 2025, 04:06 PM UTC
We are continuing to investigate this issue.
- identified Dec 19, 2025, 04:47 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Dec 19, 2025, 05:29 PM UTC
A fix has been implemented and we are monitoring the results. We are seeing recovery but continuing to close out remaining recovery items.
- monitoring Dec 19, 2025, 05:57 PM UTC
We are continuing to monitor and close out our remaining recovery items for [us-west-2] customers. Other regions are recovered.
- monitoring Dec 19, 2025, 06:23 PM UTC
Front is operational for all customers. We are continuing to backfill any missed messages and application webhooks.
- resolved Dec 19, 2025, 09:23 PM UTC
All backfills are complete.
- postmortem Dec 19, 2025, 11:21 PM UTC
On Friday, Dec 19, at 14:50 UTC (6:50am PST), customers based in our US-West-2 data center experienced dramatically increased API latency, resulting in the website failing to load and messages being queued in the backend. This continued until 18:10 UTC (10:10am PST). No messages were lost during this time, though there may have been a significant delay before messages appeared in customer inboxes. All queued messages were delivered by 21:00 UTC (1:00pm PST). Customers based in Front's EU-West-1 and US-West-1 data centers may have experienced some delays during this window, as some systems are interdependent, but this impact was intermittent and uncommon.

The root cause of this issue was the failure of a caching system. Several database systems support the Front application, and they are fronted by a caching layer to improve performance. A recent change increased the size of some objects in the cache layer. This was not inherently wrong and had no immediate impact. On Friday the 19th, however, the caching layer in US-West-2 crossed a new threshold of data volume, which triggered a large number of evictions, particularly of other data that is necessary for most application activity. Besides putting additional load on the databases, there was simply not enough room in the cache for all the data we needed to store there. The resulting thrashing significantly increased latency for all systems.
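The eviction dynamic described above can be illustrated with a toy size-bounded LRU cache. This is a hypothetical sketch, not Front's actual caching system: it shows how a few larger objects can push many small, frequently used entries out of a cache that is already near capacity, forcing those reads back onto the database.

```python
from collections import OrderedDict

class SizeBoundedLRU:
    """Toy LRU cache bounded by total value size (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = OrderedDict()  # key -> value (bytes), LRU order

    def get(self, key):
        if key not in self.store:
            return None  # cache miss: caller falls back to the database
        self.store.move_to_end(key)  # mark as recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.used -= len(self.store.pop(key))
        # Evict least-recently-used entries until the new value fits.
        while self.store and self.used + len(value) > self.capacity:
            _, evicted = self.store.popitem(last=False)
            self.used -= len(evicted)
        self.store[key] = value
        self.used += len(value)

cache = SizeBoundedLRU(capacity_bytes=100)
for i in range(10):
    cache.put(f"hot:{i}", b"x" * 10)  # ten small hot objects fill the cache
cache.get("hot:0")                    # keep hot:0 recently used
cache.put("big:0", b"y" * 60)         # one larger object evicts several hot keys
misses = sum(cache.get(f"hot:{i}") is None for i in range(1, 7))
print(misses)  # the larger write displaced multiple hot entries
```

Each evicted hot key now misses and hits the database instead, which is the extra load and thrashing the postmortem describes once the cache could no longer hold the working set.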