Convex experienced a major incident on September 20, 2025, lasting 57 minutes. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Sep 20, 2025, 12:37 PM UTC
We are currently investigating this issue.
- monitoring Sep 20, 2025, 12:57 PM UTC
An unexpected traffic pattern overloaded some of our services, causing intermittent unavailability across Convex instances. We've added extra capacity and are monitoring to ensure that the system is stable.
- resolved Sep 20, 2025, 01:15 PM UTC
This incident has been resolved.
- postmortem Sep 20, 2025, 08:42 PM UTC
From around 5:18am to 5:54am Pacific (12:18pm to 12:54pm UTC), Convex had a 36-minute period of intermittent downtime that affected all Convex services. The root cause was a cascading failure in our traffic layer: a traffic node (Caddy) ran out of memory due to an unforeseen load spike, and instead of simply being restarted or replaced, the node was marked as permanently down by our container management layer (Nomad), which allowed the failure to propagate to all traffic servers. Since the incident, we have more than doubled the size of our traffic layer, fixed the failover behavior that left nodes failed after OOMing, and will be investigating alternative traffic services. As always, customer data was safe during this incident, but we sincerely apologize for the availability impact during that period.
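For readers running similar stacks, the failure mode above maps to Nomad's `restart` and `reschedule` stanzas: with `mode = "fail"` (or exhausted restart attempts), an OOM-killed allocation can be left failed rather than retried. The sketch below is illustrative only, not Convex's actual configuration; job, group, and task names and all values are assumptions. It shows one way to keep an OOM-killed proxy task retrying instead of staying permanently down:

```hcl
# Hypothetical Nomad job for a Caddy-based traffic layer.
# Values are illustrative; tune attempts/intervals for your workload.
job "traffic" {
  group "caddy" {
    # Retry failed tasks on the same node with backoff instead of
    # marking the allocation failed after a few attempts.
    restart {
      attempts = 3
      interval = "10m"
      delay    = "15s"
      mode     = "delay" # keep retrying after the interval, never give up
    }

    # If local restarts keep failing, reschedule onto another node
    # indefinitely rather than leaving the allocation dead.
    reschedule {
      unlimited      = true
      delay          = "30s"
      delay_function = "exponential"
      max_delay      = "5m"
    }

    task "caddy" {
      driver = "docker"
      config {
        image = "caddy:2"
      }
      resources {
        memory     = 512
        memory_max = 1024 # burst headroom; requires memory oversubscription
      }
    }
  }
}
```

The key choices are `restart.mode = "delay"` and `reschedule.unlimited = true`, which together prevent a transient OOM from permanently removing capacity from the traffic layer.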