Convex incident

Convex traffic downtime

Major · Resolved

Convex experienced a major incident on September 20, 2025, lasting 57 minutes. The incident has been resolved; the full update timeline is below.

Started
Sep 20, 2025, 12:18 PM UTC
Resolved
Sep 20, 2025, 01:15 PM UTC
Duration
57m
Detected by Pingoru
Sep 20, 2025, 12:18 PM UTC

Update timeline

  1. investigating Sep 20, 2025, 12:37 PM UTC

    We are currently investigating this issue.

  2. monitoring Sep 20, 2025, 12:57 PM UTC

    An unexpected traffic pattern overloaded some of our services, causing intermittent unavailability across Convex instances. We've added extra capacity and are monitoring to ensure that the system is stable.

  3. resolved Sep 20, 2025, 01:15 PM UTC

    This incident has been resolved.

  4. postmortem Sep 20, 2025, 08:42 PM UTC

    From around 5:18am to 5:54am Pacific (12:18pm to 12:54pm UTC), Convex had a 36-minute period of intermittent downtime that affected all Convex services. The specific issue was a cascading failure in our traffic layer. A traffic node (Caddy) ran out of memory due to an unforeseen load spike, and instead of simply being restarted or replaced, the node was marked as permanently down by our container management layer (Nomad), which allowed the failure to propagate to all traffic servers. Since the incident we've more than doubled the size of our traffic layer, fixed the failover behavior that left nodes marked failed after OOMing, and will be investigating alternative traffic services. As always, data was safe during this incident, but we sincerely apologize for the availability impact during that time period.
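
    The "marked as permanently down" behavior described above maps to Nomad's task-level `restart` stanza: with `mode = "fail"`, a task that exhausts its restart attempts (for example, after repeated OOM kills) is marked failed rather than retried. The sketch below is a hypothetical illustration of that primitive, not Convex's actual configuration; the job, group, and task names are invented.

    ```hcl
    job "traffic" {
      group "caddy" {
        task "caddy" {
          driver = "docker"

          config {
            image = "caddy:2" # illustrative image, not Convex's
          }

          # With mode = "fail", a task that uses up its restart attempts
          # within the interval is marked failed on that node and is not
          # retried there. With mode = "delay", Nomad instead waits out
          # the remainder of the interval and keeps restarting the task.
          restart {
            attempts = 3
            interval = "30m"
            delay    = "15s"
            mode     = "delay" # "fail" would leave the task permanently down
          }
        }
      }
    }
    ```

    Whether this stanza resembles Convex's setup is unknown; it only shows the Nomad mechanism by which an OOMing task can end up permanently failed instead of being restarted.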