imgix incident

We are investigating increased latency for first-time renders in the EU


imgix experienced a minor incident on October 29, 2025 affecting Rendering Infrastructure, lasting 1h 12m. The incident has been resolved; the full update timeline is below.

Started
Oct 29, 2025, 11:34 AM UTC
Resolved
Oct 29, 2025, 12:46 PM UTC
Duration
1h 12m
Detected by Pingoru
Oct 29, 2025, 11:34 AM UTC

Affected components

Rendering Infrastructure

Update timeline

  1. investigating Oct 29, 2025, 11:34 AM UTC

    We are currently investigating this issue.

  2. identified Oct 29, 2025, 11:56 AM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Oct 29, 2025, 12:25 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Oct 29, 2025, 12:46 PM UTC

    This incident has been resolved.

  5. postmortem Nov 11, 2025, 12:18 AM UTC

    # Summary

    Between October 21 and October 29, customers in Europe experienced three separate periods of increased latency for rendering requests. In a small number of cases, requests temporarily failed with "429 – concurrency limit reached" responses.

    # What Went Wrong

    The incident was traced to a GPU scaling issue with one of our upstream infrastructure providers, which led to temporary slowdowns under higher-than-usual load.

    ### Timeline

    * **October 21:** Increased rendering latency in the EU region; self-resolved. Investigation traced the issue to GPU scaling in upstream infrastructure, and a mitigation was prepared.
    * **October 27:** The issue recurred. A manual mitigation was deployed to stabilize rendering and automate future handling.
    * **October 29:** A latency alert triggered again. The previous fix mitigated the impact, but latency became intermittent; additional configuration changes were implemented to fully restore service and prevent recurrence.

    # What We Will Do to Prevent This in the Future

    While the new configurations will prevent recurring incidents, we are making further improvements to rendering resiliency and recovery speed:

    * Added more GPU hardware types to reduce the risk of scaling delays during peak demand.
    * Testing and evaluating additional hardware configurations to improve resiliency.
    * Finalizing fine-tuning of the current configurations and exploring cross-regional load balancing to further strengthen reliability.
    * Adjusted alerting thresholds to provide earlier notification of emerging issues.
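
For callers affected by the "429 – concurrency limit reached" responses described above, standard practice is to retry with exponential backoff, honoring a `Retry-After` header when the server sends one. The sketch below is a minimal, hypothetical client-side example of that pattern (the URL, retry counts, and backoff values are assumptions for illustration; this is not imgix's internal fix):

```python
# Client-side handling sketch for HTTP 429 responses: retry with exponential
# backoff, preferring the server's Retry-After hint when present.
# Illustrative only; not imgix's internal mitigation.
import time
import urllib.error
import urllib.request

def fetch_render(url: str, max_retries: int = 4) -> bytes:
    """Fetch a rendered image, backing off when the server returns 429."""
    delay = 0.5  # initial backoff in seconds (assumed value)
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that isn't a concurrency-limit error,
            # or if we've exhausted our retries.
            if err.code != 429 or attempt == max_retries:
                raise
            # Honor Retry-After if it is given in seconds; otherwise
            # fall back to our own exponential backoff.
            retry_after = err.headers.get("Retry-After")
            wait = float(retry_after) if retry_after and retry_after.isdigit() else delay
            time.sleep(wait)
            delay *= 2
    raise RuntimeError("unreachable")

# Usage (hypothetical URL):
# data = fetch_render("https://example.imgix.net/photo.jpg?w=800")
```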