imgix incident

Elevated rendering errors

Minor Resolved

imgix experienced a minor incident on May 23, 2024 affecting Rendering Infrastructure, lasting 6m. The incident has been resolved; the full update timeline is below.

Started
May 23, 2024, 07:44 PM UTC
Resolved
May 23, 2024, 07:50 PM UTC
Duration
6m
Detected by Pingoru
May 23, 2024, 07:44 PM UTC

Affected components

Rendering Infrastructure

Update timeline

  1. monitoring May 23, 2024, 07:44 PM UTC

    On May 23rd, 19:19 UTC, we identified an issue affecting our rendering services due to a caching problem. This caused elevated rendering times and intermittent failures for some users. Our engineering team quickly diagnosed the issue and implemented a fix at 19:36 UTC. We are monitoring the system closely to ensure stability and confirm that the issue has been fully resolved. We appreciate your patience and understanding during this time.

  2. resolved May 23, 2024, 07:50 PM UTC

    This incident has been resolved.

  3. postmortem May 24, 2024, 08:50 PM UTC

    # **Postmortem**

    # **What happened?**

    On May 23, 2024, at 19:23 UTC, an increased load on the rendering infrastructure was detected. Actions were taken to scale out our system to handle the additional traffic. This incident was resolved at 19:36.

    # **How were customers impacted?**

    During the incident, customers experienced increased error rates for recent renders, intermediate errors increased in our system, and response times for requests increased.

    # **What went wrong during the incident?**

    During the incident, our team implemented a service change that led to assets being dropped. This led to an increase in requests to our system. The increased requests to our system led to `429` and `5XX` errors.

    # **What will imgix do to prevent this in the future?**

    To prevent similar incidents, we will:

    * Improve procedures for pre-scaling instances during critical updates.
    * Conduct impact assessments before issuing significant changes.
    * Enhance monitoring and alerting systems to predict and manage load increases better.
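
The postmortem names `429` and `5XX` responses as the customer-visible failure mode. As a rough client-side illustration, and not anything imgix documents or recommends, the sketch below retries a request with exponential backoff when it sees one of those status classes. The names `fetch_with_backoff` and `is_retryable`, the `fetch` callable, and the default attempt count and delays are all hypothetical choices made for this example.

```python
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """True for 429 and any 5xx status, the classes named in the postmortem."""
    return status == 429 or 500 <= status <= 599


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,       # assumed default, not a vendor recommendation
    base_delay: float = 0.5,     # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch is any zero-argument callable returning (status_code, body).
    The wait doubles after each retryable response: 0.5s, 1s, 2s, ...
    """
    status, body = fetch()
    for attempt in range(max_attempts - 1):
        if not is_retryable(status):
            break
        sleep(base_delay * (2 ** attempt))
        status, body = fetch()
    return status, body
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production client would typically also add random jitter to the delay and honor a `Retry-After` header when the server sends one, so that many clients do not retry in lockstep against a system that is already overloaded.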