imgix experienced a minor incident on May 23, 2024 affecting Rendering Infrastructure, lasting 6 minutes. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring May 23, 2024, 07:44 PM UTC
On May 23rd, 19:19 UTC, we identified an issue affecting our rendering services due to a caching problem. This caused elevated rendering times and intermittent failures for some users. Our engineering team quickly diagnosed the issue and implemented a fix at 19:36 UTC. We are monitoring the system closely to ensure stability and confirm that the issue has been fully resolved. We appreciate your patience and understanding during this time.
- resolved May 23, 2024, 07:50 PM UTC
This incident has been resolved.
- postmortem May 24, 2024, 08:50 PM UTC
# **Postmortem**

# **What happened?**

On May 23, 2024, at 19:23 UTC, an increased load on the rendering infrastructure was detected. Actions were taken to scale out our system to handle the additional traffic. This incident was resolved at 19:36.

# **How were customers impacted?**

During the incident, customers experienced increased error rates for recent renders, intermediate errors increased in our system, and response times for requests increased.

# **What went wrong during the incident?**

During the incident, our team implemented a service change that led to assets being dropped. This led to an increase in requests to our system. The increased requests to our system led to `429` and `5XX` errors.

# **What will imgix do to prevent this in the future?**

To prevent similar incidents, we will:

* Improve procedures for pre-scaling instances during critical updates.
* Conduct impact assessments before issuing significant changes.
* Enhance monitoring and alerting systems to predict and manage load increases better.