Fasterize experienced a minor incident on October 19, 2023 affecting Acceleration, lasting 15h 17m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Oct 19, 2023, 04:04 PM UTC
We are currently experiencing issues on our European infrastructure. A fix is in progress. There is a slight impact on acceleration: some pages may be slower, and some optimizations are disabled.
- identified Oct 19, 2023, 04:35 PM UTC
We have mitigated the issue. Performance is back to normal. We are still investigating the root cause.
- monitoring Oct 19, 2023, 04:49 PM UTC
We're monitoring the results, and everything looks fine. The issue seems to be related to a schema change in a storage component (to be confirmed after the RCA).
- resolved Oct 20, 2023, 07:22 AM UTC
This incident was resolved at 18:25 (Paris time). A post-mortem will follow.
- postmortem Oct 23, 2023, 09:21 PM UTC
# Description

On Thursday, October 19th, between 4:55 PM UTC+2 and 6:25 PM UTC+2, the Fasterize European platform was unable to optimize web pages for all customers. The original, unoptimized version of each page was delivered instead.

We discovered that between 4:45 PM UTC+2 and 5:50 PM UTC+2, a specific request caused a failure in the Fasterize engine during optimization and left the process in a non-functional state. The number of functional processes then decreased until it fell below a critical threshold. Our engine then automatically switched to a degraded mode in which pages were no longer optimized and were served without delay.

At 5:29 PM UTC+2, the on-call team manually added capacity to the platform to return to a stable state, but this did not definitively improve the situation.

Starting from 6:15 PM UTC+2, the optimization processes gradually resumed handling traffic. The engine then returned to its normal mode of operation.

To prevent any further incidents, the request has been excluded from optimizations, and a fix for the optimization engine is being developed.

## Action plan

**Short term:**

* Fix the engine so that it optimizes the responsible request without crashing

**Medium term:**

* Review the health check system at the engine level to automatically restart non-functional processes
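
The failure mode described above, and the medium-term action, can be illustrated with a minimal sketch. It is not the Fasterize engine's actual implementation: the `Worker` and `Pool` classes, the `min_healthy_ratio` threshold, and its value of 0.5 are all hypothetical names and numbers chosen for illustration. The sketch only shows how a pool might fall into degraded mode once too few processes are functional, and how a health check that restarts non-functional processes could bring it back to normal.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # pages are optimized
    DEGRADED = "degraded"  # original pages are served without optimization


@dataclass
class Worker:
    """Hypothetical stand-in for one optimization process."""
    name: str
    healthy: bool = True
    restarts: int = 0


@dataclass
class Pool:
    """Hypothetical pool of optimization processes."""
    workers: list[Worker]
    min_healthy_ratio: float = 0.5  # assumed critical threshold, not a real setting

    def healthy_count(self) -> int:
        return sum(1 for w in self.workers if w.healthy)

    def mode(self) -> Mode:
        # An empty pool cannot optimize anything, so treat it as degraded.
        if not self.workers:
            return Mode.DEGRADED
        ratio = self.healthy_count() / len(self.workers)
        return Mode.NORMAL if ratio >= self.min_healthy_ratio else Mode.DEGRADED

    def health_check(self) -> list[str]:
        """Restart every non-functional worker and return the restarted names."""
        restarted = []
        for w in self.workers:
            if not w.healthy:
                w.healthy = True  # stands in for a real process restart
                w.restarts += 1
                restarted.append(w.name)
        return restarted


if __name__ == "__main__":
    pool = Pool([Worker(f"w{i}") for i in range(4)])
    # Simulate a problematic request leaving three processes non-functional.
    for w in pool.workers[:3]:
        w.healthy = False
    print(pool.mode())          # degraded: only 1 of 4 workers is healthy
    print(pool.health_check())  # names of the workers that were restarted
    print(pool.mode())          # back to normal
```

In this sketch, adding more workers raises the healthy ratio only until the problematic request disables them too. Restarting the non-functional workers is what restores the pool.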