Fasterize incident

Platform has been unavailable

Critical · Resolved

Fasterize experienced a critical incident on May 13, 2025 affecting Acceleration, lasting 58m. The incident has been resolved; the full update timeline is below.

Started
May 13, 2025, 07:54 PM UTC
Resolved
May 13, 2025, 08:53 PM UTC
Duration
58m
Detected by Pingoru
May 13, 2025, 07:54 PM UTC

Affected components

Acceleration

Update timeline

  1. investigating May 13, 2025, 07:54 PM UTC

    One of our datacenters was unavailable between 21:21 and 21:47 (UTC+2). We are investigating the incident.

  2. resolved May 13, 2025, 08:53 PM UTC

    We experienced a service disruption caused by a Distributed Denial of Service (DDoS) attack. The issue has now been resolved. A full post-mortem will follow. Thank you for your patience and understanding.

  3. postmortem May 15, 2025, 03:04 PM UTC

    ## **Summary**

    On May 14, 2025, Fasterize experienced a partial service disruption affecting a subset of customers. The issue was caused by a large-scale DDoS attack targeting a website accelerated by our platform. The incident lasted approximately 25 minutes, with service fully restored at 21:47.

    ## **Timeline (UTC+2)**

    * **21:22 – 21:30**: Our systems registered an abnormally high volume of requests: over 37 million in total, peaking at 350,000 requests per second.
    * **21:47**: Traffic stabilized and all services were back to normal.

    ## **What Happened**

    The DDoS attack overwhelmed several load balancers, leading to repeated restarts. Under normal circumstances, our failover system automatically routes traffic directly to the origin servers if a platform zone becomes unhealthy. However, the DNS health checks tied to certain zones were misconfigured. They continued to report the zone as healthy despite the outage, preventing failover from triggering correctly.

    ## **Impact**

    * **Severity level:** 1 (Unplanned downtime affecting multiple production websites)
    * **Detection time:** 12 minutes
    * **Time to full recovery:** 25 minutes

    ## **What We're Doing**

    ### **Immediate fixes**

    * Corrected the failover configuration to ensure accurate health checks.

    ### **Short-term improvements**

    * Tuned load balancer settings for better resilience under high traffic.
    * Improved alerting on health check anomalies.

    ### **Medium-term improvements**

    * Increasing infrastructure redundancy to distribute traffic more effectively.
    * Evaluating native rate-limiting solutions to mitigate volumetric attacks.
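
    The root cause described above is a health check that keeps reporting a zone as healthy even while it is down, so DNS failover to origin never fires. As a minimal illustrative sketch (not Fasterize's actual tooling; the probe URL, threshold, and function names are assumptions), a correct check must be driven by real probe results and flip state after a run of consecutive failures:

    ```python
    import urllib.request

    # Hypothetical sketch of a DNS-failover health check. The incident's
    # misconfigured checks effectively ignored probe outcomes and always
    # reported "healthy"; a correct check must react to real failures.

    FAILURE_THRESHOLD = 3  # consecutive failures before the zone is marked down

    def probe(url: str, timeout: float = 2.0) -> bool:
        """Return True if the zone endpoint answers below HTTP 400."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status < 400
        except Exception:
            return False

    def evaluate(results: list[bool], threshold: int = FAILURE_THRESHOLD) -> str:
        """Mark the zone unhealthy after `threshold` consecutive failed probes."""
        streak = 0
        for ok in results:
            streak = 0 if ok else streak + 1
            if streak >= threshold:
                return "unhealthy"  # failover should now route traffic to origin
        return "healthy"

    # Three consecutive failures trip the check; intermittent failures do not.
    print(evaluate([True, False, False, False]))  # -> unhealthy
    print(evaluate([True, False, True, False]))   # -> healthy
    ```

    The consecutive-failure threshold is the usual design choice here: it keeps a single dropped probe from flapping DNS, while still tripping failover within a few probe intervals of a real outage.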