Fluid Attacks incident

Service degradation due to API timeout errors

Minor · Resolved

Fluid Attacks experienced a minor incident affecting Platform on November 28, 2025, lasting 15 minutes. The incident has been resolved; the full update timeline is below.

Started: Nov 28, 2025, 08:45 PM UTC
Resolved: Nov 28, 2025, 09:00 PM UTC
Duration: 15m
Detected by Pingoru: Nov 28, 2025, 08:45 PM UTC

Affected components

Platform

Update timeline

  1. Identified: Dec 01, 2025, 10:34 PM UTC

    A timeout error in one of the core API services caused requests to exceed the expected response time, leading to noticeable degradation across the platform.

  2. Resolved: Dec 01, 2025, 10:35 PM UTC

    The incident has been resolved, and all API services are now operating as expected.

  3. Postmortem: Dec 01, 2025, 10:54 PM UTC

    **Impact**

    At least one user experienced difficulties while trying to review vulnerabilities, as the platform failed to load correctly. The issue started on November 28, 2025 at 15:23 (UTC-5) and was proactively discovered 21 minutes later (TTD) by a staff member, who reported through our help desk [[1]](https://help.fluidattacks.com/agent/fluid4ttacks/fluid-attacks/tickets/details/944043000055824645) that the vulnerabilities view remained stuck in a loading state. An additional customer report followed, confirming that the problem was affecting multiple users. The problem was resolved in 10 minutes (TTF), for a total window of exposure of 31 minutes (WOE).

    **Cause**

    A large batch of approximately 3,900 automated tasks was executed against the platform. Each task performed several resource-intensive operations, and up to 200 of them were running simultaneously. This created a sudden, unusually high level of activity that the platform could not handle quickly enough. Because the platform needs several minutes to increase its capacity when activity spikes, it kept receiving more and more requests before it was ready to support them. This led to delays, timeouts, and error responses for about half an hour, affecting both the automated tasks and regular users who were interacting with the platform during that period. The incident was not caused by a recent update, but by the combination of a huge volume of simultaneous work and the current limits of the platform's ability to adapt to sudden increases in demand.

    **Solution**

    Stopping the ongoing tasks was not possible because, by the time the cause was clearly identified, most of them had already been submitted and were nearly complete. The situation was monitored closely until the workload naturally decreased. Once the activity level went down, the platform gradually recovered and returned to normal operation.

    **Conclusion**

    To prevent similar incidents, we will avoid generating large, highly concurrent workloads against the platform until it is better prepared to support them, adjust internal workflows so the platform receives fewer unnecessary requests, and continue improving its ability to scale more quickly during periods of high demand, ensuring greater stability and a smoother experience for all users. A sketch of the client-side throttling idea follows below.

    **PERFORMANCE_DEGRADATION < INCOMPLETE_PERSPECTIVE**
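
    To make the first mitigation concrete, here is a minimal sketch of how a batch client could cap in-flight tasks and ramp up submissions gradually, giving autoscaling time to react. It is illustrative only: `submit_task`, the concurrency cap, and the ramp delay are hypothetical and not part of Fluid Attacks' actual tooling.

    ```python
    import asyncio

    # Hypothetical stand-in for the real platform call; the actual task
    # pipeline is internal to Fluid Attacks and not described here.
    async def submit_task(task_id: int) -> None:
        await asyncio.sleep(0.1)

    async def run_batch(task_ids: list[int],
                        max_concurrency: int = 20,
                        ramp_delay: float = 0.05) -> None:
        # Hard cap on simultaneous tasks (the incident saw up to 200).
        semaphore = asyncio.Semaphore(max_concurrency)

        async def throttled(task_id: int) -> None:
            async with semaphore:
                await submit_task(task_id)

        pending = []
        for task_id in task_ids:
            pending.append(asyncio.create_task(throttled(task_id)))
            # Space out submissions instead of firing the whole batch at
            # once, so capacity can grow along with the load.
            await asyncio.sleep(ramp_delay)
        await asyncio.gather(*pending)

    if __name__ == "__main__":
        # The incident batch was ~3,900 tasks; 100 keeps the demo short.
        asyncio.run(run_batch(list(range(100))))
    ```

    The semaphore alone bounds concurrency; the ramp delay additionally smooths the arrival rate, which is what matters when capacity takes several minutes to scale up.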