Fluid Attacks incident

Service disruption due to AWS outage

Severity: Critical · Status: Resolved

Fluid Attacks experienced a critical incident on October 20, 2025, affecting the Platform and Agent components and lasting 2h 16m. The incident has been resolved; the full update timeline is below.

Started: Oct 20, 2025, 08:45 PM UTC
Resolved: Oct 20, 2025, 11:02 PM UTC
Duration: 2h 16m
Detected by Pingoru: Oct 20, 2025, 08:45 PM UTC

Affected components

Platform, Agent

Update timeline

  1. Identified: Oct 20, 2025, 02:48 PM UTC

    An ongoing outage in multiple AWS services is disrupting our platform, the Agent, and all API-dependent services. Our engineering team is actively working to mitigate the impact.

  2. Identified: Oct 20, 2025, 11:00 PM UTC

    We are continuing to monitor the incident closely and are actively working to mitigate the impact on our platform and related services.

  3. Resolved: Oct 20, 2025, 11:02 PM UTC

    The incident has been resolved and all our services are now fully operational. AWS services have stabilized, and we are monitoring performance to ensure continued reliability.

  4. Postmortem: Oct 21, 2025, 08:48 PM UTC

    **Impact**

    Our monitoring tools detected abnormal behavior across all core services, including the platform, the Agent, and API-dependent components. Although AWS services began experiencing issues at 00:11 UTC-5 on October 20, 2025, our platform continued handling requests until approximately 09:15 UTC-5; the outage was proactively detected 15 minutes later (time to detect, TTD) by one of our monitoring systems, which alerted our team to a service outage affecting several core components. The problem was resolved in 8.4 hours (time to fix, TTF), for a total window of exposure (WOE) of 8.6 hours; a quick sanity check of these figures follows the timeline. [[1]](https://gitlab.com/fluidattacks/universe/-/issues/18386).

    **Cause**

    Multiple AWS services experienced a major outage, as reported in the [AWS Health Dashboard](https://health.aws.amazon.com/health/status?path=open-issues). This incident caused disruptions across several AWS components our infrastructure depends on, leading to widespread service failures within Fluid Attacks.

    **Solution**

    While AWS services were gradually recovering, our team worked in parallel on internal adjustments that allowed our infrastructure to redeploy successfully once service stability was restored. These changes focused on ensuring that our components could reconnect, synchronize, and resume normal operation. Most of the downtime corresponded to AWS's recovery period, followed by additional time to complete the internal redeployment process.

    **Conclusion**

    We are actively working on migrating to more self-contained infrastructure stacks to reduce dependency-related issues and improve the overall reliability and reproducibility of our services.

    **THIRD_PARTY_ERROR**
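
As a quick sanity check of the exposure figures above, here is a minimal sketch that assumes the window of exposure is simply the sum of the detection and fix times; the report's own rounding may differ slightly.

```python
from datetime import timedelta

# Illustrative only: assumes WOE = TTD + TTF, using the figures quoted in the postmortem.
ttd = timedelta(minutes=15)   # time to detect (TTD)
ttf = timedelta(hours=8.4)    # time to fix (TTF)
woe = ttd + ttf               # window of exposure (WOE)

hours, remainder = divmod(int(woe.total_seconds()), 3600)
print(f"WOE = {hours} h {remainder // 60} min (~8.6 h)")  # -> WOE = 8 h 39 min (~8.6 h)
```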