### Affected components
- Platform
- Agent
- API

### Update timeline
- identified Mar 31, 2026, 05:13 PM UTC
An issue was found in the platform, agent, and API. Details: https://availability.fluidattacks.com
- identified Mar 31, 2026, 05:39 PM UTC
We are continuing to work on a fix for this issue.
- resolved Mar 31, 2026, 07:59 PM UTC
This incident has been resolved and all 3 components are now operational.
- postmortem Apr 06, 2026, 10:55 PM UTC
### Impact

At least 5 client organizations experienced complete unavailability of the platform at `app.fluidattacks.com`. In addition to the main platform, 2 companion services (AI SAST and the conversational AI assistant) were equally unavailable, as they share the same server infrastructure and container image format. The issue started on 2026-03-31 at 12:07 (UTC-5) and was reactively discovered 1 minute later (TTD) by an internal monitoring tool that noticed that services were failing. The problem was resolved in 2.7 hours (TTF), resulting in a total window of exposure of 2.7 hours (WOE).

### Cause

The platform became unavailable because the application servers could not start. A component responsible for running our application containers, managed by Amazon as part of our cloud server infrastructure, was automatically updated to a new version that introduced stricter compatibility rules. This update was not announced as a breaking change and was applied to our servers without intervention on our part, as the server image version had never been pinned during the cluster's entire lifetime. Our servers recycle automatically every 24 hours and, upon recycling, adopt the latest available server image. Starting on March 19, nodes began adopting the updated image, which included a container runtime incompatible with a configuration pattern used in our images. The incompatibility went undetected for 12 days because no alert existed for pod initialization failures in this specific failure mode [\[1\]](https://github.com/awslabs/amazon-eks-ami/pull/2653) [\[2\]](https://github.com/containerd/containerd/issues/12683); a minimal sketch of such a check appears after the Solution section below.

### Solution

Two categories of fixes were applied to restore service and prevent recurrence.

**Immediate remediation:** The image construction process was updated to eliminate the incompatible file structure, replacing the problematic internal links with direct file copies that are compatible with any version of the container runtime [\[3\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99443) [\[4\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99450) [\[5\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99456). A secondary update propagated this fix to the platform's main service [\[6\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99447) [\[7\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99451). Service was fully restored on 2026-03-31 at 14:55 (UTC-5).

**Preventive controls:** The server image version is now pinned, meaning future Amazon updates will only apply when we explicitly choose to adopt them after compatibility verification [\[8\]](https://gitlab.com/fluidattacks/universe/-/merge_requests/99483). Container images are now identified by a unique version identifier per code commit, enabling immediate rollback if a future issue arises. Unnecessary dependencies between deployment pipeline steps were also removed to prevent unrelated failures from blocking production deployments.
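The linked MRs describe the remediation only at a high level, so the following is a minimal sketch, assuming the build stage has access to an unpacked image root filesystem: it walks the tree bottom-up and replaces every internal link with a direct copy of its target, so the resulting image no longer depends on how the container runtime resolves links. The paths and tree-walk approach are illustrative, not Fluid Attacks's actual build code.

```python
#!/usr/bin/env python3
"""Hedged sketch: replace symlinks in an unpacked image rootfs with copies."""
import os
import shutil
import sys


def delink(root: str) -> None:
    """Walk `root` bottom-up and turn every symlink into a copy of its target."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)  # fully resolve the link chain
            if not os.path.exists(target):
                continue  # dangling link: leave it for manual review
            os.unlink(path)
            if os.path.isdir(target):
                shutil.copytree(target, path, symlinks=False)
            else:
                shutil.copy2(target, path)


if __name__ == "__main__":
    delink(sys.argv[1])  # e.g. the staging directory of the image build
```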
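Pinning the server image version is a one-field change wherever the node groups are declared. Below is a sketch with boto3 against the EKS API; the cluster name, node group name, and release version are hypothetical, and the actual fix landed in the infrastructure code referenced in [8].

```python
"""Hedged sketch: pin an EKS managed node group to an explicit AMI release."""
import boto3

CLUSTER = "production"         # hypothetical cluster name
NODEGROUP = "general-purpose"  # hypothetical node group name
PINNED = "1.33.0-20260318"     # hypothetical AMI release version

eks = boto3.client("eks", region_name="us-east-1")  # region is an assumption

current = eks.describe_nodegroup(
    clusterName=CLUSTER, nodegroupName=NODEGROUP
)["nodegroup"]["releaseVersion"]

if current != PINNED:
    # With an explicit releaseVersion, nodes only move to a new AMI when
    # this call is made deliberately, instead of silently adopting the
    # newest image on every 24-hour recycle.
    eks.update_nodegroup_version(
        clusterName=CLUSTER,
        nodegroupName=NODEGROUP,
        releaseVersion=PINNED,
    )
```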
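Commit-tagged container images need only a few lines in the build step. A sketch that shells out to git and docker (the registry path is hypothetical; the real pipeline is the GitLab CI configuration touched by the MRs above); rollback then amounts to redeploying the tag of the last known-good commit.

```python
"""Hedged sketch: tag every container image build with the commit SHA."""
import subprocess

REGISTRY = "registry.example.com/platform"  # hypothetical registry path

# Identify the exact code state this image was built from.
sha = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

image = f"{REGISTRY}:{sha}"
subprocess.run(["docker", "build", "-t", image, "."], check=True)
subprocess.run(["docker", "push", image], check=True)
# Rolling back a bad release = redeploying the tag of the last good commit.
```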
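As for the missing alert: pods whose containers never start can be caught with a simple poll of the Kubernetes API. A minimal sketch with the official `kubernetes` Python client; the set of failure reasons is an assumption about how the kubelet surfaces container creation errors, and the alert delivery itself is omitted.

```python
"""Hedged sketch: flag pods whose containers cannot be created or started."""
from kubernetes import client, config

# Waiting reasons the kubelet reports when a container fails to come up
# (an assumed, non-exhaustive set).
FAILING_REASONS = {"CreateContainerError", "RunContainerError", "CrashLoopBackOff"}


def failing_pods() -> list[str]:
    config.load_kube_config()  # use load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()
    failing = []
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        statuses = (pod.status.container_statuses or []) + (
            pod.status.init_container_statuses or []
        )
        for cs in statuses:
            waiting = cs.state.waiting if cs.state else None
            if waiting and waiting.reason in FAILING_REASONS:
                failing.append(
                    f"{pod.metadata.namespace}/{pod.metadata.name}: {waiting.reason}"
                )
    return failing


if __name__ == "__main__":
    # Wire this into whatever notifies the on-call engineer.
    for line in failing_pods():
        print(line)
```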
### Conclusion

This incident was caused entirely by an upstream, third-party change that arrived through an automatic infrastructure update channel we did not control. No internal code change, deployment, or configuration modification triggered the failure. The 12-day gap between server image adoption (March 19) and detection (March 31) reflects a missing alert for this class of failure.

The corrective actions now in place (pinned server image versions, runtime-agnostic container images, and commit-tagged image identifiers) close the primary vectors that allowed this incident to occur and to persist undetected. Ongoing work to simplify the platform's operational infrastructure aims to reduce the number of moving parts that could interact unexpectedly with future upstream changes.

**INFRASTRUCTURE\_ERROR < MISSING\_ALERT < THIRD\_PARTY\_CHANGE**
Looking to track Fluid Attacks downtime and outages?
Pingoru polls Fluid Attacks's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.
- Real-time alerts when Fluid Attacks reports an incident
- Email, Slack, Discord, Microsoft Teams, and webhook notifications
- Track Fluid Attacks alongside 5,000+ providers in one dashboard
- Component-level filtering
- Notification groups + maintenance calendar
5 free monitors · No credit card required