Fluid Attacks incident

Platform indicator errors

Fluid Attacks experienced a notice-level incident on July 11, 2025, affecting the Platform component and lasting 2h 26m. The incident has been resolved; the full update timeline is below.

- **Started:** Jul 11, 2025, 03:45 PM UTC
- **Resolved:** Jul 11, 2025, 06:11 PM UTC
- **Duration:** 2h 26m
- **Detected by Pingoru:** Jul 11, 2025, 03:45 PM UTC

Affected components

Platform

Update timeline

  1. **Identified**: Aug 11, 2025, 10:33 PM UTC

    It was identified that platform indicators were not working correctly, leading to inconsistencies in the displayed values.

  2. **Resolved**: Aug 11, 2025, 10:35 PM UTC

    The incident has been resolved, and platform indicators are now being calculated and displayed correctly, ensuring consistent and reliable results.

  3. **Postmortem**: Aug 11, 2025, 10:36 PM UTC

    **Impact** At least one user observed inconsistencies in the platform’s indicators. The issue started on 2025-07-10 at 10:21 (UTC-5) and was proactively discovered 1 day later (TTD) by a staff member who noticed that the system responsible for keeping these indicators up to date was failing, which led to incorrect calculations. The problem was resolved in 2.4 hours (TTF), resulting in a total window of exposure of 1.1 days (WOE) [[1]](https://gitlab.com/fluidattacks/universe/-/issues/16878).

    **Cause** A large update was made to many records in the system at the same time, which triggered multiple automated processes that calculate and update indicators. Because too many processes were running at once, the system became overloaded, could not finish the calculations, and got stuck in a continuous error loop [[2]](https://gitlab.com/fluidattacks/universe/-/merge_requests/80701).

    **Solution** The number of processes running at the same time was reduced, and each process now handles more updates before finishing. This balance helped the system recover and work normally again [[3]](https://gitlab.com/fluidattacks/universe/-/merge_requests/80810). A sketch of this pattern appears after the timeline.

    **Conclusion** By allowing fewer simultaneous processes and giving each one more work to do, we prevented the system from overloading. We also set up alerts to detect this issue quickly if it recurs. These measures were enough to stabilize the platform.

    **DATA_QUALITY < PERFORMANCE_DEGRADATION < INCOMPLETE_PERSPECTIVE**
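
The cause and solution above describe a classic fan-out overload and its throttling fix: cap how many recalculation processes run at once, and give each one a larger batch of records. Below is a minimal sketch of that pattern in Python, assuming an async worker model; all identifiers (`recalculate_indicators`, `MAX_WORKERS`, `BATCH_SIZE`) are illustrative and not taken from the Fluid Attacks codebase.

```python
import asyncio
from typing import Iterable, List

# Hypothetical values: fewer simultaneous processes, larger batches per
# process, per the postmortem's solution. Not Fluid Attacks' actual config.
MAX_WORKERS = 4
BATCH_SIZE = 500


async def recalculate_indicators(batch: List[str]) -> None:
    """Stand-in for the indicator recalculation of one batch of records."""
    await asyncio.sleep(0.01)  # simulate work


def chunk(items: List[str], size: int) -> Iterable[List[str]]:
    """Split the updated records into larger batches."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def process_updates(record_ids: List[str]) -> None:
    # The semaphore caps concurrency so a large bulk update cannot fan out
    # into an unbounded number of recalculation processes.
    semaphore = asyncio.Semaphore(MAX_WORKERS)

    async def worker(batch: List[str]) -> None:
        async with semaphore:
            await recalculate_indicators(batch)

    await asyncio.gather(*(worker(b) for b in chunk(record_ids, BATCH_SIZE)))


if __name__ == "__main__":
    asyncio.run(process_updates([f"record-{i}" for i in range(2_000)]))
```

With fewer workers admitted at once and more records per batch, the same total work completes without the contention that caused the error loop.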