Flow Swiss incident

Testnet Execution is down

Major Resolved

Flow Swiss experienced a major incident on May 23, 2025 affecting Flow Testnet, lasting 5h 24m. The incident has been resolved; the full update timeline is below.

Started
May 23, 2025, 01:17 PM UTC
Resolved
May 23, 2025, 06:42 PM UTC
Duration
5h 24m
Detected by Pingoru
May 23, 2025, 01:17 PM UTC

Affected components

Flow Testnet

Update timeline

  1. investigating May 23, 2025, 01:17 PM UTC

    We are currently investigating this issue.

  2. investigating May 23, 2025, 03:12 PM UTC

    We have identified the fix and are preparing an upgrade on Testnet.

  3. resolved May 23, 2025, 06:42 PM UTC

    This incident has been resolved.

  4. postmortem Jun 09, 2025, 03:05 AM UTC

    ## 📅 Incident Summary

    * **Date**: May 23, 2025
    * **Duration**: 6:00 AM – 10:00 AM Pacific Time
    * **Impact**: Transaction execution on Flow Testnet halted.
    * **Status**: Resolved

    ## 🧨 Root Cause

    A specific transaction submitted to Flow Testnet triggered an unhandled edge case in Cadence, resulting in a complete halt in transaction processing.

    ## ✅ Resolution & Fixes

    Several targeted Cadence code fixes were implemented promptly to mitigate the issue:

    * 🛠️ **Parser fixes** to handle the edge case.
    * 🛠️ **Error reporting improvements** for better diagnostics.
    * 🛡️ **Defensive parsing logic** added via [Cadence PR #3974](https://github.com/onflow/cadence/pull/3974).

    These fixes were deployed through a Height Coordinated Upgrade (HCU) on testnet and subsequently through an HCU on mainnet the same day.

    ## 🧪 Prevention & Follow-Up Actions

    To avoid similar incidents in the future, the following preventive measures are being implemented:

    ### **1. Enhanced Fuzz Testing**

    * Set up regular runs of the `cadence-fuzzer`, a tool designed to generate and test random Cadence programs, as part of CI/CD (Cadence issue [#3985](https://github.com/onflow/cadence/issues/3985)).

    ### **2. Consider Pre-execution Simulation (long-term)**

    * Introduce simulation of transactions in a controlled environment (e.g., Access Node) before execution on core nodes to catch anomalies early. This is a long-term strategy and needs further investigation.
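
As an illustration of the defensive-parsing and fuzz-testing measures described above, here is a minimal sketch in Python. It is not Cadence code and does not reflect how `cadence-fuzzer` or PR #3974 actually work. `toy_parse`, `safe_parse`, and `fuzz` are hypothetical names, and the toy bracket parser stands in for a real parser purely to show the pattern: an unhandled edge case is caught and reported as an error instead of halting processing, and a seeded random-input loop surfaces the inputs that trigger it.

```python
import random
from dataclasses import dataclass
from typing import Optional


class ParseError(Exception):
    """Expected, well-reported failure for malformed input."""


def toy_parse(source: str) -> int:
    """Hypothetical stand-in parser: returns the maximum bracket nesting depth.

    It deliberately contains an unhandled edge case: a closing bracket with
    no matching opener pops from an empty stack and raises IndexError.
    """
    stack = []
    depth = 0
    for ch in source:
        if ch == "(":
            stack.append(ch)
            depth = max(depth, len(stack))
        elif ch == ")":
            stack.pop()  # raises IndexError on an unmatched ")"
    if stack:
        raise ParseError("unclosed bracket")
    return depth


@dataclass
class Result:
    value: Optional[int]
    error: Optional[str]


def safe_parse(source: str) -> Result:
    """Defensive wrapper: any unexpected failure becomes a reported error."""
    try:
        return Result(toy_parse(source), None)
    except ParseError as exc:
        return Result(None, f"parse error: {exc}")
    except Exception as exc:  # unexpected internal failure, reported rather than fatal
        return Result(None, f"internal error: {type(exc).__name__}: {exc}")


def fuzz(iterations: int, seed: int = 0) -> list[str]:
    """Feed random inputs to safe_parse and collect those that hit internal errors."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        length = rng.randint(0, 12)
        source = "".join(rng.choice("()ab ") for _ in range(length))
        result = safe_parse(source)
        if result.error and result.error.startswith("internal error"):
            findings.append(source)
    return findings


if __name__ == "__main__":
    for bad_input in fuzz(200):
        print(repr(bad_input))
```

Running a loop like this on a schedule in CI would flag the offending inputs before they reach a live network. The fixed seed keeps each run reproducible, so a finding can be replayed while debugging.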