Reown incident

Partial outage of Blockchain API (RPC)

Major Resolved

Reown experienced a major incident on November 22, 2023 affecting the Blockchain API and lasting 30 minutes. The incident has been resolved; the full update timeline is below.

Started
Nov 22, 2023, 10:04 AM UTC
Resolved
Nov 22, 2023, 10:34 AM UTC
Duration
30m
Detected by Pingoru
Nov 22, 2023, 10:04 AM UTC

Affected components

Blockchain API

Update timeline

  1. investigating Nov 22, 2023, 10:04 AM UTC

    We are currently investigating this issue.

  2. identified Nov 22, 2023, 10:22 AM UTC

    We _think_ we found the culprit and are rolling back.

  3. resolved Nov 22, 2023, 10:34 AM UTC

    Fix was deployed. We will publish the postmortem shortly.

  4. postmortem Nov 23, 2023, 04:49 AM UTC

    **TL;DR** From Nov 21 12pm CET to Nov 22 10am CET the blockchain API was partially down. Later that day, while remediating the incident, it was down for another hour.

    **Summary** Customers found the issue in both cases and we were internally alerted.

    **Root Cause** On Nov 20 we [had a partial outage](https://status.walletconnect.com/incidents/wj64kq49ldd7) where an RPC provider reported errors at the JSON-RPC level rather than the HTTP level. One postmortem follow-up was to add logging to determine which other RPC providers do this, so that we can prevent this issue in the future. The logic for this turned out to be flawed: it resulted in an internal `WARN` but failed requests without even responding over HTTP. The root cause of the issue was in the response parsing. After rolling back we rolled out the fix again, this time missing that the wrong `content-type` was set on the response, which prevented clients from properly reading the response.

    **What could we have done better?**

    * This change shouldn’t have made it to production
    * We should have discovered this issue faster (our alarming is on the HTTP level but we weren’t responding HTTP here)
    * We should have been alerted to both issues before customers found out

    **Action items**

    1. Make sure other issues of this kind respond HTTP @Chris Smith
    2. ~~Extend integration tests to cover this type of request~~
    3. Find a bug fix for the parsing and install the logs again ✅
    4. Use RPC in Canary so we are alerted to such issues before customers find out, e.g. an e2e UI canary for web3modal
    5. Ensure integration tests check the `content-type` of the response
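
The two failure modes named in the postmortem (errors reported only at the JSON-RPC level, and a response with the wrong `content-type`) can be illustrated with a small check of the kind an integration test or canary might run. This is a minimal sketch, not Reown's implementation: the function name `classify_rpc_response`, the label strings, and the assumption that clients expect `application/json` are all hypothetical and do not come from the incident report.

```python
import json
from typing import Mapping


def classify_rpc_response(status: int, headers: Mapping[str, str], body: bytes) -> str:
    """Classify a response as 'ok' or as one of several failure kinds (hypothetical labels)."""
    # Failure visible at the HTTP level: HTTP-based alerting would catch this one.
    if status >= 400:
        return "http_error"

    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" and compare only the media type.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    # Assumption: clients expect application/json; the report does not name the type.
    if media_type != "application/json":
        return "wrong_content_type"

    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        return "invalid_json"

    # JSON-RPC 2.0: a single response object carries either "result" or "error".
    # An "error" member means the call failed even though the HTTP status was fine.
    if isinstance(payload, dict) and "error" in payload:
        return "jsonrpc_error"
    if isinstance(payload, dict) and "result" in payload:
        return "ok"
    return "invalid_jsonrpc"


if __name__ == "__main__":
    failed_call = b'{"jsonrpc":"2.0","id":1,"error":{"code":-32000,"message":"upstream failure"}}'
    print(classify_rpc_response(200, {"Content-Type": "application/json"}, failed_call))
```

A check like this treats an HTTP 200 that carries a JSON-RPC `error` member as a failure, which is the case HTTP-level alerting alone would miss. Batch responses (JSON arrays) are deliberately not handled in this sketch and fall through to `invalid_jsonrpc`.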