Reown incident

AppKit Email Login Currently Unavailable

Major · Resolved

Reown experienced a major incident on September 16, 2024 affecting Cloud App and AppKit, lasting 7h 11m. The incident has been resolved; the full update timeline is below.

Started: Sep 16, 2024, 03:38 AM UTC
Resolved: Sep 16, 2024, 10:50 AM UTC
Duration: 7h 11m
Detected by Pingoru: Sep 16, 2024, 03:38 AM UTC

Affected components

Cloud App, AppKit

Update timeline

  1. identified Sep 16, 2024, 03:38 AM UTC

    The Email Login functionality of AppKit is currently down because a downstream service is down. All other WalletConnect systems, including the Relay, are not affected. AppKit Social login still works. We will post updates here.

  2. identified Sep 16, 2024, 05:32 AM UTC

    Cloud App is also affected by the Postmark outage. We are unable to send signup and password reset emails.

  3. monitoring Sep 16, 2024, 06:59 AM UTC

    We've temporarily switched SMTP providers away from Postmark. We are monitoring the situation with Postmark and will switch back once it recovers. All systems are operational again.

  4. resolved Sep 16, 2024, 10:50 AM UTC

    This incident has been resolved.

  5. postmortem Sep 16, 2024, 10:50 AM UTC

    ### **TL;DR**

    On September 16, 2024, from 5am Singapore to 1:37pm Singapore, AppKit Embedded Wallet Email login and [cloud.walletconnect.com](http://cloud.walletconnect.com) email delivery were broken due to an outage of Postmark, an email delivery service. We don’t know the exact number of customers affected but assume at least dozens.

    The issue was reported:

    * AppKit Email: reported by an **internal** AppKit user at ~9am Singapore
    * Cloud: reported via Twitter at 11:09am Singapore

    ### **Summary**

    The issue started at 5am Singapore. An internal user reported it at 9:46am Singapore. An operator started investigating at 11:07am Singapore and reproduced the issue. The operator suspected that Magic, the key management service/authentication layer backing the AppKit Wallet, was at fault. The operator paged Magic in Slack, providing evidence that the issue did not look like a Postmark problem. At 11:32am Magic provided evidence that the issue appeared to be confined to Postmark.

    The operator created an account with Sendgrid, an alternative mailing provider, but was blocked by their fraud detection for unknown reasons and was unable to proceed. At 1:38pm the operator noticed that they could disable the custom SMTP provider and rely on Magic’s email provider, which fails over to Sendgrid.

    Around the same time another operator switched Cloud over to Supabase mailing instead of Postmark. The other operator also created a Sendgrid account and switched Cloud to Sendgrid, as Cloud was getting rate limited by Supabase. At ~4:30pm the second Sendgrid account also got blocked.

    At 6:40pm Singapore the Magic configuration was switched back to Postmark so that the sender of emails would appear as `@walletconnect.com` again.

    ### **Root Cause**

    The root cause was Postmark’s SSL certificate expiring at 5am Singapore.

    ### **5 Whys**

    1. Why did the AppKit Email / Cloud Signup not work?
       Because emails were not delivered.
    2. Why were the emails not being delivered?
       Because Postmark, the outgoing email service we use for both platforms, had an outage.
    3. Why was the outage not discovered faster?
       We don’t execute email login on either Cloud or AppKit as part of our Canary flows. The Canary flows we have don’t exercise sign up (Cloud) or email login (AppKit).
    4. Why did the remediation take ~2h after the initial report?
       The operator was not aware that disabling the custom SMTP provider setting was an option.
    5. Why was the operator not aware of this option?
       The operator should have asked Magic, who were helping to remediate, whether they had ideas for how to resolve this quicker.

    ### **What could we have done better?**

    1. Discovery: we could have automatically detected both Cloud Login and AppKit Email being down through the use of Canaries.
    2. Remediation: we could have failed over to non-custom SMTP quicker.
    3. Previous outage follow-up: we could have already had a Sendgrid account after the end-of-July outage of Postmark, which did not inspire trust.

    ### How can we prevent this from happening again?

    Have a Sendgrid account ready for redundancy, or even investigate automatic failover.

    ### **Action items**

    1. Short-term: set up Sendgrid account for backup @Derek Rein
    2. Mid-term: contemplate covering email flows in Canaries
        1. Cloud: @Cali Armut
        2. AppKit: @Tomas Rocchi

    ### Links

    [https://status.postmarkapp.com/notices/5jmmv4cyfqboak2v-service-issue-outbound-smtp-sending-issues](https://status.postmarkapp.com/notices/5jmmv4cyfqboak2v-service-issue-outbound-smtp-sending-issues)
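
The postmortem suggests investigating automatic failover between mail providers. The following is a minimal sketch of that idea in Python, not a description of how Cloud or AppKit actually send mail. The names `SmtpProvider`, `send_via_smtp`, and `send_with_failover` are hypothetical, as are any hosts and credentials passed in. It assumes each provider exposes a STARTTLS-capable SMTP endpoint, and that an expired server certificate (the root cause above) surfaces as an `ssl.SSLError` during the TLS handshake.

```python
import smtplib
import ssl
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Sequence


@dataclass(frozen=True)
class SmtpProvider:
    """Connection settings for one outbound SMTP provider (hypothetical type)."""
    name: str
    host: str
    port: int
    username: str
    password: str


def send_via_smtp(provider: SmtpProvider, message: EmailMessage) -> None:
    """Send one message over STARTTLS; raises on TLS, auth, or delivery errors."""
    # The default context verifies the server certificate, so an expired
    # certificate fails here instead of being silently accepted.
    context = ssl.create_default_context()
    with smtplib.SMTP(provider.host, provider.port, timeout=10) as client:
        client.starttls(context=context)
        client.login(provider.username, provider.password)
        client.send_message(message)


def send_with_failover(
    providers: Sequence[SmtpProvider],
    message: EmailMessage,
    send: Callable[[SmtpProvider, EmailMessage], None] = send_via_smtp,
) -> str:
    """Try providers in order and return the name of the first that succeeds."""
    errors = []
    for provider in providers:
        try:
            send(provider, message)
            return provider.name
        except (smtplib.SMTPException, ssl.SSLError, OSError) as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{provider.name}: {exc!r}")
    raise RuntimeError("all SMTP providers failed: " + "; ".join(errors))
```

The `send` parameter is injectable so the ordering logic can be exercised without a network connection, for example from a Canary that simulates the primary provider failing. Note that a backup provider only helps if its account is already provisioned and trusted; during this incident both newly created Sendgrid accounts were blocked.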