Roam incident

Trouble connecting to Roam for customers in the Middle East geographic region

Critical Resolved

Roam experienced a critical incident on June 1, 2023, lasting 10m. The incident has been resolved; the full update timeline is below.

Started
Jun 01, 2023, 11:14 AM UTC
Resolved
Jun 01, 2023, 11:25 AM UTC
Duration
10m
Detected by Pingoru
Jun 01, 2023, 11:14 AM UTC

Update timeline

  1. identified Jun 01, 2023, 11:14 AM UTC

    This issue has been identified and a remediation is in place. The remediation involves an update to our DNS, which is configured to be cached for 5 minutes, but some DNS systems ignore the cache timeouts provided by endpoints and override them with larger values. If you are still having issues after 11:15 AM GMT, please report a bug from the Roam menu or send a Team Roam support chat.

  2. resolved Jun 01, 2023, 11:25 AM UTC

    Our logs indicate that users in this region are able to connect successfully. We will post a postmortem in the next 24 hours.

  3. postmortem Jun 01, 2023, 08:41 PM UTC

    ## Summary of Impact

    From 1:07 ET on June 1, 2023 until 7:00 ET, Roam meetings were unavailable for users in the Middle East geographic region.

    ## Cause

    On the night of May 31st we started to roll out infrastructure in the AWS me-central-1 region (UAE) to improve the AV quality and experience of our users in the Middle East. This infrastructure wasn't meant to be turned on, but due to an error it was put into use before it was ready. Once that occurred, users in the region were unable to connect via AV to other users.

    ## Remediation Plan

    1. We updated our infrastructure code to make a similar problem less likely to occur in the future.
    2. We have updated our monitoring and alerting so that we catch issues like these more quickly and reduce the amount of downtime.
    3. We will add client-based fallback to alternate regions in case the closest region isn't working for any reason.
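
The client-based fallback described in the remediation plan could look roughly like the sketch below. It is an illustration only, not Roam's implementation: the function name `pick_region`, the `probe` callback, and the use of `eu-central-1` as an alternate region are all assumptions made for the example. The idea is that the client walks a nearest-first list of regions and uses the first one that passes a health probe, so a region that is deployed but not ready is skipped.

```python
from typing import Callable, Iterable, Optional


def pick_region(
    candidates: Iterable[str],
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region whose probe succeeds, or None if all fail.

    `candidates` is assumed to be ordered nearest-first. `probe` is a
    hypothetical health check supplied by the caller (for example, a short
    connection attempt with a timeout).
    """
    for region in candidates:
        try:
            if probe(region):
                return region
        except OSError:
            # Treat network errors (timeouts, refused connections) as
            # "region unavailable" and move on to the next candidate.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical health state: the nearest region exists but is not ready.
    healthy = {"me-central-1": False, "eu-central-1": True}
    ordered = ["me-central-1", "eu-central-1"]  # nearest first (assumed)
    print(pick_region(ordered, lambda r: healthy.get(r, False)))
```

With a probe like this, a client in the affected region would have connected through the alternate region while the closest one was unavailable. A real client would also need to bound the probe time and re-check the preferred region periodically.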