Demio experienced a minor incident on August 20, 2024, affecting the Webinar Room API and the Webinar Room WebSocket Server and lasting 9h 11m. The incident has been resolved; the full update timeline is below.
Affected components
- Webinar Room API
- Webinar Room WebSocket Server
Update timeline
- investigating Aug 20, 2024, 12:25 AM UTC
We are currently investigating this issue.
- investigating Aug 20, 2024, 02:01 AM UTC
The issue is affecting some older automated events. As a workaround, we recommend creating a brand-new automated event until the issue is fixed.
- identified Aug 20, 2024, 08:21 AM UTC
The issue affects only automated events with the "Private" chat preference. We've identified the cause of the issue and are currently testing a fix.
- monitoring Aug 20, 2024, 09:07 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Aug 20, 2024, 09:37 AM UTC
The issue has been fixed.
- postmortem Aug 21, 2024, 01:07 PM UTC
## **Summary**
* A code change released to the Webinar Room component caused a regression and a partial outage for a specific group of users.

## **Impact**
* Attendees couldn't properly join some automated events.
* Only scheduled automated events with the private chat preference were affected. All other events worked normally.

## **Root Cause Analysis**
* While fixing another bug, the engineering team unintentionally introduced a code change that caused this incident.
* The QA team did not test the change thoroughly and missed the new bug it introduced. The automated end-to-end tests did not detect the change in system behavior in the Staging environment.
* The updated code was then deployed to Production, causing the regression.

## **Resolution and Recovery**
* The engineering team identified the root cause of the regression and applied a hotfix that resolved the problem.

## **Action Points**
* We will extend our automated end-to-end QA test scripts to cover more combinations of event settings.
* We will improve our manual QA testing process so that cases like this are not missed.
* We will continue improving our codebase to make it more reliable and more resilient to newly introduced changes.