Cheqroom incident

Login issues

Major Resolved

Cheqroom experienced a major incident on February 24, 2020 affecting Cheqroom App and Mobile apps, lasting 23h 27m. The incident has been resolved; the full update timeline is below.

Started
Feb 24, 2020, 01:10 PM UTC
Resolved
Feb 25, 2020, 12:38 PM UTC
Duration
23h 27m
Detected by Pingoru
Feb 24, 2020, 01:10 PM UTC

Affected components

Cheqroom App, Mobile apps

Update timeline

  1. identified Feb 24, 2020, 01:10 PM UTC

    The issue has been identified and a fix is being implemented.

  2. identified Feb 24, 2020, 07:42 PM UTC

    We are continuing to work on a fix for this issue.

  3. monitoring Feb 24, 2020, 09:44 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. monitoring Feb 24, 2020, 11:07 PM UTC

    The service has returned to normal. User invites are still experiencing loading issues.

  5. monitoring Feb 25, 2020, 06:49 AM UTC

    All systems have been back online since our last update. All data is safe and restored. If you have any issues logging in you can reach our support staff at [email protected].

  6. resolved Feb 25, 2020, 12:38 PM UTC

    This incident has been resolved.

  7. postmortem Feb 25, 2020, 02:23 PM UTC

    As many of you will have noticed, we suffered a serious incident on February 24th when rolling out a major change to our software. What was meant as a silent release of our new feature "Customizable User Roles" ended in several hours of downtime of CHEQROOM for a majority of our users.

    Here's what happened: In our attempt to silently roll out a greatly improved version of those user roles (making them dynamic and much more customizable), we encountered several concurrent problems. The first problem occurred in the service that maps old roles to new roles, and at the same time we accidentally opened up the management of those roles in our web application. The immediate impact was that some groups of users were assigned a nonexistent role and were thus unable to access the service. The second problem arose at the same time: our second cluster, which provides the backup of people's original user roles, failed as well.

    Getting to a stable situation was complex, time-consuming and far from straightforward. During that period, CHEQROOM was effectively unusable for a portion of our users.

    We will perform a complete post-mortem on what happened to make sure that we've completely understood the root causes. It will allow us to put in place the necessary measures to minimize the chances of such interruptions in the future. We're sorry for the inconvenience.
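To illustrate the failure mode described in the postmortem (users ending up on a role that does not exist), here is a minimal sketch of a role migration that validates every mapping before changing anyone. It is not Cheqroom's implementation. The role names, the `User` type, and the `plan_migration` and `migrate` functions are all hypothetical and exist only for this example.

```python
# Hypothetical sketch: none of these names come from Cheqroom's codebase.
from dataclasses import dataclass

# Assumed legacy-to-new role mapping, for illustration only.
ROLE_MAP = {"root": "admin", "user": "member", "selfservice": "self_service"}

# Assumed set of roles that actually exist in the new system.
KNOWN_ROLES = {"admin", "member", "self_service"}


@dataclass
class User:
    name: str
    role: str


def plan_migration(users, role_map, known_roles):
    """Return (planned, rejected) without modifying any user.

    planned maps user name -> new role; rejected lists the names of users
    whose legacy role has no mapping or maps to a role that does not exist.
    """
    planned, rejected = {}, []
    for user in users:
        new_role = role_map.get(user.role)
        if new_role is None or new_role not in known_roles:
            rejected.append(user.name)
        else:
            planned[user.name] = new_role
    return planned, rejected


def migrate(users, role_map, known_roles):
    """Apply the migration only if every user maps to an existing role."""
    planned, rejected = plan_migration(users, role_map, known_roles)
    if rejected:
        # Abort before touching anyone, so nobody is left on a missing role.
        raise ValueError(f"unmappable users: {rejected}")
    for user in users:
        user.role = planned[user.name]
    return users
```

The design choice here is to separate planning from applying: the whole batch is checked first, and a single unmappable user aborts the run while every original role is still in place. This reduces reliance on a separate backup of the original roles, although it does not replace one.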