Labflow App Outage

Major · Resolved

Labflow experienced a major incident on February 10, 2022 that affected the Labflow App for 27 minutes, according to the vendor's update. The incident has been resolved; the full update timeline is below.

Started
Feb 10, 2022, 06:26 PM UTC
Resolved
Feb 10, 2022, 06:26 PM UTC
Duration
27 minutes (per the vendor's update)
Detected by Pingoru
Feb 10, 2022, 06:26 PM UTC

Affected components

Labflow App

Update timeline

  1. resolved Feb 10, 2022, 06:26 PM UTC

    - Labflow App Outage
      Time Frame: Start: 02/10/2022 11:18:11 AM. End: 02/10/2022 11:45:11 AM. Duration: 27 minutes.
      Root Cause: Labflow's underlying database service failed. Labflow's engineering team is working with its database vendor to understand why the database service failed. We will update this incident when more information is available.
      Remedies: The database service was restarted, allowing all Labflow services to become operational again. We do not expect future incidents at this point.
      Student Impact: Students were unable to access Labflow during the outage.

  2. postmortem Feb 11, 2022, 10:03 PM UTC

    After further investigation with our database vendor, we concluded that two major factors contributed to this outage:

    1. Database queries consumed an unexpectedly large amount of RAM, which caused the cluster's primary node to restart.
    2. When the primary node failed, the cluster did not fail over to the secondary node because of a bug in the vendor's cluster software. As a result, the cluster disconnected from Labflow.
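
The two factors in the postmortem can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Node` and `Cluster` classes, the `failover_works` flag, and all memory figures are hypothetical and do not represent the vendor's cluster software or Labflow's actual configuration. It shows why a primary restart alone would not have caused an outage if failover had worked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical database node with a fixed RAM budget."""
    name: str
    ram_limit_mb: int
    ram_used_mb: int = 0
    up: bool = True

    def run_query(self, ram_needed_mb: int) -> None:
        """Run a query; exceeding the RAM budget takes the node down (a restart)."""
        self.ram_used_mb += ram_needed_mb
        if self.ram_used_mb > self.ram_limit_mb:
            self.up = False
            self.ram_used_mb = 0


class Cluster:
    """Hypothetical two-node cluster; failover_works=False models the failover bug."""

    def __init__(self, primary: Node, secondary: Node, failover_works: bool) -> None:
        self.primary = primary
        self.secondary = secondary
        self.failover_works = failover_works

    def serving_node(self) -> Optional[Node]:
        """Return the node that can serve clients, or None if clients are disconnected."""
        if self.primary.up:
            return self.primary
        if self.failover_works and self.secondary.up:
            return self.secondary
        return None


# Factor 1: a query uses more RAM than expected and the primary restarts.
# Factor 2: failover does not happen, so no node serves the application.
buggy = Cluster(Node("primary", 1024), Node("secondary", 1024), failover_works=False)
buggy.primary.run_query(2048)
print(buggy.serving_node())  # None: clients are disconnected

# With working failover, the same primary restart leaves the secondary serving.
healthy = Cluster(Node("primary", 1024), Node("secondary", 1024), failover_works=True)
healthy.primary.run_query(2048)
print(healthy.serving_node().name)  # secondary
```

In this model either factor alone is survivable: without the memory spike the primary stays up, and with working failover the secondary takes over. The outage requires both together, which matches the postmortem's description of two contributing factors.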