Zephyr Scale incident

Zephyr Scale is inaccessible from the UI

Critical Resolved

Zephyr Scale experienced a critical incident on February 2, 2023 affecting Zephyr Cloud (US), lasting 25m. The incident has been resolved; the full update timeline is below.

Started
Feb 02, 2023, 11:19 AM UTC
Resolved
Feb 02, 2023, 11:44 AM UTC
Duration
25m
Detected by Pingoru
Feb 02, 2023, 11:19 AM UTC

Affected components

Zephyr Cloud (US)

Update timeline

  1. identified Feb 02, 2023, 11:19 AM UTC

    The issue has been identified and a fix is being deployed.

  2. resolved Feb 02, 2023, 11:44 AM UTC

    This incident has been resolved.

  3. postmortem Feb 02, 2023, 05:27 PM UTC

    ## Cause

    A defect was introduced in a production deployment. The automated testing pipeline did not catch it because the issue affected only the production environment. The defect broke the application's authentication mechanism, blocking user access to all backend services.

    ## Impact

    Although the services continued running normally, the broken authentication mechanism prevented all users from accessing the application via the UI. The API was not affected and continued to operate normally. No data was affected.

    ## Solution

    The team reverted the deployment to a previous version. The defect was then fixed and a new deployment took place.

    ## Actions

    Improvements have been identified for the automation pipeline. New mechanisms to detect production issues right after new deployments will be put in place, so that similar situations can be reverted much more quickly.
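
The postmortem does not say what the new detection mechanisms will look like. As a minimal sketch of one common approach, a post-deployment check can probe the production login path a few times and trigger a rollback if every attempt fails. Everything in this example is an assumption made for illustration: the `verify_deployment` function, the `probe` and `rollback` callables, and the retry parameters are hypothetical and do not come from the vendor.

```python
import time
from typing import Callable


def verify_deployment(
    probe: Callable[[], bool],
    rollback: Callable[[], None],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe a fresh deployment and roll it back if every probe fails.

    `probe` is a hypothetical check run against production, for example
    a scripted UI login. `rollback` is a hypothetical hook that reverts
    to the previous version. Returns True if the deployment is kept and
    False if it was rolled back.
    """
    for attempt in range(attempts):
        try:
            if probe():
                return True  # One healthy probe is enough to keep the release.
        except Exception:
            # A probe that raises (timeout, connection error) counts as a failure.
            pass
        if attempt < attempts - 1:
            sleep(delay_seconds)  # Wait before retrying to ride out brief blips.
    rollback()
    return False
```

The probe is retried so that a single transient failure does not revert a healthy release. Because the defect in this incident appeared only in production, a check like this would have to run against the production environment itself, not against a staging copy.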