StopLight incident

Login issue for workspaces using custom domains - Use *.stoplight.io domain for workspace login

Major, Resolved

StopLight experienced a major incident on June 9, 2023. The incident has been resolved; the full update timeline is below.

Started
Jun 09, 2023, 11:18 PM UTC
Resolved
Jun 07, 2023, 10:00 PM UTC
Duration
Detected by Pingoru
Jun 09, 2023, 11:18 PM UTC

Update timeline

  1. resolved Jun 09, 2023, 11:18 PM UTC

    Stoplight is experiencing an issue where workspaces with custom domains are unable to log in via the custom domain URL due to a CORS error. Workspaces are still available for login via their *.stoplight.io domains.

  2. postmortem Jun 09, 2023, 11:18 PM UTC

## **Incident Root Cause Analysis and Post-Mortem**

### **Incident Summary:**

Incident Date: 7 June 2023

Stoplight experienced an issue where workspaces with custom domains were unable to log in via the custom domain URL due to a CORS error. Workspaces were still available for login via their \*.stoplight.io domains. The issue was resolved 9 hours and 35 minutes after the initial urgent support case escalation.

### **Timeline of Events:**

#### **Wednesday, June 7, 2023**

* **14:22 CDT -** Stoplight released a change to production which affected login for workspaces accessed via custom domains
* **16:41 CDT -** Initial indication of potential issue based on receipt of non-urgent support case, issue investigation begins
* **23:41 CDT -** Initial indication of broader issue impact based on receipt of first Urgent/P1 support case

#### **Thursday, June 8, 2023**

* **08:06 CDT -** Investigation concludes, root cause identified, Stoplight initiates release roll-back process
* **09:16 CDT -** Production roll-back completed, issue resolved

### **Root Cause:**

* Stoplight experienced an issue where workspaces with custom domains were unable to log in via the custom domain URL due to a CORS error.
* The issue was not caught by Stoplight’s build validation testing before it reached production, and was not immediately caught after release by Stoplight’s stack and log monitoring instrumentation.
* While Stoplight initiated incident response mechanisms to investigate and address the issue, the broader impact on customer workspaces was not clear until urgent support cases began coming in after standard operating hours.

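The detection gaps described in the last point can be worked out from the timeline. The sketch below is only an arithmetic check, not part of Stoplight's post-mortem. It assumes the roll-back entries (08:06 and 09:16 CDT) fall on June 8, the day after the release, since that is the only reading consistent with the stated 9 hours and 35 minutes. The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timeline entries from the post-mortem, all in CDT. Naive datetimes are
# sufficient because every timestamp is in the same time zone.
# Assumption: the roll-back entries fall on June 8, the day after the release.
release = datetime(2023, 6, 7, 14, 22)
first_case = datetime(2023, 6, 7, 16, 41)
urgent_case = datetime(2023, 6, 7, 23, 41)
root_cause_found = datetime(2023, 6, 8, 8, 6)
resolved = datetime(2023, 6, 8, 9, 16)


def fmt(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes, e.g. '9h35m'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


intervals = {
    "release to first support case": fmt(first_case - release),
    "first support case to urgent case": fmt(urgent_case - first_case),
    "urgent case to resolution": fmt(resolved - urgent_case),
    "release to resolution": fmt(resolved - release),
}

for label, value in intervals.items():
    print(f"{label}: {value}")
```

Under that assumption, the urgent-case-to-resolution interval comes out at 9h35m, matching the summary, and the total time from release to roll-back is 18h54m.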
### **Remediation Actions**

* Stoplight will be adding custom domains outside [stoplight.io](http://stoplight.io) and [stoplight-dev.io](http://stoplight-dev.io) for pre-release development and QA login testing
* Stoplight is implementing additional pre-release end-to-end tests for service-auth CORS operations and custom domains, and adding production metrics/alerts for 500s on login URLs
* Stoplight is revising incident response mechanisms to better identify broader issue impact and improve time to resolution for urgent incidents outside business hours.
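
As an illustration of the kind of pre-release check the second action describes, here is a minimal sketch of a CORS origin allowlist that covers both first-party subdomains and registered custom domains, with assertions exercising each case. The post-mortem does not say how service-auth decides which origins to allow, so everything here is hypothetical: the function names, the `CUSTOM_DOMAINS` set, and the `docs.example.com` domain are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only; not Stoplight's configuration.
FIRST_PARTY_SUFFIXES = (".stoplight.io",)
CUSTOM_DOMAINS = {"docs.example.com"}  # hypothetical registered custom domain


def is_allowed_origin(origin: str) -> bool:
    """Return True if the Origin header value may call the login endpoints."""
    parts = urlsplit(origin)
    host = parts.hostname
    if parts.scheme != "https" or host is None:
        return False
    if host in CUSTOM_DOMAINS:
        return True
    # The leading dot in each suffix prevents look-alike hosts such as
    # "evilstoplight.io" from matching.
    return any(host.endswith(suffix) for suffix in FIRST_PARTY_SUFFIXES)


def cors_headers(origin: str) -> dict:
    """Build CORS response headers, or an empty dict for a rejected origin."""
    if not is_allowed_origin(origin):
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }


# Checks a pre-release test could run for both kinds of workspace URL.
assert is_allowed_origin("https://acme.stoplight.io")
assert is_allowed_origin("https://docs.example.com")
assert not is_allowed_origin("https://evilstoplight.io")
assert cors_headers("https://unknown.example.org") == {}
```

The point of the sketch is that a custom-domain origin gets its own test case. A suite that only uses \*.stoplight.io origins would pass even when custom-domain login is broken.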