KnowBe4 incident

KSAT - Unable to access training (All instances)

Critical, Resolved

KnowBe4 experienced a critical incident on November 14, 2025 affecting Console and Learner Experience (LX), lasting 7h 6m. The incident has been resolved; the full update timeline is below.

Started
Nov 14, 2025, 04:41 PM UTC
Resolved
Nov 14, 2025, 11:48 PM UTC
Duration
7h 6m
Detected by Pingoru
Nov 14, 2025, 04:41 PM UTC

Affected components

Console, Learner Experience (LX)

Update timeline

  1. investigating Nov 14, 2025, 04:41 PM UTC

    We have received reports that users are unable to access training. We are investigating this issue and will update this page when we have more information.

  2. investigating Nov 14, 2025, 04:58 PM UTC

    We have received additional reports that KSAT customer account creation is unavailable. We will continue to investigate this issue and will update this page when we have more information.

  3. monitoring Nov 14, 2025, 05:10 PM UTC

    We’ve implemented a fix and users should be able to log in at this time. We will continue to monitor the results to make sure no further issues occur.

  4. resolved Nov 14, 2025, 11:48 PM UTC

    This incident has been resolved.

  5. postmortem Jan 12, 2026, 02:35 PM UTC

# **External Technical Root Cause Analysis - KSAT - Unable to access training**

This report outlines the detailed findings and mitigations associated with a service interruption impacting the KnowBe4 Learner Experience (LX) platform. The interruption was caused by a schema mismatch between the backend database and the frontend application during a deployment intended to remove deprecated feature flags. The impact resulted in users receiving 404 error messages when attempting to access their training or log in to the Learner Experience.

Multiple customers experienced an inability to access the platform, as initially reported on November 14, 2025, at 4:35 p.m. (UTC). The issue was identified quickly, mitigated, and resolved on November 14, 2025, at 5:07 p.m. (UTC).

# WHAT HAPPENED

The Learner Experience (LX) system relies on a modern frontend architecture that communicates with backend services via GraphQL to retrieve user attributes and configurations. This ensures that learners see the correct training content, risk scores, and interface options based on their organization's settings.

On November 14, 2025, engineering teams executed a planned deployment to clean up the codebase by removing legacy code and feature flags. This change involved removing specific columns from the backend database schema that were deemed obsolete and removing them from all related frontend services.

Immediately following this deployment, monitoring systems and customer support tickets indicated a spike in 404 errors for users attempting to load the Learner Experience. Investigations revealed that, although the backend columns had been successfully removed, the frontend application still retained a dependency on these fields. Specifically, the frontend was still attempting to query the now-deleted columns via GraphQL. This schema mismatch caused the API calls to fail entirely, resulting in the 404 errors presented to end users.

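
To illustrate why a single stale field can fail an entire request, here is a minimal sketch in Python. It is not KnowBe4's code: the nested-dictionary schema, the `language` placeholder field, the `validate_selection` and `load_learner_page` helpers, and the returned status codes are all assumptions made for illustration. Only the field name `reportsConfiguration.riskScoreV2Enabled` comes from the Technical Details section of this report. The sketch mirrors the general GraphQL behavior that a query document is validated against the schema before execution, so one unknown field rejects the whole query rather than just that field.

```python
from typing import Any

# Hypothetical schema before the cleanup, modeled as nested dicts.
# "language" is a placeholder field invented for this example.
SCHEMA_BEFORE = {
    "attributes": {
        "language": "String",
        "reportsConfiguration": {"riskScoreV2Enabled": "Boolean"},
    }
}

# Hypothetical schema after the backend deployment removed the flag.
SCHEMA_AFTER = {
    "attributes": {
        "language": "String",
        "reportsConfiguration": {},
    }
}

# Fields an un-updated frontend query would still select, as dotted paths.
FRONTEND_SELECTION = [
    "attributes.language",
    "attributes.reportsConfiguration.riskScoreV2Enabled",
]


def validate_selection(schema: dict[str, Any], paths: list[str]) -> list[str]:
    """Return every requested path that the schema does not define."""
    unknown = []
    for path in paths:
        node: Any = schema
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                unknown.append(path)
                break
            node = node[part]
    return unknown


def load_learner_page(schema: dict[str, Any], paths: list[str]) -> int:
    """Simulate the initial load: any validation error fails the whole request."""
    if validate_selection(schema, paths):
        return 404  # assumed fallback, matching the error page users saw
    return 200
```

Running `validate_selection` with the frontend's selection against each new schema version before deployment would surface this kind of mismatch while it is still a failed check rather than a production error.
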
# ROOT CAUSE

### **Primary Root Cause: Unsynchronized Deprecation**

The backend deployment removed a database column that the frontend application was still actively querying. The deployment of the backend change preceded the removal of the frontend dependency, resulting in a breaking change to the GraphQL schema contract.

### **Secondary Root Cause: Post-Deployment Verification Latency**

While automated smoke tests successfully identified the failure, they are configured to run immediately after deployments to production. Consequently, the invalid state was already live and impacting users by the time the test results confirmed the failure.

# FINDINGS AND MITIGATIONS

**Frontend Dependency on Deleted Schema**

* **Finding:** The frontend application requests a specific field, which was mapped to the related column that was removed in the backend deployment.
* **Mitigation:** Identified the failing GraphQL query and deployed a hotfix to remove the mapped column reference from the Learner Experience service. This hotfix restored login access for all users.

**Observability and Alerting**

* **Finding:** The issue was detected rapidly via "Sev0" alerts and support tickets. The post-deployment smoke tests also failed, confirming the issue, though this occurred concurrently with customer reports.
* **Mitigation:** Teams pinpointed the exact deployment and merge request responsible, allowing for a rapid rollback/fix decision.

# TECHNICAL DETAILS

The Learner Experience frontend utilizes GraphQL to hydrate user sessions with configuration data. The specific failure occurred when the application requested the `$attributes` object, which included the field for `reportsConfiguration.riskScoreV2Enabled`.

When the backend deployment (aimed at removing these deprecated flags) completed, the database schema no longer supported these fields. However, the frontend query had not yet been updated to stop requesting them.

When the API received a request for fields that no longer existed in the schema definition, it returned a validation error. The application, unable to handle this critical failure during the initial load sequence, defaulted to a 404 error page.

# PREVENTIVE MEASURES

1. **Phased Schema Deprecation:** Future removal of database columns will follow a stricter "Deprecate -> Detach -> Remove" lifecycle. The frontend dependency must be removed and verified in production before the backend column is physically dropped.
2. **Dependency Audits:** We will implement stricter code search practices during cleanup initiatives to ensure all references (frontend, backend, and reporting) to a deprecated field are identified prior to removal.

# CUSTOMER IMPACT AND RECOVERY

On November 14, 2025, for a short period, customers attempting to access the Learner Experience or complete training received 404 errors, making it temporarily unavailable. Following the deployment of the hotfix, user access to the Learner Experience was restored quickly. No data was lost during this incident, and full operational integrity was confirmed shortly after.

# CONCLUSION

This incident highlights the critical importance of synchronized management between frontend dependencies and backend schema changes. While the intention to reduce technical debt by removing legacy flags was correct, the execution sequence caused a temporary service disruption.

As a result of this RCA:

* The problematic dependencies have been fully removed.
* We have successfully cleaned up the legacy code, preventing future confusion or issues related to these fields.

We are committed to ensuring the reliability of the Learner Experience and appreciate your patience as we resolved this matter.
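
The "Dependency Audits" and "Deprecate -> Detach -> Remove" measures listed under Preventive Measures can be sketched as a pre-removal gate. The following Python is a minimal illustration under stated assumptions, not KnowBe4's tooling: the `find_references` and `safe_to_drop` helpers, the scanned file extensions, and the idea of a single source tree are hypothetical. It searches a directory for remaining mentions of a deprecated field and only permits the drop when none are found.

```python
from pathlib import Path

# Hypothetical set of file types to scan (frontend, backend, and query files).
SCANNED_SUFFIXES = {".ts", ".tsx", ".js", ".py", ".rb", ".graphql", ".sql"}


def find_references(root: Path, field_name: str) -> list[tuple[Path, int]]:
    """Return (file, line number) pairs for every mention of field_name under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if field_name in line:
                hits.append((path, number))
    return hits


def safe_to_drop(root: Path, field_name: str) -> bool:
    """Gate for the 'Remove' step: allowed only once no references remain."""
    return not find_references(root, field_name)
```

A plain substring search misses dynamically constructed names and alternate spellings of the same column (for example, a hypothetical snake_case form of a camelCase field), so a real audit would likely search for each spelling and, as the report states, also verify in production that the frontend no longer requests the field before the column is dropped.
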