KnowBe4 incident

PhishER Not Loading (US)

Critical · Resolved

KnowBe4 experienced a critical incident on November 24, 2025, affecting the Console component and lasting 1h 20m. The incident has been resolved; the full update timeline is below.

Started
Nov 24, 2025, 03:50 PM UTC
Resolved
Nov 24, 2025, 05:11 PM UTC
Duration
1h 20m
Detected by Pingoru
Nov 24, 2025, 03:50 PM UTC

Affected components

Console

Update timeline

  1. investigating Nov 24, 2025, 03:50 PM UTC

    We have received reports that the US instance of PhishER is not loading. We are investigating this issue and will update this page when we have more information.

  2. monitoring Nov 24, 2025, 04:02 PM UTC

    We’ve implemented a fix and are monitoring the results to make sure no further issues occur.

  3. resolved Nov 24, 2025, 05:11 PM UTC

    This incident has been resolved.

  4. postmortem Dec 17, 2025, 05:16 PM UTC

    **Incident Summary**

    On November 24, 2025, PhishER experienced a service disruption affecting the ability of administrators in the US region to log in to the platform. The issue was identified as a database locking event caused by a deployment of a new column. The service was restored by terminating the locking process, and a code revert was subsequently performed to ensure stability.

    **Timeline of Events**

    Our monitoring and alerting systems notified engineers of a potential anomaly at approximately 15:30 UTC, prompting an immediate investigation. Shortly thereafter, we received reports that PhishER administrators in the US region were unable to log in. By 15:40 UTC, the engineering team identified the source of the issue: a database deployment job had stalled. Further investigation by engineers revealed that a specific operation to add a new column to the "accounts" table was causing a lock. The database team terminated the stuck process, and by 15:51 UTC, access to the PhishER console was fully restored, allowing users to log in successfully. Following the restoration of service, engineers confirmed that no other regions were impacted. A code revert was completed at approximately 16:30 UTC to prevent recurrence.

    **Detailed Timeline (UTC)**

    | Time | Event Description |
    |---|---|
    | ~15:30 | Monitoring and alerting systems detected a potential anomaly regarding login latency. |
    | 15:35 | Customer support received initial reports that PhishER administrators in the US region were unable to log in. |
    | 15:40 | The engineering team identified the source of the issue: a background database deployment job had stalled. |
    | 15:45 | Database engineers confirmed a specific operation on the "accounts" table was causing a lock, preventing user sessions. |
    | 15:51 | The database team terminated the stuck process. Access to the PhishER console was immediately restored. |
    | 16:00 | Engineers verified that service health had returned to normal levels and confirmed no other regions were impacted. |
    | ~16:30 | A code revert of the deployment was successfully deployed to prevent the issue from recurring. |

    **Root Cause**

    The root cause of this incident was a database deployment script that attempted to add a new boolean column (advanced_mode_enabled) with a default value to the "accounts" table. This operation resulted in a database lock due to contention, preventing legitimate user sessions from being established during the deployment attempt.

    **Findings and Mitigations**

    The investigation concluded that the disruption was triggered by a database deployment script intended to add a boolean column (advanced_mode_enabled) with a default value to the "accounts" table. Because this table is used frequently, the alteration caused high contention, resulting in a database lock that queued and subsequently timed out legitimate user sessions.

    **Mitigations**

    * Immediate Action: The database team manually terminated the locked database query. This action immediately released the lock, allowing queued login requests to process and restoring service availability.
    * Stabilization: To ensure the system remained stable and to prevent the deployment from automatically retrying and causing further locking, a code revert was issued and deployed.

    **Preventive Measures**

    * Maintenance Window Scheduling: The specific database change required for this feature has been rescheduled to occur during a designated off-hours maintenance window, ensuring zero impact on active user traffic.

    **Conclusion**

    This incident was caused by database contention during a deployment, which temporarily impacted PhishER login availability in the US region only. Full service stability was restored by reverting the changes. We have implemented enhanced protocols for scheduling high-impact database operations during maintenance windows. These measures are designed to prevent recurrence and ensure the continued reliability of the platform.

    **Glossary of Technical Terms**

    * Database Lock: A safeguard mechanism that temporarily prevents multiple processes from accessing or modifying the same data simultaneously. In this incident, the lock unintentionally prevented the login service from reading user account data.
    * Code Revert: The process of undoing specific changes in the software codebase to return the system to a previously known stable state.
    * Boolean Column: A specific type of data field used to store True or False values. The deployment in question attempted to add a "flag" (True/False) to enable a new feature for accounts.
    * Maintenance Window: A pre-scheduled period of time designated for performing technical updates or changes, minimizing the risk of unplanned disruption to users.
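
Illustrative sketch (not part of the vendor postmortem): the root cause describes a schema change on the "accounts" table contending for a lock with live traffic. The postmortem names neither the database engine nor the deployment tooling, so the example below uses SQLite from the Python standard library purely as a stand-in. It shows one general way to keep a stalled schema change from waiting indefinitely: cap the lock wait and give up after a few attempts. The function name `add_column_with_bounded_wait`, its parameters, and the demo table contents are hypothetical, and this is not presented as KnowBe4's actual fix, which was to move the change to a maintenance window.

```python
import os
import sqlite3
import tempfile
import time


def add_column_with_bounded_wait(db_path, ddl, lock_wait_s=0.1, attempts=3, pause_s=0.05):
    """Run one DDL statement without waiting indefinitely for a lock.

    Hypothetical helper. Returns True if the statement was applied,
    False if every attempt timed out waiting for the lock.
    """
    for _ in range(attempts):
        # timeout= caps how long SQLite waits on a competing lock;
        # isolation_level=None puts the connection in autocommit mode.
        conn = sqlite3.connect(db_path, timeout=lock_wait_s, isolation_level=None)
        try:
            conn.execute(ddl)
            return True
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # unrelated failure: surface it instead of retrying
            time.sleep(pause_s)
        finally:
            conn.close()
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "demo.db")
        traffic = sqlite3.connect(db_path, isolation_level=None)
        traffic.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
        traffic.execute("INSERT INTO accounts (name) VALUES ('example')")

        # Column name taken from the postmortem; SQLite syntax is an assumption.
        ddl = "ALTER TABLE accounts ADD COLUMN advanced_mode_enabled BOOLEAN DEFAULT 0"

        # Simulate a busy table: another session holds an open write transaction.
        traffic.execute("BEGIN IMMEDIATE")
        traffic.execute("UPDATE accounts SET name = 'busy'")
        blocked = add_column_with_bounded_wait(db_path, ddl)

        # Once the competing transaction ends, the same change goes through.
        traffic.execute("COMMIT")
        applied = add_column_with_bounded_wait(db_path, ddl)

        columns = [row[1] for row in traffic.execute("PRAGMA table_info(accounts)")]
        traffic.close()
        return blocked, applied, columns


if __name__ == "__main__":
    print(demo())
```

In this sketch the first call gives up while the simulated session holds its write lock, and the second call succeeds once that session commits. Lock behavior differs considerably between database engines, so the pattern (bounded wait, limited retries) is what carries over, not the SQLite specifics.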