Northpass incident
Elevated API Errors - user access and login errors
Northpass experienced a major incident on September 9, 2025, affecting the Northpass App - AWS component and lasting 9h 18m. The incident has been resolved; the full update timeline is below.
Affected components
- Northpass App - AWS
Update timeline
- investigating Sep 09, 2025, 05:42 PM UTC
We're experiencing an elevated level of API errors and are currently looking into the issue.
- investigating Sep 09, 2025, 05:48 PM UTC
We are continuing to investigate this issue. Admins and end users are experiencing errors when logging in.
- investigating Sep 09, 2025, 05:52 PM UTC
We are continuing to investigate this issue.
- investigating Sep 09, 2025, 06:08 PM UTC
We are continuing to investigate this issue.
- investigating Sep 09, 2025, 06:32 PM UTC
We are continuing to investigate this issue.
- identified Sep 09, 2025, 07:04 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Sep 09, 2025, 07:13 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Sep 09, 2025, 08:17 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 09, 2025, 09:43 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 09, 2025, 10:45 PM UTC
Applications are stable, but we are still monitoring and checking for slowness.
- monitoring Sep 09, 2025, 11:43 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 10, 2025, 12:44 AM UTC
We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
- monitoring Sep 10, 2025, 01:43 AM UTC
We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
- resolved Sep 10, 2025, 09:57 AM UTC
This incident has been resolved.
- postmortem Sep 10, 2025, 01:35 PM UTC
# Service Disruption on September 9–10, 2025

## Summary

On September 9, 2025, Northpass experienced a service disruption affecting all AWS-hosted customers. Users reported difficulties logging in, accessing courses, and completing activities. Azure-hosted customers were not affected.

## Impact

* Duration: ~2 hours of major disruption, followed by slower performance for several more hours
* Affected customers: AWS-hosted environments only
* Symptoms: login errors, slow page loads, delayed certificates, and issues with integrations (e.g., Zoom sessions)

## Root Cause

The disruption was caused by our primary database running out of available input/output capacity (IOPS) in AWS. This slowed down critical operations and caused delays across the platform.

## Resolution

Our engineering team took immediate action to stabilize the system, including expanding capacity and reducing system load. Once traffic normalized, performance returned to expected levels.

## Next Steps (Preventing Recurrence)

We are implementing the following permanent improvements:

* Upgrading our AWS database storage to a higher-performance type with more IOPS capacity
* Improving monitoring and alerting to detect database pressure earlier
* Optimizing how we process background tasks to reduce load during peak usage
* Optimizing database queries to reduce impact on performance and improve reliability
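To make the first two next steps concrete, the sketches below show one way such changes are typically made on AWS. These are hypothetical illustrations, not Northpass's actual configuration: the instance identifier (`primary-db`), the alarm threshold, the provisioned IOPS value, and the SNS topic ARN are all placeholder assumptions.

A minimal monitoring sketch, assuming boto3 and an RDS instance: alarm when sustained read IOPS approach the volume's provisioned capacity, so database pressure is detected before requests start queuing.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: fire when average ReadIOPS stays above the
# threshold for five consecutive one-minute periods. All names and
# values below are placeholder assumptions.
cloudwatch.put_metric_alarm(
    AlarmName="rds-read-iops-pressure",
    Namespace="AWS/RDS",
    MetricName="ReadIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "primary-db"}],
    Statistic="Average",
    Period=60,                # one-minute samples
    EvaluationPeriods=5,      # sustained for five minutes
    Threshold=10000,          # e.g. ~80% of the volume's provisioned IOPS
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:db-alerts"],
)
```

And a sketch of the storage upgrade, again with illustrative values: moving the instance to a higher-performance storage type (gp3) with a larger provisioned-IOPS ceiling.

```python
import boto3

rds = boto3.client("rds")

# Hypothetical storage migration; StorageType and Iops values are
# illustrative assumptions, not the actual configuration.
rds.modify_db_instance(
    DBInstanceIdentifier="primary-db",
    StorageType="gp3",
    Iops=12000,
    ApplyImmediately=True,
)
```

A sustained-usage alarm like the one above trades a few minutes of detection latency for fewer false alerts from short IOPS bursts; tighter periods would page earlier at the cost of noise.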