Northpass incident
Elevated API Errors - user access and login errors
Northpass experienced a major incident on September 9, 2025, affecting the Northpass App - AWS component and lasting 9h 18m. The incident has been resolved; the full update timeline is below.
Affected components
- Northpass App - AWS
Update timeline
- investigating Sep 09, 2025, 05:42 PM UTC
We're experiencing an elevated level of API errors and are currently looking into the issue.
- investigating Sep 09, 2025, 05:48 PM UTC
We are continuing to investigate this issue. Admins and end users are experiencing errors when logging in.
- investigating Sep 09, 2025, 05:52 PM UTC
We are continuing to investigate this issue.
- investigating Sep 09, 2025, 06:08 PM UTC
We are continuing to investigate this issue.
- investigating Sep 09, 2025, 06:32 PM UTC
We are continuing to investigate this issue.
- identified Sep 09, 2025, 07:04 PM UTC
The issue has been identified and a fix is being implemented.
- monitoring Sep 09, 2025, 07:13 PM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Sep 09, 2025, 08:17 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 09, 2025, 09:43 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 09, 2025, 10:45 PM UTC
Applications are stable, but we are still monitoring and checking for slowness.
- monitoring Sep 09, 2025, 11:43 PM UTC
We are continuing to monitor for any further issues.
- monitoring Sep 10, 2025, 12:44 AM UTC
We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
- monitoring Sep 10, 2025, 01:43 AM UTC
We are continuing to monitor for issues. Customers and Learners may experience slower page loads as services recover.
- resolved Sep 10, 2025, 09:57 AM UTC
This incident has been resolved.
- postmortem Sep 10, 2025, 01:35 PM UTC
# Service Disruption on September 9–10, 2025

## Summary

On September 9, 2025, Northpass experienced a service disruption affecting all AWS-hosted customers. Users reported difficulties logging in, accessing courses, and completing activities. Azure-hosted customers were not affected.

## Impact

* Duration: ~2 hours of major disruption, followed by slower performance for several more hours
* Affected customers: AWS-hosted environments only
* Symptoms: login errors, slow page loads, delayed certificates, and issues with integrations (e.g., Zoom sessions)

## Root Cause

The disruption was caused by our primary database running out of available input/output capacity (IOPS) in AWS. This slowed down critical operations and caused delays across the platform.

## Resolution

Our engineering team took immediate action to stabilize the system, including expanding capacity and reducing system load. Once traffic normalized, performance returned to expected levels.

## Next Steps (Preventing Recurrence)

We are implementing the following permanent improvements:

* Upgrading our AWS database storage to a higher-performance type with more IOPS capacity
* Improving monitoring and alerting to detect database pressure earlier
* Optimizing how we process background tasks to reduce load during peak usage
* Optimizing database queries to reduce impact on performance and improve reliability
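To make the first two next steps concrete, the sketches below show one way such changes are typically made on AWS. These are hypothetical illustrations, not Northpass's actual configuration: the instance identifier (`primary-db`), the alarm threshold, the provisioned IOPS value, and the SNS topic ARN are all placeholder assumptions.

A minimal monitoring sketch, assuming boto3 and an RDS instance: alarm when sustained read IOPS approach the volume's provisioned capacity, so database pressure is detected before requests start queuing.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: fire when average ReadIOPS stays above the
# threshold for five consecutive one-minute periods. All names and
# values below are placeholder assumptions.
cloudwatch.put_metric_alarm(
    AlarmName="rds-read-iops-pressure",
    Namespace="AWS/RDS",
    MetricName="ReadIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "primary-db"}],
    Statistic="Average",
    Period=60,                # one-minute samples
    EvaluationPeriods=5,      # sustained for five minutes
    Threshold=10000,          # e.g. ~80% of the volume's provisioned IOPS
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:db-alerts"],
)
```

And a sketch of the storage upgrade, again with illustrative values: moving the instance to a higher-performance storage type (gp3) with a larger provisioned-IOPS ceiling.

```python
import boto3

rds = boto3.client("rds")

# Hypothetical storage migration; StorageType and Iops values are
# illustrative assumptions, not the actual configuration.
rds.modify_db_instance(
    DBInstanceIdentifier="primary-db",
    StorageType="gp3",
    Iops=12000,
    ApplyImmediately=True,
)
```

A sustained-usage alarm like the one above trades a few minutes of detection latency for fewer false alerts from short IOPS bursts; tighter periods would page earlier at the cost of noise.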