Hypercare incident

[CA region only] Unplanned interruption to services

Hypercare experienced a critical incident on March 2, 2026 affecting Login and Single Sign On (Canadian Region), Messaging (Canadian Region), and 8 more components, lasting 2h 10m. The incident has been resolved; the full update timeline is below.

Started: Mar 02, 2026, 04:25 PM UTC
Resolved: Mar 02, 2026, 06:35 PM UTC
Duration: 2h 10m
Detected by Pingoru: Mar 02, 2026, 04:25 PM UTC

Affected components

Login and Single Sign On (Canadian Region)
Messaging (Canadian Region)
Notifications and Real-Time Syncing (Canadian Region)
File Attachments (Canadian Region)
Viewing Who is On-Call (Canadian Region)
Code Teams (Canadian Region)
Self-serve Scheduling (Canadian Region)
Administration and Scheduling (Canadian Region)
API & Integrations (Canadian Region)
Virtual Pager (Canadian Region)

Update timeline

investigating Mar 02, 2026, 04:25 PM UTC

We are currently investigating an unplanned downtime of all core services.

identified Mar 02, 2026, 04:46 PM UTC

The issue has been identified and we are working on a fix.

monitoring Mar 02, 2026, 05:23 PM UTC

A fix has been released and all core services have been restored. We're continuing to monitor the incident.

resolved Mar 02, 2026, 06:35 PM UTC

This incident has been resolved. A post mortem will be shared shortly.

postmortem Mar 03, 2026, 06:17 AM UTC

**What Happened?**

At approximately 11:15 am EST on Monday, March 2, 2026, core services in Canada experienced a system-wide outage caused by database connection exhaustion. The team suspects that a background process responsible for resetting user statuses (transitioning Hypercare users from “Unavailable” or “Busy” back to “Available” after a set expiry date and time) generated a high volume of sustained, long-lived connections. The database has a fixed limit on concurrent connections, and these hanging connections saturated the pool, preventing any new requests from being processed.

**Impact**

All Hypercare services were inaccessible for Canadian users from approximately 11:15 am EST until 12:22 pm EST on Monday, March 2, 2026.

**Resolution and Next Steps**

The Engineering team restored services by manually terminating stalled connections and increasing the capacity of the database connection pool. The team also disabled the automated status reset feature to stabilize the environment and allow core services to resume normal operation. The automated status reset feature has been running intermittently in a controlled environment to ensure users' statuses are reset appropriately.

To reduce the chances of a recurrence and improve our response time, the following actions are being taken:

* Enhanced Monitoring: We are implementing additional early detection alerts for database connection utilization. This will allow us to intervene before the limit is reached.
* Infrastructure Updates: We are increasing the number of permissible connections to the database to facilitate faster recovery during database restarts.
* Rapid Recovery Protocol: While we finalize the permanent root cause fix, we have implemented a rapid recovery protocol that allows the team to clear hanging connections immediately, reducing potential recovery time from several minutes to under 60 seconds.
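
As an illustration only, the early-detection alerting and stalled-connection cleanup described above could be sketched roughly as follows. This is not Hypercare's implementation: the thresholds, the `Connection` record, and the function names are all hypothetical, and a real system would read these values from the database's own activity view rather than from an in-memory list.

```python
from dataclasses import dataclass

# Every name and threshold here is a hypothetical value for illustration.
WARN_RATIO = 0.70        # assumed early-warning utilization threshold
CRITICAL_RATIO = 0.90    # assumed threshold for paging the on-call engineer
MAX_HOLD_SECONDS = 300   # assumed age after which a held connection counts as stalled


@dataclass(frozen=True)
class Connection:
    conn_id: int
    owner: str            # e.g. "api" or "status-reset-job" (hypothetical labels)
    held_seconds: float   # how long the connection has been held open


def utilization_level(in_use: int, max_connections: int) -> str:
    """Classify connection usage against the database's fixed connection limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    ratio = in_use / max_connections
    if ratio >= CRITICAL_RATIO:
        return "critical"
    if ratio >= WARN_RATIO:
        return "warning"
    return "ok"


def stalled_connections(conns, max_hold_seconds=MAX_HOLD_SECONDS):
    """Return connections held longer than the limit, longest-held first."""
    stalled = [c for c in conns if c.held_seconds > max_hold_seconds]
    return sorted(stalled, key=lambda c: c.held_seconds, reverse=True)
```

Under these assumptions, a monitor would alert once `utilization_level` returns `"warning"`, and a recovery script would terminate whatever `stalled_connections` returns, starting with the longest-held connection.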

We apologize for the disruption caused by today’s unplanned downtime. We thank everyone for their patience and continued support.