Hypercare incident

[CA region only] Unplanned interruption to services

Critical · Resolved
Started
Mar 02, 2026, 04:25 PM UTC
Resolved
Mar 02, 2026, 06:35 PM UTC
Duration
2h 10m
Detected by Pingoru
Mar 02, 2026, 04:25 PM UTC

Affected components

  • Login and Single Sign On (Canadian Region)
  • Messaging (Canadian Region)
  • Notifications and Real-Time Syncing (Canadian Region)
  • File Attachments (Canadian Region)
  • Viewing Who is On-Call (Canadian Region)
  • Code Teams (Canadian Region)
  • Self-serve Scheduling (Canadian Region)
  • Administration and Scheduling (Canadian Region)
  • API & Integrations (Canadian Region)
  • Virtual Pager (Canadian Region)

Update timeline

  1. investigating Mar 02, 2026, 04:25 PM UTC

    We are currently investigating an unplanned downtime of all core services.

  2. identified Mar 02, 2026, 04:46 PM UTC

    The issue has been identified and we are working on a fix.

  3. monitoring Mar 02, 2026, 05:23 PM UTC

    A fix has been released and all core services have been restored. We're continuing to monitor the incident.

  4. resolved Mar 02, 2026, 06:35 PM UTC

    This incident has been resolved. A post mortem will be shared shortly.

  5. postmortem Mar 03, 2026, 06:17 AM UTC

    **What Happened?**

    At approximately 11:15 am EST on Monday, March 2, 2026, core services in Canada experienced a system-wide downtime. The incident was caused by database connection exhaustion. The team suspects that a background process responsible for resetting user statuses (transitioning Hypercare users from “Unavailable” or “Busy” back to “Available” after a set expiry date and time) generated a high volume of sustained, long-lived connections. The database has a fixed limit on concurrent connections, and these hanging connections saturated the pool, preventing any new requests from being processed.

    **Impact**

    All Hypercare services were inaccessible for Canadian users from approximately 11:15 am EST until 12:22 pm EST on Monday, March 2, 2026.

    **Resolution and Next Steps**

    The Engineering team restored services by manually terminating stalled connections and increasing the capacity of the database pool. The team disabled the automated status reset feature to stabilize the environment and allow core services to resume normal operation. The automated status reset feature has since been running intermittently in a controlled environment to ensure users' statuses are reset appropriately.

    To reduce the chances of a recurrence and improve our response time, the following actions are being taken:

    * Enhanced Monitoring: We are implementing additional early detection alerts for database connection utilization. This will allow us to intervene before the limit is reached.
    * Infrastructure Updates: We are increasing the number of permissible connections to the database to facilitate faster recovery during database restarts.
    * Rapid Recovery Protocol: While we finalize the permanent root cause fix, we have implemented a rapid recovery protocol, which will allow the team to instantly clear hanging connections, reducing potential recovery time from several minutes to under 60 seconds.

    We apologize for the disruption caused by today’s unplanned downtime. We thank everyone for their patience and continued support.
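For readers who want a concrete picture of the two mitigations described in the postmortem (alerting on connection utilization before the limit is reached, and identifying long-lived connections to clear), the following is a minimal Python sketch. It is not Hypercare's implementation. The `Connection` record, the function names, the 70% and 90% thresholds, the 300-second age cutoff, and the limit of 100 connections are all hypothetical values chosen for illustration, and the snapshot is built in memory rather than read from a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical snapshot of one open database connection."""
    conn_id: int
    owner: str           # name of the process that opened the connection
    age_seconds: float   # how long the connection has been open


def utilization(open_connections: int, max_connections: int) -> float:
    """Return the fraction of the connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return open_connections / max_connections


def alert_level(ratio: float, warn_at: float = 0.70, page_at: float = 0.90) -> str:
    """Map a utilization ratio to an alert level (thresholds are illustrative)."""
    if ratio >= page_at:
        return "page"
    if ratio >= warn_at:
        return "warn"
    return "ok"


def stalled(connections: list[Connection], max_age_seconds: float) -> list[Connection]:
    """Return connections open longer than max_age_seconds, oldest first."""
    old = [c for c in connections if c.age_seconds > max_age_seconds]
    return sorted(old, key=lambda c: c.age_seconds, reverse=True)


if __name__ == "__main__":
    # Assumed scenario: a background job holds 85 long-lived connections
    # while normal API traffic holds 10 short-lived ones, against a limit of 100.
    snapshot = [Connection(i, "status_reset_job", 900.0) for i in range(85)]
    snapshot += [Connection(100 + i, "api", 2.0) for i in range(10)]

    ratio = utilization(len(snapshot), max_connections=100)
    candidates = stalled(snapshot, max_age_seconds=300.0)
    print(alert_level(ratio), len(candidates))
```

In this sketch the alert fires on the ratio rather than on an absolute count, so the thresholds stay meaningful if the connection limit is later raised. The `stalled` helper only selects candidates; actually terminating them would depend on the database in use and is left out.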
