Templafy incident

Users encounter issues when accessing West Europe (Production 1)

Templafy experienced a major incident on February 9, 2026 affecting Authentication and Library & Dynamics and 1 more component, lasting 7h 57m. The incident has been resolved; the full update timeline is below.

Started: Feb 09, 2026, 07:30 AM UTC
Resolved: Feb 09, 2026, 03:28 PM UTC
Duration: 7h 57m
Detected by Pingoru: Feb 09, 2026, 07:30 AM UTC

Affected components

AuthenticationLibrary & DynamicsUser Management

Update timeline

identified Feb 09, 2026, 09:11 AM UTC

We have identified an issue that affects a subset of customers and are working towards a resolution. Further updates will be posted here soon.
monitoring Feb 09, 2026, 09:14 AM UTC

The incident has been successfully mitigated, and our team is actively monitoring the situation to ensure ongoing stability and performance. We are observing the systems to prevent any further disruptions.
resolved Feb 09, 2026, 03:28 PM UTC

The incident has been resolved, and further information will be provided in a postmortem shortly. We apologize for the impact to affected customers.
postmortem Feb 10, 2026, 03:14 PM UTC

# Investigation On February 9, 2026, at 8:19 AM CET, an issue began affecting the database. The incident was detected shortly after at 8:22 AM CET, when automated monitoring raised an alert indicating that database CPU utilization had reached 100%. As CPU utilization increased, the database reached its maximum connection limit, which caused dependent services to become slow or unresponsive for users in the affected environment. The engineering team initiated an investigation immediately after detection and analyzed database performance metrics and active workloads. By 9:05 AM CET, the team identified that the database stopped accepting new connections from services. The investigation revealed that a recently introduced query change was contributing to excessive CPU consumption under certain data conditions. # Mitigation and Resolution To stabilize the platform and restore service responsiveness, the engineering team implemented an immediate mitigation at 9:15 AM CET by scaling up the database server. This action increased available compute capacity and reset active database connections, allowing blocked operations to complete and preventing further connection exhaustion. The mitigation reduced CPU pressure and enabled services to gradually recover while further validation was performed. Shortly after, the database had fully stabilized, connection limits were no longer being reached, and all dependent services were operating normally. Continuous monitoring confirmed that CPU utilization had returned to expected levels and that user-facing functionality was fully restored. # Impact and Scope The incident affected most users served by West Europe \(Production 1\). During the incident window, users may have experienced slow responses or temporary unavailability in services relying on the database. No other production clusters were impacted. # Post-Incident Actions Following the incident, the engineering team defined several follow-up actions to reduce the risk of similar issues occurring in the future: • Expanding test datasets to better reflect production-scale data and diverse data distributions when validating performance-sensitive changes • Increasing internal awareness and knowledge sharing around database query optimization behavior and execution plan variability • Investigating broader technical solutions and tooling to help detect and mitigate database execution plan selection issues before deployment These actions are aimed at strengthening our validation processes and improving overall platform resilience.