Templafy incident

Service degradation: Slow Access on West Europe (Production 1)

On May 28, 2024, Templafy experienced a major incident affecting Library & Dynamics that lasted 1h 52m. The incident has been resolved; the full update timeline is below.

Started: May 28, 2024, 08:45 AM UTC
Resolved: May 28, 2024, 10:37 AM UTC
Duration: 1h 52m
Detected by Pingoru: May 28, 2024, 08:45 AM UTC

Affected components

Library & Dynamics

Update timeline

identified May 28, 2024, 08:52 AM UTC

We have identified an issue that affects a subset of customers and are working towards a resolution. Further updates will be posted here soon.
monitoring May 28, 2024, 09:10 AM UTC

The incident has been successfully mitigated, and our team is actively monitoring the systems to ensure ongoing stability and performance and to prevent further disruptions.
resolved May 28, 2024, 10:37 AM UTC

The incident has been resolved, and further information will be provided in a postmortem shortly. We apologize for the impact to affected customers.
postmortem May 29, 2024, 12:34 PM UTC

On May 28, 2024, at 10:45 AM CEST (08:45 UTC), an incident was detected that affected all users of the Dynamics & Library system in the West Europe (Production 1) environment. System performance degraded, and users experienced slow responses or timeouts.

The engineering team quickly determined that the degradation was caused by heavy load on the SQL server from a reindexing operation. The reindexing was part of a migration that the engineering team was rolling out.

At 11:00 CEST, as an immediate mitigation, the engineering team began increasing the SQL server's capacity. By 11:05 CEST, the additional resources had been allocated; application performance returned to normal, and users were no longer affected.

By 12:37 CEST, the incident was resolved after the engineering team successfully applied the migration and confirmed that it was working as expected. We are reviewing and enhancing our internal migration procedures to prevent similar issues in the future.