DRACOON incident

Degraded Performance of DRACOON Cloud

DRACOON experienced a critical incident on February 20, 2024, affecting API (group 01) through API (group 09) and lasting 2h 42m. The incident has been resolved; the full update timeline is below.

Started: Feb 20, 2024, 02:13 PM UTC
Resolved: Feb 20, 2024, 04:56 PM UTC
Duration: 2h 42m
Detected by Pingoru: Feb 20, 2024, 02:13 PM UTC

Affected components

API (group 01), API (group 02), API (group 03), API (group 04), API (group 05), API (group 06), API (group 07), API (group 08), API (group 09)

Update timeline

investigating Feb 20, 2024, 02:13 PM UTC

We are currently investigating an issue with our DRACOON Cloud. Our team is working to gather more information and resolve the issue as quickly as possible. We apologize for any inconvenience this may cause and will provide updates as soon as we have them.
investigating Feb 20, 2024, 02:16 PM UTC

We are continuing to investigate this issue.
monitoring Feb 20, 2024, 02:25 PM UTC

The issue with our DRACOON Cloud has been resolved, and we are monitoring the situation to ensure it remains stable. We apologize for any inconvenience this may have caused and appreciate your patience.
resolved Feb 20, 2024, 04:56 PM UTC

The issue with the DRACOON Cloud has been fully resolved. All systems are now operating normally. We apologize for any inconvenience this may have caused and appreciate your patience. If you continue to experience any issues, please don't hesitate to reach out to our support team for assistance.
postmortem Aug 30, 2024, 02:32 PM UTC

**20.02.2024 - 15:13 CET - Post-Mortem**

We experienced an issue with our DRACOON Cloud service on 20 February 2024 at 15:13 CET. Our team has worked diligently to identify the root cause and implement a resolution. In this post-mortem, we want to share the details of what happened, why it happened, what we did to resolve it, and what we will do to prevent similar incidents in the future.

**What happened?**

The performance of our DRACOON Cloud was degraded for some customers due to an extraordinary load situation on one component.

**Why did this happen?**

Under high load, database performance was insufficient to handle certain requests for some customers, and the affected transactions became stuck. Our auto-healing configuration restarted these transactions, which cleared the transaction backlog within a couple of minutes.

**What did we do?**

An automated process restarted some stuck transactions so that they could be processed as planned. No further manual intervention was required.

**What can we do to improve?**

We are steadily improving the performance of our DRACOON Cloud infrastructure and the code of our application so that they can sustain such high-load scenarios. There is therefore no specific action following this incident beyond our regular scope of work.

We apologize for any inconvenience this incident may have caused. We are committed to ensuring the stability and reliability of our services and will continue to take proactive measures to prevent similar incidents from happening in the future. If you have any questions or concerns, please don't hesitate to reach out to our support team for assistance.