DRACOON incident

Partial Outage of DRACOON

DRACOON experienced a major incident on July 7, 2025, affecting API (group 01), API (group 02), and the other components listed below. It lasted 1h 5m. The incident has been resolved; the full update timeline is below.

Started: Jul 07, 2025, 10:05 AM UTC
Resolved: Jul 07, 2025, 11:10 AM UTC
Duration: 1h 5m
Detected by Pingoru: Jul 07, 2025, 10:05 AM UTC

Affected components

API (group 01)
API (group 02)
Authentication
API (group 03)
Web App
API (group 04)
Branding
API (group 05)
WebDAV
API (group 06)

Update timeline

investigating Jul 07, 2025, 10:05 AM UTC

We are currently investigating an issue with DRACOON. Our team is working to gather more information and resolve the issue as quickly as possible. We apologize for any inconvenience this may cause and will provide updates as soon as we have them.
monitoring Jul 07, 2025, 10:59 AM UTC

The issue with DRACOON has been resolved, and we are monitoring the situation to ensure it remains stable. We apologize for any inconvenience this may have caused and appreciate your patience.
resolved Jul 07, 2025, 11:10 AM UTC

The issue with DRACOON has been fully resolved. All systems are now operating normally. We apologize for any inconvenience this may have caused and appreciate your patience. If you continue to experience any issues, please don't hesitate to reach out to our support team for assistance.
postmortem Sep 09, 2025, 03:01 PM UTC

We experienced an issue with DRACOON Cloud on 2025-07-07 from around 12:00 to 13:00. Our team has worked diligently to identify the root cause and implement a resolution. In this post-mortem, we want to share the details of what happened, why it happened, what we did to resolve it, and what we will do to prevent similar incidents in the future.

What happened?

DRACOON Cloud experienced performance degradation. Some users reported slow response times and connection timeouts during the incident window.

Why did this happen?

The incident was caused by increased load and scaling issues. A part of the system was unable to handle the traffic volume, leading to resource exhaustion and subsequent service degradation.

What did we do?

Our team immediately identified the scaling bottleneck and implemented emergency load balancing measures. We scaled up the infrastructure resources to restore normal service levels by 13:00.

What can we do to improve?

We will implement automated scaling policies to handle traffic spikes, enhance monitoring and alerting systems for early detection of load issues, and conduct regular capacity planning reviews to prevent similar incidents.

We apologize for any inconvenience this incident may have caused. We are committed to ensuring the stability and reliability of our services and will continue to take proactive measures to prevent similar incidents from happening in the future. If you have any questions or concerns, please don't hesitate to reach out to our support team for assistance.