DRACOON incident

Partial Outage of Storage Upload

DRACOON experienced a major incident on August 22, 2024 affecting Upload, lasting 3h 19m. The incident has been resolved; the full update timeline is below.

Started: Aug 22, 2024, 10:58 AM UTC
Resolved: Aug 22, 2024, 02:18 PM UTC
Duration: 3h 19m
Detected by Pingoru: Aug 22, 2024, 10:58 AM UTC

Affected components

Upload

Update timeline

investigating Aug 22, 2024, 10:58 AM UTC

We are currently investigating an issue with our storage. Our team is working to gather more information and resolve the issue as quickly as possible. We apologize for any inconvenience this may cause and will provide updates as soon as we have them.

monitoring Aug 22, 2024, 11:18 AM UTC

The issue with storage has been resolved, and we are monitoring the situation to ensure it remains stable. We apologize for any inconvenience this may have caused and appreciate your patience.

resolved Aug 22, 2024, 02:18 PM UTC

The issue with storage has been fully resolved. All systems are now operating normally. We apologize for any inconvenience this may have caused and appreciate your patience. If you continue to experience any issues, please don't hesitate to reach out to our support team for assistance.

postmortem Sep 19, 2024, 09:22 AM UTC

We experienced an issue with the DRACOON Cloud on 22.08.2024 from 12:30 to 15:00. Our team has worked diligently to identify the root cause and implement a resolution. In this post-mortem, we want to share what happened, why it happened, what we did to resolve it, and what we will do to prevent similar incidents in the future.

What happened?

There was a partial outage of our upload functionality during the incident window. Only some of our customers were affected.

Why did this happen?

Issues in one of our backend services caused uploads to fail for some of our customers.

What did we do?

Our incident response team quickly identified the underlying issue and resolved it by restarting the affected service and increasing its number of instances.

What can we do to improve?

We will investigate the issue further and improve the affected service and its monitoring to prevent similar issues in the future.

We apologize for any inconvenience this incident may have caused. We are committed to ensuring the stability and reliability of our services and will continue to take proactive measures to prevent similar incidents. If you have any questions or concerns, please don't hesitate to reach out to our support team for assistance.