IONOS Cloud experienced a major incident on January 27, 2026 affecting Data Center Designer (DCD), Network, and one additional component, lasting 2h 19m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Jan 27, 2026, 04:35 PM UTC
We are currently investigating alerts related to DCD and IAM service availability in TXL DC.
- investigating Jan 27, 2026, 04:37 PM UTC
We are continuing to investigate this issue.
- identified Jan 27, 2026, 05:08 PM UTC
We have identified a connectivity issue preventing our identity service from reaching an upstream database. Our Operations Team is currently working to mitigate the connectivity issue. We will be updating this page regularly to keep you informed about our progress.
- identified Jan 27, 2026, 05:57 PM UTC
We have traced the issue to an internal object storage system. Our infrastructure team is currently working on recovery. At this time, users may continue to experience DCD login issues, IAM service disruptions, and delays or errors related to provisioning. We see no evidence of further user-facing network issues and are updating the status accordingly.
- monitoring Jan 27, 2026, 06:24 PM UTC
We have recovered the affected Object Storage and are currently monitoring the recovery of affected systems and services.
- resolved Jan 27, 2026, 06:55 PM UTC
We are marking this incident as resolved. We will compile a Root Cause Analysis and publish it as an update to this incident.
- postmortem Feb 02, 2026, 04:05 PM UTC
We have finished our research into this incident and want to share the following Root Cause Analysis:

**What happened?**

On January 27, 2026, a storage incident occurred in our Berlin region, leading to I/O performance degradation. This impacted the availability of management and customer-facing services.

**How did this happen? (Technical Root Cause)**

The incident was triggered during a scheduled maintenance window involving updates to parts of our supporting infrastructure.

* _Initialization Failure_: Following a node reboot, several storage daemons failed to reconnect due to a configuration mismatch stemming from a previous hardware replacement.
* _Operational Error_: During troubleshooting, a manual command was executed to clear stale storage entries. This command inadvertently removed active storage components from other nodes while the cluster was already in a vulnerable, degraded state.
* _Performance Impact_: The loss of these additional components made specific data segments temporarily unavailable. This caused I/O operations to hang for some management services relying on that storage, leading to timeouts and service interruptions. Unexpectedly, two failover instances did not behave as designed, which led to the short-term impact on the IAM (auth management) system.

Full functionality was restored once engineers recovered the missing storage components and initiated a cluster-wide data rebalancing. At no point were systems that handle or store customer data directly involved in this incident; however, our IAM services were affected by the performance impact, leading to a customer-facing outage of the DCD login.
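The initialization failure above came from daemons referencing hardware identifiers that changed after a disk replacement. A minimal sketch of the kind of pre-start verification that catches this, using entirely hypothetical names (`find_missing_devices`, `safe_to_start`) rather than any actual IONOS tooling:

```python
# Hypothetical sketch: before (re)starting a storage daemon after a reboot,
# verify that every device identifier in its configuration is still visible
# on the node. Stable identifiers (e.g. serial-based /dev/disk/by-id paths)
# survive reboots and hardware swaps, while kernel names like /dev/sdb may not.

def find_missing_devices(configured_ids, present_ids):
    """Return configured device IDs that are no longer visible on the node."""
    return sorted(set(configured_ids) - set(present_ids))

def safe_to_start(daemon_devices, present_ids):
    """A daemon should only start if all of its configured devices are present."""
    return not find_missing_devices(daemon_devices, present_ids)
```

A check like this turns a silent reconnect failure into an explicit, actionable error before the daemon joins the cluster.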
**What are we doing to prevent this from happening again?**

To increase the resilience of our internal supporting infrastructure and minimize the impact of human error, we are reviewing and strengthening the following measures:

* _Refinement of Hardware Procedures_: We are updating our disk replacement and decommissioning workflows to include more robust verification steps, ensuring hardware identifiers remain consistent through system reboots.
* _Guardrails for Management Commands_: We are implementing software safeguards and updated standard operating procedures (SOPs) to restrict high-impact cluster commands while a storage environment is already showing signs of degradation.
* _Redundancy Improvements_: We are conducting a review of services that experienced failover issues to ensure that a local disruption does not impact global service availability.
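The guardrail measure above can be sketched as a wrapper that refuses destructive cluster commands while the storage environment is degraded. All names here (`guarded_run`, the command and health strings) are illustrative assumptions, not IONOS internals:

```python
# Hypothetical guardrail sketch: block high-impact storage commands unless
# the cluster reports a healthy state, and require an explicit, audited
# override otherwise. Command names and health states are illustrative.

DESTRUCTIVE_COMMANDS = {"purge-stale-entries", "remove-storage-daemon", "destroy-pool"}

def guarded_run(command: str, cluster_health: str, override: bool = False) -> str:
    """Execute a cluster command only if it is safe in the current health state."""
    if command in DESTRUCTIVE_COMMANDS and cluster_health != "HEALTH_OK":
        if not override:
            # Default: refuse destructive actions on a degraded cluster.
            return f"REFUSED: '{command}' blocked while cluster is {cluster_health}"
        # An explicit override proceeds but is flagged for the audit trail.
        return f"WARN: '{command}' forced during {cluster_health}"
    return f"OK: executed '{command}'"
```

Under this scheme, the manual cleanup command from the incident would have been rejected by default while the cluster was degraded, forcing a deliberate, logged decision instead of an accidental removal.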