IONOS Cloud Outage History

IONOS Cloud is up right now

There have been 23 IONOS Cloud outages since February 3, 2026, totaling 368h 38m of downtime. Each incident is summarised below with details, duration, and resolution information.

Source: https://status.ionos.cloud

Minor April 28, 2026

Network: Increased latency in TXL

Detected by Pingoru
Apr 28, 2026, 08:17 PM UTC
Resolved
Apr 29, 2026, 04:32 PM UTC
Duration
20h 15m
Affected: Network, AI Model Hub
Timeline · 4 updates
  1. investigating Apr 28, 2026, 08:17 PM UTC

    We are currently investigating network latency irregularities in our TXL datacenter.

  2. investigating Apr 28, 2026, 08:27 PM UTC

    Our network team is currently investigating the issue. We are seeing increased error rates on the AI Model Hub and are including the service as potentially affected.

  3. monitoring Apr 28, 2026, 09:05 PM UTC

    Our Network Team has identified the likely cause of the performance degradation and has deployed a mitigation. We are currently monitoring bandwidth and latency metrics to ensure stability.

  4. resolved Apr 29, 2026, 04:32 PM UTC

    As no further anomalies have been spotted throughout peak times, we are closing this incident. A Root Cause Analysis will be provided here once compiled.

Read the full incident report →

Minor April 22, 2026

Support Telephone Routing Issue

Detected by Pingoru
Apr 22, 2026, 10:17 AM UTC
Resolved
Apr 22, 2026, 03:22 PM UTC
Duration
5h 4m
Affected: Cloud Support
Timeline · 2 updates
  1. investigating Apr 22, 2026, 10:17 AM UTC

    We are currently investigating a routing issue affecting our Cloud Support telephone system. Customers might experience disconnects or calls not being routed correctly. If you are unable to reach IONOS Cloud Support via the numbers listed in the DCD or our documentation, please use the following validated line: Direct Dial: +49 30 577 00 820

  2. resolved Apr 22, 2026, 03:22 PM UTC

    We are resolving this incident as the underlying cause has been identified and fixed. We are sorry for the inconvenience caused. All our documented support numbers should now work again as expected. https://docs.ionos.com/cloud/support/general-information/contact-information

Read the full incident report →

Minor April 21, 2026

S3: Increased error count in eu-central-1

Detected by Pingoru
Apr 21, 2026, 07:37 AM UTC
Resolved
Apr 21, 2026, 05:02 PM UTC
Duration
9h 25m
Affected: Object Storage
Timeline · 5 updates
  1. investigating Apr 21, 2026, 07:37 AM UTC

    We are currently investigating increased error rates for S3 buckets located in eu-central-1. Affected services: Object Storage. Location: DE/FRA

  2. investigating Apr 21, 2026, 07:38 AM UTC

    We are continuing to investigate this issue.

  3. identified Apr 21, 2026, 10:47 AM UTC

    Our engineering team has identified a localized resource contention issue on several storage nodes, which is resulting in increased latency for some users. Maintenance is underway across the affected storage cluster to restore optimal performance levels.

  4. monitoring Apr 21, 2026, 02:04 PM UTC

    We are currently seeing error rates dropping and are monitoring the situation closely.

  5. resolved Apr 21, 2026, 05:02 PM UTC

    As error rates have dropped and no more anomalies are being observed, we are marking this incident as resolved. Our Object Storage team is preparing a Root Cause Analysis that will be published here once it is finalized.

Read the full incident report →

Critical April 20, 2026

Provisioning Maintenance

Detected by Pingoru
Apr 20, 2026, 09:01 AM UTC
Resolved
Apr 22, 2026, 06:31 PM UTC
Duration
2d 9h
Timeline · 2 updates
  1. scheduled Apr 20, 2026, 09:01 AM UTC

    We are conducting maintenance on our provisioning engine. During the maintenance, customers might see delays in provisioning. In rare cases, provisioning errors can occur, in which case requests need to be repeated. There will be no impact on existing, already provisioned resources. Affected services: Provisioning. Locations: DE/FKB, DE/FRA, DE/FRA/2, DE/TXL, ES/VIT, FR/PAR, GB/BHX, GB/GLO, GB/LHR, US/EWR, US/LAS, US/MCI

  2. in progress Apr 22, 2026, 04:30 PM UTC

    Scheduled maintenance is currently in progress. We will provide updates as necessary.

Read the full incident report →

Major April 14, 2026

Network Connectivity Issue in FRA

Detected by Pingoru
Apr 14, 2026, 09:40 PM UTC
Resolved
Apr 14, 2026, 10:29 PM UTC
Duration
49m
Affected: Network
Timeline · 3 updates
  1. investigating Apr 14, 2026, 09:40 PM UTC

    We have identified a networking issue affecting one specific host in FRA. Other hosts in this region remain unaffected.

  2. monitoring Apr 14, 2026, 10:16 PM UTC

    Our Network Team has identified a potential issue related to a previous package update. They have deployed a mitigation. We are currently monitoring the host for further anomalies. Network connectivity on the affected host should be restored.

  3. resolved Apr 14, 2026, 10:29 PM UTC

    We are marking this incident as resolved. The root cause was inconsistencies in a database that was modified during a prior package update. A Root Cause analysis will be provided here once compiled.

Read the full incident report →

Minor April 9, 2026

AI Modelhub: Performance Degradation

Detected by Pingoru
Apr 09, 2026, 06:28 AM UTC
Resolved
Apr 09, 2026, 03:46 PM UTC
Duration
9h 17m
Affected: AI Model Hub
Timeline · 3 updates
  1. investigating Apr 09, 2026, 06:28 AM UTC

    We are experiencing increased traffic volumes for specific models, including GPT-OSS 120B, which is currently causing capacity constraints. Users may encounter longer response times or intermittent timeouts. Our AI Model Hub Team is actively working to scale capacity and resolve these issues.

  2. identified Apr 09, 2026, 10:06 AM UTC

    Our AI Model Hub Team has identified a likely culprit for the increased load. The team is working towards increasing capacity to ensure that GPT-OSS 120B stays available for all customers. Customers may still experience intermittent timeouts.

  3. resolved Apr 09, 2026, 03:46 PM UTC

    Response times of the model have improved significantly. We are marking the incident as resolved.

Read the full incident report →

Minor April 8, 2026

Performance Degradation Compute FRA

Detected by Pingoru
Apr 08, 2026, 06:48 AM UTC
Resolved
Apr 08, 2026, 07:50 PM UTC
Duration
13h 2m
Affected: Compute, Managed Kubernetes
Timeline · 9 updates
  1. investigating Apr 08, 2026, 06:48 AM UTC

    We are currently investigating performance degradation affecting compute components in our FRA DC. This issue is impacting a subset of Virtual Machines (VMs) and Kubernetes Clusters. We will provide further updates as our investigation progresses.

  2. identified Apr 08, 2026, 08:27 AM UTC

    We have identified an increase in CPU steal time on affected hosts. Our Compute team has identified a likely culprit and is currently testing a potential mitigation to ensure its effectiveness before a rollout.

  3. identified Apr 08, 2026, 09:54 AM UTC

    Our compute team has found another factor negatively impacting CPU performance for affected VMs. We are currently testing a potential transparent resolution for the problematic CPU affinity setting.

  4. identified Apr 08, 2026, 10:12 AM UTC

    Our compute team has successfully tested the proposed fix for the CPU core affinity and is preparing a rollout. We will monitor the results.

  5. monitoring Apr 08, 2026, 11:28 AM UTC

    The adjustment was rolled out. Our Compute team is seeing dropping CPU steal time. We are monitoring the situation. Our Tech Teams are preparing another rollout that should improve the performance further.

  6. monitoring Apr 08, 2026, 03:29 PM UTC

    The second configuration update rollout is currently in progress, and we have confirmed initial improvements related to CPU performance. Due to the size of the fleet, we expect the rollout to take some time to complete. Throughout the process, customers will see performance gains as soon as the specific hosts supporting their workloads have been updated. We will provide a final update once the rollout is finished.

  7. monitoring Apr 08, 2026, 05:58 PM UTC

    Our Compute Team has confirmed that the fix has been rolled out to the majority of affected hosts. We are currently finishing the rollout and will provide an update once the remaining hosts on affected clusters are covered.

  8. resolved Apr 08, 2026, 07:50 PM UTC

    We have successfully completed the rollout to all remaining hosts and are closing this incident. A Root Cause Analysis is currently being conducted by the Compute Team and will be shared here upon completion.

  9. postmortem Apr 21, 2026, 06:04 PM UTC

    **What happened**

    Virtual machines with dedicated CPU allocations in the Frankfurt data center began exhibiting abnormally high CPU steal time, indicating that the hypervisor was unable to provide the requested CPU resources. This performance degradation occurred on multiple customer instances and persisted even after guest operating system reboots and configuration changes, and it led to degraded performance in customer workloads.

    **How was that possible? (Root cause)**

    The issue was caused by a regression introduced during recent improvements to the virtual machine checkpointing mechanism. A code change intended to optimize the checkpoint/restore process inadvertently affected the live migration code path, which is used when VMs are moved between physical servers.

    The hypervisor uses CPU pinning to ensure that guest VM threads run on their dedicated CPU cores as allocated. When this pinning is not properly configured, the VM's processes fall back to inheriting the CPU assignment of the parent process. In this case, the parent process's assignment included core 0, a core normally reserved exclusively for the host operating system that should not be allocated to guest workloads. The result was a failure of the resource allocation guarantees:

    * VMs with dedicated vCPU allocations could not be pinned to their assigned cores
    * The VMs' virtual CPU threads instead competed for time on an oversubscribed, host-reserved core
    * The hypervisor scheduler could not guarantee the VMs' promised CPU time

    High CPU steal time resulted despite adequate physical CPU resources being available. The issue affected VMs in the Frankfurt infrastructure that underwent live migration during the deployment window in which the problematic code was active.

    **How we prevent recurrence**

    * _Enhanced CPU pinning validation_: The virtualization infrastructure codebase has been updated to restore proper CPU pinning for all live migration operations. (DONE)
    * _Strengthened pre-deployment testing_: Enhance the validation procedures for virtualization infrastructure changes to catch CPU allocation anomalies before code is deployed to production. (DONE)
    * _Automated abnormal steal time alerting_: Implement automated monitoring and alerting to detect abnormal CPU steal time on VMs with dedicated vCPU allocations. This will enable faster detection of similar configuration regressions in the future. (Within Q2 2026)
    * _Enhanced post-rollout monitoring_: Extend the post-deployment monitoring and assessment window to increase the likelihood of anomalies being spotted and correctly correlated to a change. (DONE)

    **Closing remark**

    The incident resulted in measurable performance degradation for customers over an extended period. The fact that the increase in steal time initially went unnoticed highlighted a gap in our alerting and monitoring setup. The ambiguity of the symptoms (general performance issues in some guests) and the apparent lack of common denominators in affected systems meant that initial incident reports and existing indicators were not properly understood, correlated, and attributed in a timely fashion. The corrective actions outlined above address both the immediate defect and the systemic factors that allowed it to surface. These measures are designed to prevent recurrence and significantly reduce the time to detection and resolution in the future.

    We thank our affected customers and partners for their patience and constructive collaboration throughout this incident.
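
For context on the symptom described in this postmortem, here is a minimal sketch (not IONOS tooling) of how high steal time can be spotted from inside a Linux guest: it samples the steal counter in /proc/stat and reports it as a share of total CPU time. The sampling interval and the 5% threshold are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Minimal sketch: estimate CPU steal time on a Linux guest from /proc/stat.

Illustrative only; the sampling interval and alert threshold are assumptions.
"""
import time

def read_cpu_times():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    values = [int(v) for v in fields]
    steal = values[7] if len(values) > 7 else 0  # steal is the 8th column
    return steal, sum(values)

def steal_percent(interval=5.0):
    steal_a, total_a = read_cpu_times()
    time.sleep(interval)
    steal_b, total_b = read_cpu_times()
    delta_total = total_b - total_a
    return 100.0 * (steal_b - steal_a) / delta_total if delta_total else 0.0

if __name__ == "__main__":
    pct = steal_percent()
    print(f"CPU steal over sample window: {pct:.2f}%")
    if pct > 5.0:  # hypothetical threshold for a dedicated-vCPU VM
        print("High steal time: the hypervisor is not delivering the allocated CPU.")
```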

Read the full incident report →

Minor March 31, 2026

Managed Kubernetes: Service Degradation for Control Planes in FRA

Detected by Pingoru
Mar 31, 2026, 12:22 PM UTC
Resolved
Apr 01, 2026, 08:18 PM UTC
Duration
1d 7h
Affected: Managed Kubernetes
Timeline · 3 updates
  1. investigating Mar 31, 2026, 12:22 PM UTC

    We are currently investigating service degradation for Control Planes of some Kubernetes clusters in the FRA DC.

  2. monitoring Mar 31, 2026, 01:10 PM UTC

    All affected Control Planes have recovered. We are currently monitoring the affected clusters for further anomalies.

  3. resolved Apr 01, 2026, 08:18 PM UTC

    We are marking the incident as resolved, as no further anomalies were detected. The root cause was a transient latency issue in one of the redundant CoreDNS pods, which led to kube-apiservers being unable to discover etcd instances in time. We are currently developing better mitigation options for the excessive NXDOMAIN requests observed during the incident.
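
As an aside on the NXDOMAIN observation above, a quick way to watch for this kind of DNS anomaly is to track the share of NXDOMAIN answers served by CoreDNS. The sketch below queries a Prometheus instance over its HTTP API; the Prometheus URL, the metric and label names, and the 50% threshold are assumptions for illustration and should be adapted to whatever your CoreDNS version actually exports.

```python
"""Sketch: watch the share of NXDOMAIN answers served by CoreDNS via Prometheus.

The Prometheus URL, metric/label names, and threshold are illustrative assumptions.
"""
import requests

PROM_URL = "http://prometheus.monitoring:9090"   # hypothetical endpoint
QUERY = (
    'sum(rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]))'
    ' / sum(rate(coredns_dns_responses_total[5m]))'
)

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
results = resp.json()["data"]["result"]
nxdomain_share = float(results[0]["value"][1]) if results else 0.0
print(f"NXDOMAIN share of DNS responses: {nxdomain_share:.1%}")
if nxdomain_share > 0.5:   # illustrative threshold
    print("Unusually high NXDOMAIN rate; check cluster DNS search paths and clients.")
```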

Read the full incident report →

Minor March 25, 2026

Availability of API and DCD limited

Detected by Pingoru
Mar 25, 2026, 12:43 AM UTC
Resolved
Mar 25, 2026, 11:10 AM UTC
Duration
10h 27m
Affected: Data Center Designer (DCD), Cloud API, Billing API, Reseller API
Timeline · 3 updates
  1. investigating Mar 25, 2026, 12:43 AM UTC

    We are currently investigating limitations in the availability of the Cloud-API and the data center designer (DCD). We will provide further information as soon as possible.

  2. monitoring Mar 25, 2026, 03:00 AM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Mar 25, 2026, 11:10 AM UTC

    This incident is resolved. We will publish a root cause analysis once it is compiled.

Read the full incident report →

Minor March 16, 2026

AI Model Hub - Increased Error Rate in Embeddings

Detected by Pingoru
Mar 16, 2026, 08:28 AM UTC
Resolved
Mar 16, 2026, 11:08 AM UTC
Duration
2h 40m
Affected: AI Model Hub
Timeline · 3 updates
  1. investigating Mar 16, 2026, 08:28 AM UTC

    Our AI Model Hub Team is currently investigating increased error rates in the Embeddings functionality in the AI Model Hub.

  2. identified Mar 16, 2026, 10:01 AM UTC

    The team has traced the root cause to the ongoing Kubernetes incident (https://status.ionos.cloud/incidents/h9x5s66m4r28). Both teams are currently working to restore service.

  3. resolved Mar 16, 2026, 11:08 AM UTC

    To improve visibility and streamline communication, we are merging this incident into the MK8s incident (tracked here: https://status.ionos.cloud/incidents/h9x5s66m4r28). Consequently, we will close this specific entry and provide all future updates via the referenced incident link.

Read the full incident report →

Major March 16, 2026

MK8s: Partial Connectivity Degradation to Control Planes

Detected by Pingoru
Mar 16, 2026, 08:24 AM UTC
Resolved
Mar 17, 2026, 05:04 PM UTC
Duration
1d 8h
Affected: Database as a Service (DBaaS), Managed Kubernetes, AI Model Hub, Container Registry
Timeline · 13 updates
  1. investigating Mar 16, 2026, 08:24 AM UTC

    Some customers may experience connection problems to the control plane and degraded functionality of kubernetes. Our teams are investigating and working on a resolution.

  2. identified Mar 16, 2026, 10:04 AM UTC

    The team has identified the root cause as a resource constraint within the etcd database. Mitigation efforts are currently underway.

  3. identified Mar 16, 2026, 11:05 AM UTC

    We are expanding the scope of this incident to include DBaaS and AI Model Hub. We have observed an increased error count originating from PostgresDB on Kubernetes. Additionally, to improve transparency, the previously reported separate incident regarding the AI Model Hub (https://status.ionos.cloud/incidents/rmgs845klm32) is being merged into this primary incident.

  4. identified Mar 16, 2026, 11:37 AM UTC

    Our Kubernetes Team has deployed a fix for the affected AI Model Hub Database Services. We currently see metrics improving and are monitoring the situation closely.

  5. identified Mar 16, 2026, 11:57 AM UTC

    We are adding the Container Registry as an affected Service. Customers may currently experience issues pulling and pushing images from the Registry.

  6. identified Mar 16, 2026, 12:22 PM UTC

    We are closing the incident for the AI Model Hub. All metrics have recovered and the service should be up and running again normally.

  7. identified Mar 16, 2026, 12:57 PM UTC

    We are marking the Container Registry Service as recovered.

  8. identified Mar 16, 2026, 01:56 PM UTC

    We are marking DBaaS as recovered. Our Kubernetes Team is currently working on stabilizing the Kubernetes Control Plane. We are focusing on mitigating recurring load spikes influencing stability.

  9. identified Mar 16, 2026, 07:34 PM UTC

    The changes applied to some control plane clusters have had positive effects. The team is continuing the rollout to other affected clusters.

  10. identified Mar 16, 2026, 07:58 PM UTC

    Final changes have been applied to all affected clusters, resolving the issue. We are now monitoring the progress.

  11. monitoring Mar 16, 2026, 07:59 PM UTC

    Final changes have been applied to all affected clusters, resolving the issue. We are now monitoring the progress.

  12. resolved Mar 17, 2026, 05:04 PM UTC

    This incident is now marked resolved, as all affected control planes have returned to and stayed in a stable state. A Root Cause Analysis (RCA) is underway and will be published here once finalized.

  13. postmortem Apr 23, 2026, 01:57 PM UTC

    **What Happened?**

    On March 14, 2026, a subset of customer Managed Kubernetes control planes experienced periods of intermittent, recurring unavailability. The incident persisted until March 16, 2026, when the last anomalies were recorded and the incident was fully mitigated. During the impact window, affected clusters were periodically unreachable, meaning operations depending on the Kubernetes API, such as deployments, scaling operations, and health checks, would not have worked reliably.

    **How Was This Possible?**

    The root cause was a combination of three factors that compounded each other:

    * Excessive data volume from a small number of clusters: A subset of clusters were storing unusually large amounts of data in the shared control plane database, primarily security scanning reports and policy audit records, alongside high volumes of event and autoscaling objects. Three clusters alone accounted for approximately 64% of all stored data on one of the affected database instances, putting significant pressure on shared resources.
    * Database maintenance frequency: Our control plane databases were configured to compact and reclaim unused space at a fixed interval. Given the high rate of data being written by the affected clusters, this interval was insufficient to keep up, causing the database to accumulate excessive historical revisions and grow beyond normal operating size.
    * Database fragmentation: As the database grew, the actual allocated size exceeded the amount of data actively in use, adding roughly 40% overhead. This pushed the database toward its hard storage limit, at which point it would have become read-only and caused a full control plane failure for all clusters on that instance.

    Together, these factors caused periodic stalls during database maintenance cycles, which temporarily disrupted the Kubernetes API server and made affected control planes unavailable for several minutes each time.

    **What We Are Doing to Prevent Recurrence**

    We have already taken immediate action and have a structured plan in place for further improvements:

    * Performed defragmentation on the affected database instances, reclaiming approximately 3–4 GiB per instance and restoring healthy operating headroom (DONE)
    * Reduced the database compaction intervals on the affected instances (DONE)
    * Expanded monitoring dashboards to improve visibility into control plane health per database instance (DONE)
    * Migrating the highest-volume clusters to dedicated database backends, eliminating the noisy-neighbor risk for those workloads entirely (DONE)
    * Improving alerting to ensure incidents of this nature trigger immediate paging notifications, ensuring timely human intervention (DONE)
    * Rolling out the updated compaction interval across all shared control plane database instances (DONE)
    * Implementing automated, scheduled defragmentation jobs for all instances to prevent fragmentation from building up (DONE)
    * Rebalancing cluster distribution across database instances to reduce concentration risk (DONE)

    **Short-term (within 4 weeks):** Reaching out directly to customers whose clusters are generating disproportionately high data volumes to discuss workload optimization, such as using external storage backends for large security scan reports instead of storing them in the control plane database (ONGOING)

    **Medium-term (1–3 months):** Specification of quotas: Introducing per-tenant resource quotas and key-count limits on the shared control plane database to prevent any single cluster from impacting others

    **Long-term (Q2 2026 and beyond):**

    * Re-architecting the shared control plane to provide stronger isolation between customer workloads, including dedicated database instances and improved vertical scaling
    * Evaluating full database sharding for customers with consistently high data volumes
    * Conducting regular operational drills focused on database resource exhaustion and recovery to improve our response time for future incidents
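
As an illustration of the fragmentation issue described in this postmortem, the sketch below compares an etcd instance's allocated size (dbSize) with the data actively in use (dbSizeInUse) and flags excessive overhead. It assumes `etcdctl` (v3.4+) is on the PATH; the endpoint and the 40% threshold are placeholders, and this is not IONOS's internal tooling.

```python
"""Sketch: flag etcd instances whose on-disk size greatly exceeds the data in use.

Assumes etcdctl (v3.4+) is available; endpoint and threshold are illustrative.
"""
import json
import subprocess

ENDPOINT = "https://127.0.0.1:2379"   # hypothetical endpoint
MAX_OVERHEAD = 0.40                   # flag >40% fragmentation overhead

def endpoint_status(endpoint):
    out = subprocess.run(
        ["etcdctl", "--endpoints", endpoint, "endpoint", "status", "-w", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[0]["Status"]

status = endpoint_status(ENDPOINT)
db_size = status["dbSize"]                    # bytes allocated on disk
in_use = status.get("dbSizeInUse", db_size)   # bytes actually holding live data
overhead = (db_size - in_use) / db_size if db_size else 0.0
print(f"{ENDPOINT}: allocated={db_size} in_use={in_use} overhead={overhead:.0%}")

if overhead > MAX_OVERHEAD:
    # A defragmentation reclaims the unused space; run it per member, off-peak.
    print("Fragmentation overhead above threshold; consider: etcdctl defrag")
```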

Read the full incident report →

Minor March 11, 2026

Network Connectivity Issues in TXL

Detected by Pingoru
Mar 11, 2026, 06:13 PM UTC
Resolved
Mar 11, 2026, 09:19 PM UTC
Duration
3h 6m
Affected: Network
Timeline · 6 updates
  1. investigating Mar 11, 2026, 06:13 PM UTC

    We are currently investigating network connectivity issues in our TXL datacenter.

  2. identified Mar 11, 2026, 06:31 PM UTC

    In response to monitoring alerts, our network team deployed a change to stabilize the network in the affected cluster. We will post another update at 19:00 UTC—or as soon as new information becomes available.

  3. identified Mar 11, 2026, 06:54 PM UTC

    The deployed change has had positive effect. We are downgrading the impact level while continuing to monitor the cluster closely.

  4. monitoring Mar 11, 2026, 07:27 PM UTC

    We are placing the incident in a monitoring state. Our Network Team is closely monitoring the cluster and working to restore full redundancy.

  5. resolved Mar 11, 2026, 09:19 PM UTC

    We are marking this incident as resolved. Our Network Team will publish an RCA here, once it is compiled.

  6. postmortem Mar 13, 2026, 10:11 AM UTC

    **What happened?**

    A network control-plane failure in the TXL data center caused progressive service degradation and a partial outage for one cluster. The incident resulted in intermittent connectivity loss for a subset of customers, with traffic impact ranging from 5% to 20% during three distinct intervals on **11.03.2026**:

    * 05:20 – 05:25 UTC
    * 11:58 – 12:10 UTC
    * 17:47 – 18:08 UTC

    The issue was resolved once the affected network devices were rebooted sequentially and additional configuration changes were applied. The degradations prompted a series of emergency maintenance windows to stabilize the cluster.

    **How was this possible? (Root Cause)**

    The underlying cause was the exhaustion of multicast forwarding resources on the switches serving the affected topology. This trigger is technically identical to the previous incident that occurred on [03.03.2026](https://status.ionos.cloud/incidents/3pyhtpglqf43). When the forwarding table reaches its maximum capacity, the network control plane cannot program required updates into the forwarding tables of the switches. This continuous failure to push updates overwhelms the system, resulting in severe CPU overload and an out-of-memory (OOM) crash of the control-plane process. Without this process, the fabric is unable to maintain stable forwarding, ultimately leading to BGP session flaps and the observed loss of connectivity.

    **What are we doing to prevent recurrence?**

    As part of the measures established following the [03.03.2026](https://status.ionos.cloud/incidents/3pyhtpglqf43) incident, we had already rolled out configuration changes to one of the two topologies in the affected cluster. These measures aim to significantly reduce the load on the control plane by optimizing resweep and failover times. Due to existing instabilities in the second topology, these changes had not yet been applied there. In response to the degradations observed on **11.03.2026**, we accelerated the rollout of these optimizations via emergency maintenance across all TXL topologies and several other data centers yesterday. Post-implementation, we observed a significant positive impact, with OpenSM loads being substantially reduced. Furthermore, we have reconfigured management services to reduce the number of multicast forwarding entries, resulting in a substantially lower base load on the control-plane process.

    **Immediate Technical Actions:**

    * _Resource Management_: Adjusted control-plane configurations to use less aggressive failover timers and reduced heavy sweep activity, lowering the load on the forwarding table. (DONE)
    * _Multicast Load Reduction:_ Implemented measures to decrease the number of multicast group memberships, further reducing pressure on the forwarding table. (DONE)
    * _Proactive Monitoring:_ Established continuous monitoring of forwarding table utilization and control-plane memory to trigger early alerts before capacity limits are reached. (DONE)

    **Long-term Structural Improvements:**

    * _Control-Plane Isolation:_ We are migrating our control plane to dedicated, high-performance servers. This work, which began in Q4 2025, ensures the separation of network management from data-plane traffic to eliminate resource competition. We are also performing a deep-dive audit of IPv6 configurations and IPoIB driver settings. (Ongoing, ETA: Q2 2026)
    * _Network Modernization:_ Our interconnect fabric is undergoing a strategic modernization program, including switches, gateways, and drivers, to increase both resiliency and performance. (Ongoing, ETA: Q3 2026)

    With these measures now in place, we are confident the immediate cause of the cluster instability has been mitigated. Our long-term strategy will further enhance network stability and scalability across all data centers while addressing the identified bottlenecks. We recognize that these disruptions have impacted our customers and partners, and we sincerely appreciate your patience regarding the short-notice emergency maintenance announced yesterday. We are continuing to monitor the cluster closely to ensure all deployed fixes remain effective.
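
The "Proactive Monitoring" action above amounts to alerting before a fixed-capacity table fills up. A generic sketch of that idea follows; the metric source, the table capacity, and the 80%/95% thresholds are placeholders invented for illustration, not details of IONOS's fabric monitoring.

```python
"""Sketch: early-warning check for a fixed-capacity forwarding table.

Capacity figures and thresholds are illustrative placeholders; real values
depend on the switch hardware and the monitoring stack in use.
"""
from dataclasses import dataclass

@dataclass
class TableReading:
    entries_in_use: int      # multicast forwarding entries currently programmed
    capacity: int            # hardware limit of the forwarding table

def utilization_alert(reading: TableReading, warn_at=0.80, critical_at=0.95) -> str:
    """Return an alert level before the table reaches its hard limit."""
    utilization = reading.entries_in_use / reading.capacity
    if utilization >= critical_at:
        return "CRITICAL"   # the next resweep/reprogram may fail, as in this incident
    if utilization >= warn_at:
        return "WARNING"    # schedule multicast-group cleanup before saturation
    return "OK"

# Example reading with made-up numbers
print(utilization_alert(TableReading(entries_in_use=3900, capacity=4096)))
```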

Read the full incident report →

Minor March 9, 2026

AI Model Hub: Increased Error Rate

Detected by Pingoru
Mar 09, 2026, 08:26 AM UTC
Resolved
Mar 11, 2026, 07:20 PM UTC
Duration
2d 10h
Affected: AI Model Hub
Timeline · 4 updates
  1. investigating Mar 09, 2026, 08:26 AM UTC

    Our Model Hub Team is currently working on resolving errors related to an instance running the llama 405b model.

  2. identified Mar 09, 2026, 11:52 AM UTC

    The team has identified the root cause: hardware degradation affecting this model's hosting environment is causing backend instability. We are currently implementing a fix.

  3. monitoring Mar 09, 2026, 06:53 PM UTC

    Our AI Model Hub Team has mitigated the incident. While the underlying root cause is not yet fully established or resolved, the model service should be stable. We are monitoring the situation while the investigation is ongoing.

  4. resolved Mar 11, 2026, 07:20 PM UTC

    We are marking this incident as resolved. The incident was caused by capacity constraints following a hardware failure. While capacity has been restored, we still see some usage‑specific constraints with the Llama 3.1 405B Instruct model. Our AI ModelHub team will deploy optimizations to the model to increase performance and reliability. We recommend that users still experiencing issues with the model check GPT‑OSS 120B as a potential (temporary) replacement.

Read the full incident report →

Minor March 4, 2026

IONOS Container Registry

Detected by Pingoru
Mar 04, 2026, 09:23 AM UTC
Resolved
Mar 04, 2026, 04:39 PM UTC
Duration
7h 16m
Timeline · 5 updates
  1. investigating Mar 04, 2026, 09:23 AM UTC

    We are currently investigating an increased error count on the IONOS Container Registry. Customers might be unable to pull images currently.

  2. identified Mar 04, 2026, 10:23 AM UTC

    Our Container Registry team has identified an issue in the underlying Kubernetes cluster serving a subset of images. The team is currently working on applying a fix for the issue.

  3. monitoring Mar 04, 2026, 12:22 PM UTC

    The Kubernetes Team has deployed a mitigation to the issue which involved a version rollback of a component of the K8s control plane. We are currently monitoring the service recovery.

  4. resolved Mar 04, 2026, 04:39 PM UTC

    We are marking this incident as resolved because no further issues were found in the setup. A Root‑Cause Analysis (RCA) will be published once the team has completed its analysis.

  5. postmortem Mar 11, 2026, 06:02 PM UTC

    We want to share the Root Cause Analysis for this incident:

    **What happened**

    On 4 March 2026, customers of the IONOS Container Registry experienced 504 Gateway Timeout errors when pushing or pulling container images. Deployments that relied on the registry were blocked.

    **How was that possible (Root Cause)**

    The registry runs on IONOS Managed Kubernetes (MK8s) infrastructure. A temporary capacity constraint caused two critical control-plane components to be placed on the same proxy instance instead of being distributed across separate proxies. This happened despite existing anti-affinity rules. The shared proxy reached its maximum concurrent-connection limit and stopped accepting new connections. Because all registry traffic to the Kubernetes API traverses this proxy, push and pull operations failed with 504 errors. The migration created the co-location condition; the connection-limit exhaustion was the direct trigger.

    **What we are doing to prevent recurrence**

    Immediate (completed)

    * Provisioned additional proxy capacity.
    * Relocated the affected control-plane components onto separate proxy instances, restoring balanced load and ending the 504 errors.

    Short-term

    * Architectural redesign: Redesign registry-to-API connectivity so each node uses a dedicated local proxy, eliminating shared-proxy bottlenecks. Design validated in test environments; production rollout scheduled for Q2 2026.
    * Alert-threshold review: Adjust alerting thresholds to trigger warnings before proxy connection utilization approaches capacity. Rollout in progress, expected completion Q2 2026.

    Mid-term

    * Load redistribution: Deploy additional infrastructure clusters and redistribute existing registries to ensure no single cluster exceeds safe operating capacity. Automation will continuously balance load as usage grows.

    **Closing remarks**

    The outage directly impacted container-image delivery and delayed customer deployments. We have restored full service and implemented concrete architectural and operational changes to eliminate the identified bottleneck.
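
The co-location described above (two critical components landing on the same instance despite anti-affinity rules) is the kind of condition a simple scheduled check can catch. The sketch below shows an analogous node-level check using the official `kubernetes` Python client: it lists pods for two hypothetical component labels and reports any node hosting both. The namespace and label selectors are assumptions, not the registry's real identifiers.

```python
"""Sketch: detect co-location of two control-plane components on the same node.

Uses the official `kubernetes` Python client; namespace and label selectors
are hypothetical placeholders.
"""
from collections import defaultdict
from kubernetes import client, config

NAMESPACE = "registry-system"                           # hypothetical
COMPONENTS = ["app=api-proxy", "app=registry-gateway"]  # hypothetical selectors

def nodes_for_component(v1, namespace, selector):
    pods = v1.list_namespaced_pod(namespace, label_selector=selector).items
    return {p.spec.node_name for p in pods if p.spec.node_name}

def main():
    config.load_kube_config()                           # or load_incluster_config()
    v1 = client.CoreV1Api()
    node_components = defaultdict(set)
    for selector in COMPONENTS:
        for node in nodes_for_component(v1, NAMESPACE, selector):
            node_components[node].add(selector)
    for node, comps in node_components.items():
        if len(comps) == len(COMPONENTS):
            print(f"co-location on {node}: {sorted(comps)} share one instance")

if __name__ == "__main__":
    main()
```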

Read the full incident report →

Minor March 4, 2026

DCD Frontend: Issues editing VDCs

Detected by Pingoru
Mar 04, 2026, 08:51 AM UTC
Resolved
Mar 04, 2026, 11:14 AM UTC
Duration
2h 22m
Affected: Data Center Designer (DCD)
Timeline · 4 updates
  1. investigating Mar 04, 2026, 08:51 AM UTC

    We are actively investigating reports of frontend problems when attempting to edit VDCs through the DCD UI.

  2. monitoring Mar 04, 2026, 09:54 AM UTC

    Our DCD Front‑end team has identified an issue introduced by a recent release. A rollback was performed, and customers who were unable to edit their VDCs and were confronted with an “In Progress/Pending” status should now be unblocked. We are actively monitoring the situation.

  3. resolved Mar 04, 2026, 11:14 AM UTC

    We are marking this incident as resolved. The issue was caused by a faulty DCD release, which was rolled back by the responsible team.

  4. postmortem Mar 05, 2026, 10:45 AM UTC

    **What happened?**

    Users experienced the Data Center Designer (DCD) UI becoming unresponsive, displaying a persistent "In Progress" hourglass popup when attempting to open or edit certain Virtual Data Centers (VDCs). This issue prevented users from editing components or loading VDCs entirely. The problem was resolved via a rollback of the affected version and the subsequent deployment of a fixed release.

    **How was this possible? (Root Cause)**

    Changes introduced in a recent release, which targeted storage volume handling, inadvertently impacted CDROM volumes. While CDROM volumes are technically classified as storage volumes within the codebase, they carry different properties. The logic implemented for HDD/SSD volumes did not account for these differences, causing the UI to enter a broken state when CDROM volumes were encountered.

    **What are we doing to prevent recurrence?**

    * Immediate Fix: We have implemented dedicated handling for CDROM volume properties to correctly differentiate them from HDD/SSD volumes within the storage logic. The fixed release has been published. (DONE)
    * Improved Documentation: Documentation for CDROM entities has been updated to increase visibility and awareness for future development work involving storage volumes. (DONE)
    * Quality Assurance and Pre-Release Testing: Structural improvements are being made to ensure more extensive testing of various scenarios and user journeys. This will increase test coverage and allow for a higher detection rate of edge cases prior to release. (Ongoing – Q2 2026)
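
To illustrate the class of bug described above, here is a small sketch of type-aware handling so that logic written for HDD/SSD volumes is not applied to CDROM volumes that lack those properties. The field names and types are invented for the example and are not the DCD codebase.

```python
"""Sketch: branch on volume type before applying HDD/SSD-only logic.

The dataclass fields are invented for illustration; the real DCD model differs.
"""
from dataclasses import dataclass
from typing import Optional

@dataclass
class Volume:
    name: str
    type: str                       # "HDD", "SSD", or "CDROM"
    size_gb: Optional[int] = None   # CDROM volumes may not expose a size
    bus: Optional[str] = None

def display_size(volume: Volume) -> str:
    # CDROMs are storage volumes in the data model but carry different
    # properties, so they must not go through the HDD/SSD code path.
    if volume.type == "CDROM":
        return "ISO image"
    if volume.size_gb is None:
        raise ValueError(f"{volume.name}: disk volume without a size")
    return f"{volume.size_gb} GB"

print(display_size(Volume(name="boot-disk", type="SSD", size_gb=50)))   # 50 GB
print(display_size(Volume(name="install-media", type="CDROM")))         # ISO image
```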

Read the full incident report →

Major March 3, 2026

Network Service Degradation in TXL

Detected by Pingoru
Mar 03, 2026, 11:36 AM UTC
Resolved
Mar 03, 2026, 09:49 PM UTC
Duration
10h 12m
Affected: Data Center Designer (DCD), Database as a Service (DBaaS), Network, AI Model Hub, Provisioning
Timeline · 17 updates
  1. investigating Mar 03, 2026, 11:36 AM UTC

    We are currently investigating monitoring alerts in TXL. We will keep you updated on our investigation.

  2. identified Mar 03, 2026, 11:53 AM UTC

    We have identified an issue related to the ongoing maintenance: https://status.ionos.cloud/incidents/zg4mpk9x724t Our network team is currently working on restoring service to the affected components. Customers might experience intermittent network service degradation or outages. We are upgrading the severity of the incident and will keep you informed on the progress of the recovery.

  3. identified Mar 03, 2026, 12:01 PM UTC

    We have added DCD and AI Modelhub to the list of affected services. Customers might experience connectivity issues for these services currently.

  4. monitoring Mar 03, 2026, 12:09 PM UTC

    We are seeing services recovering. We will monitor the progress over the next few minutes and update the status of the affected services.

  5. identified Mar 03, 2026, 12:38 PM UTC

    We are seeing issues again on the DCD frontend, as well as the AI Model Hub. Customers may still see connectivity issues with these services. Our Services Team is investigating.

  6. identified Mar 03, 2026, 12:45 PM UTC

    We have added DBaaS to the list of affected services due to an increased error count. The team is aware and actively working on the service.

  7. identified Mar 03, 2026, 12:54 PM UTC

    We are temporarily pausing provisioning in a cluster currently experiencing a backlog of queued jobs. While running services remain unaffected, updates to existing entities will not be processed until provisioning is reactivated.

  8. identified Mar 03, 2026, 01:24 PM UTC

    We are upgrading the impact for the AI Model Hub.

  9. identified Mar 03, 2026, 01:39 PM UTC

    Our network team is still diagnosing ongoing connectivity issues that affect the referenced services.

  10. identified Mar 03, 2026, 02:01 PM UTC

    Network connectivity is improving. We will re-enable provisioning and monitor job execution as well as the rest of the affected services.

  11. identified Mar 03, 2026, 02:20 PM UTC

    We see that AI Model Hub service is recovering. We are closely monitoring request processing and reducing the severity of the impact for this service.

  12. identified Mar 03, 2026, 02:33 PM UTC

    We are currently working on re-establishing connectivity for the supporting systems for the DCD to restore the service, as well.

  13. identified Mar 03, 2026, 02:37 PM UTC

    We are setting the AI Model Hub back to "operational".

  14. identified Mar 03, 2026, 03:16 PM UTC

    We are still waiting for the provisioning backlog to be processed. Customers might face extended provisioning times. DCD web-frontend availability and DBaaS services remain affected due to their dependency on provisioning assignments. We will continue to update the status page, though likely at a reduced cadence as we monitor backlog consumption. In parallel, a Root Cause Analysis (RCA) has been initiated for the triggering incident. We will share the RCA here as soon as it becomes available.

  15. monitoring Mar 03, 2026, 03:25 PM UTC

    We are setting the DCD frontend back to 'Operational' now that the dependency on the provisioning side has been resolved. Customers should now be able to use the DCD via the web frontend again. We are setting the incident to 'Monitoring' status and will ensure the successful recovery of the still affected services. Customers might still experience a performance impact on provisioning-related activities, such as modifying infrastructure resources in the DCD. We recommend postponing non-mission-critical/urgent changes until the incident is marked as "Resolved".

  16. resolved Mar 03, 2026, 09:49 PM UTC

    The Provisioning and DBaaS services have been restored to full operational status. The backlog has been cleared and no further service degradation is expected. Our Network Team is currently preparing a comprehensive Root Cause Analysis (RCA) of this incident, which we will publish on this page. We anticipate releasing the complete analysis by tomorrow. Thank you for your patience.

  17. postmortem Mar 09, 2026, 03:52 PM UTC

    # **Root Cause Analysis**

    Today we are releasing a preliminary Root Cause Analysis that represents the current state of the investigation, focusing on the network incident. While we have high confidence in the technical details concerning the network incident, we will add to this analysis as new information becomes available about the services affected by the spillover effects of this main incident. We expect to publish the complete RCA by the end of this week.

    _UPDATE 13.03.2026 - We wanted to provide an update to the published preliminary RCA to enrich the findings and increase transparency related to other affected services._

    ## **What happened?**

    During a [scheduled maintenance window](https://status.ionos.cloud/incidents/zg4mpk9x724t) to replace a faulty switch in a cluster situated in the TXL data center, a network outage occurred. The incident triggered a spillover effect, resulting in a significant queue of unprocessed provisioning jobs from the affected cluster. Following network recovery, this backlog generated significant resource locking delays, slowing the processing of queued jobs. This impacted systems and services relying on self-healing automations triggered by the primary incident or undergoing changes at that time.

    _UPDATE: In the following we want to explain in more detail the impact on other affected services:_

    **IAM:** For customers, the DCD frontend was unavailable during the incident because the underlying IAM service was not reachable for the frontend. During the incident, an IP assignment maintenance job had been triggered, which was pending completion for an extended period of time, rendering the service unavailable to the frontend.

    **AI Model Hub:** The AI Model Hub service was affected directly by the network outage. While the service itself remained functional throughout the incident, a network connectivity loss on the Managed Kubernetes cluster meant that it was unavailable for customers.

    **DBaaS/Managed Kubernetes:** DBaaS and Managed Kubernetes were impacted through two separate mechanisms. First, servers hosting DBaaS and Kubernetes workloads experienced a direct loss of BGP connectivity as a result of the network outage, causing immediate service disruption. Second, the resulting delays in provisioning queue processing caused key operations, such as volume attach/detach activities as well as automatically triggered self-healing mechanisms, to be deferred, leading to further delays during the recovery phase when network connectivity was restored.

    **Provisioning:** Provisioning was first directly impacted by the connectivity loss to resources on the affected cluster. This led to an initial spike in queued processing jobs. As services affected by the connectivity issues entered self-healing, additional jobs were placed in the queue. This led to an exponential increase in job numbers, which, after the network incident was resolved, led to resource locking bottlenecks that reduced the normal processing speed of the queue. This hampered automated recovery mechanisms, leading to extended service degradations, especially for IAM and Managed Kubernetes.

    ## **How was this possible? (Technical Root Cause)**

    The network maintenance took place in Topology 2 of 2 in the affected cluster. During the maintenance, Topology 2 of 2 was disabled as planned. At this point, the Multicast Forwarding Table (MFT) capacity across all switches in Topology 1 was already strained due to saturation from IPoIB multicast groups. Due to this, Topology 1 was already operating in a vulnerable state.

    Topology 1 of 2 then suffered a critical gateway overload, initiated by a link flap on a switch, which flooded the InfiniBand control plane with alerts and triggered a fabric resweep, an automated self-healing mechanism aiming to reset the network map. Because the MFT was already at capacity, the reprogramming required for the reset failed. The resulting surge of unsuccessful route updates overwhelmed the management layer, causing BGP sessions to drop and ultimately severing connectivity for hosts in the affected cluster. As the connectivity loss made it impossible for provisioning jobs to succeed in the affected cluster, all changes scheduled for resources on the affected hosts were queued.

    The network incident was resolved by a switchover from the affected control plane to its standby, a reboot of the locked-up gateways, and recovery of Topology 1 from the scheduled maintenance. Although the network incident was resolved, the resulting volume of provisioning jobs triggered significant resource locking. This bottleneck delayed critical updates, leaving several (self-healing) services in a degraded state as they waited for their queued jobs to process.

    ## **What are we doing to prevent recurrence?**

    Today, we are highlighting the measures, both planned and currently deployed, designed to reduce the likelihood of a similar incident. We are listing these measures by affected service, but want to underline that the "interconnectedness" of the services through the provisioning queue was identified as a key contributor to the duration of the service degradation. While individual measures will make the services individually more robust, we are also implementing architectural changes to the provisioning queue to ensure better de-coupling and reduce the risk of spillovers as observed here.

    ### Network

    In an RCA released for a related incident on [11.03.2026](https://manage.statuspage.io/pages/xdhr50sc5fkm/incidents/0rmsrc0dnk5h#postmortem), triggered by the same technical root cause, we published immediate technical actions and long-term structural improvements. To improve consistency, we are sharing these measures and their implementation status here as well.

    Immediate Technical Actions:

    * Resource Management: Adjusted control-plane configurations to use less aggressive failover timers and reduced heavy sweep activity, lowering the load on the forwarding table. (DONE)
    * Multicast Load Reduction: Implemented measures to decrease the number of multicast group memberships, further reducing pressure on the forwarding table. (DONE)
    * Proactive Monitoring: Established continuous monitoring of forwarding table utilization and control-plane memory to trigger early alerts before capacity limits are reached. (DONE)

    Long-term Structural Improvements:

    * Control-Plane Isolation: We are migrating our control plane to dedicated, high-performance servers. This work, which began in Q4 2025, ensures the separation of network management from data-plane traffic to eliminate resource competition. We are also performing a deep-dive audit of IPv6 configurations and IPoIB driver settings. (Ongoing, ETA: Q2 2026)
    * Network Modernization: Our interconnect fabric is undergoing a strategic modernization program, including switches, gateways, and drivers, to increase both resiliency and performance. (Ongoing, ETA: Q3 2026)

    With these measures now in place, we are confident the immediate cause of the cluster instability has been mitigated. Our long-term strategy will further enhance network stability and scalability across all data centers while addressing the identified bottlenecks.

    ### IAM

    Immediate Technical Actions: IAM Service Resilience during Maintenance: We have hardened IAM maintenance protocols to ensure critical identity services remain functional even if primary Cloud APIs are temporarily unavailable. (DONE)

    Long-term Structural Improvements: Hardening Core Identity Services (IAM): We are expediting the ongoing migration of our Identity and Access Management (IAM) system to a dedicated platform. This removes IAM's dependency on the standard Managed Kubernetes clusters. (Q3 2026)

    ### AI Model Hub

    As the AI Model Hub was affected by connectivity issues caused by problems in the underlying Managed Kubernetes setup, the service will directly benefit from the measures planned to increase Managed Kubernetes resilience.

    ### DBaaS/Managed Kubernetes

    Improvements to Management and Provisioning Handling: Structural improvements are planned that make cluster setup and management more seamless, reliable, and automated. This will minimize the surface for service disruptions during updates and changes, like those triggered by self-healing mechanisms. Key improvements include better handling of node maintenance, more predictable scaling, and enhanced stability for production workloads. These changes are part of a broader effort to improve the Kubernetes experience, but will also help address the specific issues relevant to this incident by introducing a dedicated provisioning provider. (Q3 2026)

    ### Provisioning

    Enhanced Provisioning Resilience: To prevent "logjams" during maintenance or incidents, we are introducing automated circuit breakers within our provisioning engine to reduce the risk of overloads and perpetual queue buildup caused by self-healing services. Several quality-of-service improvements are planned to improve visibility and control over job execution and prioritization within the provisioning queue. Additional steps towards preparing a de-coupling of provisioning queues will be made in an upcoming provisioning maintenance to test provisioning switchover from one DC to another. (Q1 2026)

    We hope this update to our preliminary RCA increases transparency regarding how the network incident in the affected cluster unfolded and how it influenced other services. We have derived a series of improvements from this event that will help make our services more resilient in the future. While our ongoing network modernization initiative will provide greater performance and stability, we also aim to reduce the 'blast radius' of future incidents by identifying and addressing dependencies within our services. We recognize that this incident has affected customers and partners in various ways. It is important to us to provide a comprehensive, transparent account of the disruption, as well as the initiatives we have implemented, and will continue to put into place, to help avoid similar issues moving forward.
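
The "automated circuit breakers" mentioned for the provisioning engine follow a standard pattern: stop submitting work once failures cross a threshold, then probe again after a cool-down instead of letting the queue build up. A generic sketch of that pattern follows; the thresholds, timings, and the job-submission function are placeholders, not IONOS's provisioning engine.

```python
"""Sketch of a generic circuit breaker for a job-submitting worker.

Thresholds and timings are illustrative; this is not the IONOS provisioning engine.
"""
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=60):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None          # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            self.opened_at = None      # half-open: let one attempt probe recovery
            return True
        return False                   # open: shed load instead of queueing more

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()

def submit_job(job):                   # placeholder for the real submission call
    raise RuntimeError("backend unavailable")

breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=60)
for job in ["attach-volume", "heal-node", "update-lan"]:
    if not breaker.allow():
        print(f"circuit open, deferring {job}")
        continue
    try:
        submit_job(job)
        breaker.record(True)
    except RuntimeError:
        breaker.record(False)
        print(f"{job} failed; failures={breaker.failures}")
```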

Read the full incident report →

Minor March 2, 2026

Kubernetes service restricted

Detected by Pingoru
Mar 02, 2026, 12:03 PM UTC
Resolved
Mar 02, 2026, 02:55 PM UTC
Duration
2h 52m
Affected: Managed Kubernetes
Timeline · 3 updates
  1. identified Mar 02, 2026, 12:03 PM UTC

    We are currently experiencing instability affecting Kubernetes clusters hosted in Frankfurt. Some clusters are stuck in an updating/deploying state, which is causing intermittent communication issues between control plane components and worker nodes. The issue may cause brief interruptions or degraded performance across impacted services. Our engineering teams are actively investigating and working to restore full stability as quickly as possible.

  2. resolved Mar 02, 2026, 02:55 PM UTC

    This incident has been resolved.

  3. postmortem Mar 06, 2026, 04:11 PM UTC

    **Incident Summary**

    On March 2, 2026, some Managed Kubernetes clusters in our Frankfurt (DE/FRA) region became stuck in an `UPDATING` or `DEPLOYING` state. This caused intermittent communication issues between control plane components and worker nodes.

    **Root Cause**

    The instability was caused by our control plane management system running out of memory and crashing under high load. When the system restarted, it was unable to properly resume its interrupted tasks, which left several clusters stuck mid-update.

    **Resolution**

    Our engineering teams quickly stabilized the system by increasing its memory allocation. Once stable, engineers manually recovered the remaining clusters that were stuck to restore full service.

    **Prevention**

    To prevent this from happening again, we are working to address an upstream software bug related to how the system handles memory-related crashes. Additionally, we are fixing an internal alerting issue that failed to notify our on-call team, ensuring a much faster response should a similar issue arise in the future.

Read the full incident report →

Minor February 26, 2026

Acronis Backup Service: Cloud Storage Access issues

Detected by Pingoru
Feb 26, 2026, 07:56 PM UTC
Resolved
Feb 27, 2026, 08:25 AM UTC
Duration
12h 29m
Affected: Backup Service
Timeline · 2 updates
  1. investigating Feb 26, 2026, 07:56 PM UTC

    We are currently investigating issues related to Acronis Backup Service. Error message: Failed to connect because access to the cloud storage was denied.

  2. resolved Feb 27, 2026, 08:25 AM UTC

    This incident is now closed. Analysis shows that a configuration change to Acronis Backup on 25.02 briefly affected connectivity, leading to delayed reporting from customers. The change was rolled back shortly after maintenance, resolving the issue.

Read the full incident report →

Major February 24, 2026

Network service degraded

Detected by Pingoru
Feb 24, 2026, 08:39 PM UTC
Resolved
Feb 24, 2026, 09:44 PM UTC
Duration
1h 5m
Affected: Network
Timeline · 7 updates
  1. investigating Feb 24, 2026, 08:39 PM UTC

    We are writing to inform you that we have been experiencing sporadic connection issues and substantial delays in packet delivery. Network technicians began working on the issue immediately after detection and will isolate and resolve it as quickly as possible. However, some degradation in connection quality affecting individual virtual resources is possible. We will inform you as soon as functionality has been restored.

  2. investigating Feb 24, 2026, 08:40 PM UTC

    We are continuing to investigate this issue.

  3. investigating Feb 24, 2026, 08:53 PM UTC

    We are continuing to investigate this issue.

  4. identified Feb 24, 2026, 09:00 PM UTC

    The team has now identified the source and is investigating remediation steps.

  5. monitoring Feb 24, 2026, 09:05 PM UTC

    A fix has been implemented and customers should see traffic and connectivity return to normal at this point.

  6. resolved Feb 24, 2026, 09:44 PM UTC

    The issue was linked to the scheduled network maintenance announced here: https://status.ionos.cloud/incidents/y3q7703x5fg4. We have confirmed that the problem is now resolved. Our team is continuing the investigation to determine the root cause, and we will publish the RCA on this page as soon as it becomes available.

  7. postmortem Feb 25, 2026, 11:54 AM UTC

    We want to share the following Root Cause Analysis with you.

    _Update: 25.02.2026 - Initial publishing_

    **What happened?**

    On February 24, 2026, during a scheduled maintenance window intended to improve network resiliency in the FRA region ([Link](https://status.ionos.cloud/incidents/y3q7703x5fg4)), a configuration deployment led to a loss of connectivity for public services. Internal monitoring detected the outage shortly after the start of the announced maintenance, once the configuration change was applied. Our network engineering teams identified link failures between critical hardware components. By 20:55 UTC, a manual fix was applied to the affected devices, and initially affected services were restored by 21:00 UTC.

    **How could this happen? (Root Cause)**

    The incident was caused by a configuration state drift between our central software repository and the live hardware settings in the FR7 production environment that went undetected before the rollout. Specifically, the outage involved Forward Error Correction (FEC) settings, parameters that allow different brands of networking hardware to communicate reliably.

    * **The Discrepancy:** Unlike the staging environments, the production environment had unique configurations for settings required for multi-vendor hardware interoperability. Despite using a "four-eyes" principle to validate the configuration changes, the rendered output did not provide enough visibility into the unexpected discrepancy. This caused the difference to go unnoticed.
    * **The Trigger:** The automated deployment performed a "full rebuild" of the device configuration. Because the repository did not contain the specific FEC settings (the discrepancy), it omitted them during the rebuild.
    * **The Result:** Once the new configuration was pushed, the lack of FEC parity caused the physical links between mismatched devices to fail, dropping traffic for all public services in the region.

    **What are we doing to prevent recurrence?**

    We are committed to ensuring this specific failure mode does not happen again. Our engineering teams have initiated the following corrective actions:

    * **Comprehensive Configuration Audit:** We are performing a full audit of all production devices to identify and resolve any "drifts" where live settings (like FEC) differ from our central repository. (To be completed within Q1 2026)
    * **Improved Validation Checks:** We are implementing an automated pre-flight check that compares the "intended" configuration against the "running" configuration to clearly flag any potential omissions before a change is finalized. This increases the visibility of drifts and unexpected discrepancies and reduces the surface area for human error. (To be completed within Q1 2026)

    The scheduled maintenance was initially planned to cause only a few seconds of service disruption; however, due to the issues described, it caused a significantly higher impact. This maintenance was part of our ongoing initiative to improve stability and performance in our data centers. Our network team remains committed to driving this initiative forward. We understand the impact that this incident has caused and are working with due diligence and urgency to incorporate the lessons learned to further reduce risks during maintenance operations on our core network components.
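
The "Improved Validation Checks" above boil down to diffing the intended (repository) configuration against the running (device) configuration before a full rebuild is pushed. A minimal sketch of that comparison follows; the configuration keys (including the FEC setting) and values are invented for illustration.

```python
"""Sketch: pre-flight drift check between intended and running device config.

Keys and values are invented; a real check would parse vendor config formats.
"""
def config_drift(intended: dict, running: dict) -> dict:
    """Return settings that would change or silently disappear on a full rebuild."""
    drift = {"missing_from_intended": {}, "value_mismatch": {}}
    for key, running_value in running.items():
        if key not in intended:
            # Present on the device but absent from the repository: a full
            # rebuild would drop it (the FEC case in this incident).
            drift["missing_from_intended"][key] = running_value
        elif intended[key] != running_value:
            drift["value_mismatch"][key] = (intended[key], running_value)
    return drift

intended = {"mtu": 9216, "lacp": "active"}
running = {"mtu": 9216, "lacp": "active", "fec": "rs-544"}   # drifted FEC setting

report = config_drift(intended, running)
if report["missing_from_intended"] or report["value_mismatch"]:
    print("Pre-flight check failed:", report)   # block the rollout for review
```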

Read the full incident report →

Minor February 6, 2026

IAM Service - Increased Error Rate

Detected by Pingoru
Feb 06, 2026, 11:41 AM UTC
Resolved
Feb 06, 2026, 04:43 PM UTC
Duration
5h 2m
Affected: Data Center Designer (DCD)
Timeline · 3 updates
  1. investigating Feb 06, 2026, 11:41 AM UTC

    We are currently investigating increased error rates in our IAM Service. We will update this incident shortly with more information.

  2. monitoring Feb 06, 2026, 11:46 AM UTC

    The root cause was identified quickly, and the Infrastructure team reports that the issue should be resolved. We are monitoring the situation and will then mark the incident as resolved.

  3. resolved Feb 06, 2026, 04:43 PM UTC

    We are marking this incident as fully resolved.

Read the full incident report →

Major February 4, 2026

AI Model Hub - Increased error rate

Detected by Pingoru
Feb 04, 2026, 06:12 PM UTC
Resolved
Feb 05, 2026, 07:53 AM UTC
Duration
13h 41m
Affected: AI Model Hub
Timeline · 3 updates
  1. investigating Feb 04, 2026, 06:12 PM UTC

    We are currently investigating an increase in the error rate for our AI Model Hub.

  2. monitoring Feb 04, 2026, 06:52 PM UTC

    We are seeing that the error rate is dropping and are downgrading the incident. We are continuing to monitor the situation.

  3. resolved Feb 05, 2026, 07:53 AM UTC

    We are marking this incident as resolved as the metrics remained stable.

Read the full incident report →

Minor February 4, 2026

Provisioning service access limited

Detected by Pingoru
Feb 04, 2026, 11:13 AM UTC
Resolved
Feb 04, 2026, 03:30 PM UTC
Duration
4h 16m
Affected: Data Center Designer (DCD), Cloud API
Timeline · 3 updates
  1. investigating Feb 04, 2026, 11:13 AM UTC

    There is currently an increased processing time for provisioning requests initiated via the Data Center Designer or the API. The availability and accessibility of your virtual datacenter resources will remain unaffected.

  2. monitoring Feb 04, 2026, 12:43 PM UTC

    Processing time for provisioning requests has returned to normal. The service is currently being monitored.

  3. resolved Feb 04, 2026, 03:30 PM UTC

    This incident has been resolved.

Read the full incident report →

Major February 3, 2026

Service Disruption: Billing API

Detected by Pingoru
Feb 03, 2026, 11:38 AM UTC
Resolved
Feb 05, 2026, 05:49 PM UTC
Duration
2d 6h
Affected: Billing API
Timeline · 2 updates
  1. identified Feb 03, 2026, 11:38 AM UTC

    A technical issue affecting invoice generation has been identified. We are currently implementing a resolution to restore service to the invoice endpoint. Access to these records is temporarily suspended.

  2. resolved Feb 05, 2026, 05:49 PM UTC

    We have received an update from our Billing team. The API is fully operational again and we are marking this incident as resolved.

Read the full incident report →

Looking to track IONOS Cloud downtime and outages?

Pingoru polls IONOS Cloud's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when IONOS Cloud reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track IONOS Cloud alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring IONOS Cloud for free

5 free monitors · No credit card required