Update timeline
- investigating Mar 11, 2026, 06:13 PM UTC
We are currently investigating network connectivity issues in our TXL datacenter.
- identified Mar 11, 2026, 06:31 PM UTC
In response to monitoring alerts, our Network Team deployed a change to stabilize the network in the affected cluster. We will post another update at 19:00 UTC, or sooner if new information becomes available.
- identified Mar 11, 2026, 06:54 PM UTC
The deployed change has had a positive effect. We are downgrading the impact level while continuing to monitor the cluster closely.
- monitoring Mar 11, 2026, 07:27 PM UTC
We are placing the incident in a monitoring state. Our Network Team is closely monitoring the cluster and working to restore full redundancy.
- resolved Mar 11, 2026, 09:19 PM UTC
We are marking this incident as resolved. Our Network Team will publish an RCA here once it is compiled.
- postmortem Mar 13, 2026, 10:11 AM UTC
**What happened?** A network control-plane failure in the TXL data center caused progressive service degradation and a partial outage for one cluster. The incident resulted in intermittent connectivity loss for a subset of customers, with traffic impact ranging from 5% to 20% during three distinct intervals on **11.03.2026**:

* 05:20 – 05:25 UTC
* 11:58 – 12:10 UTC
* 17:47 – 18:08 UTC

Service recovered once the affected network devices had been rebooted sequentially and additional configuration changes had been applied. These events prompted a series of emergency maintenance windows to stabilize the cluster.

**How was this possible? (Root Cause)** The underlying cause was the exhaustion of multicast forwarding resources on the switches serving the affected topology. This trigger is technically identical to that of the previous incident on [03.03.2026](https://status.ionos.cloud/incidents/3pyhtpglqf43). When the forwarding table reaches its maximum capacity, the network control plane can no longer program the required updates into the forwarding tables of the switches. The continuous failure to push updates overwhelms the system, resulting in severe CPU overload and an out-of-memory (OOM) crash of the control-plane process. Without this process, the fabric cannot maintain stable forwarding, ultimately leading to BGP session flaps and the observed loss of connectivity.

**What are we doing to prevent recurrence?** As part of the measures established following the [03.03.2026](https://status.ionos.cloud/incidents/3pyhtpglqf43) incident, we had already rolled out configuration changes to one of the two topologies in the affected cluster. These measures aim to significantly reduce the load on the control plane by optimizing resweep and failover times. Due to existing instabilities in the second topology, these changes had not yet been applied there. In response to the degradations observed on **11.03.2026**, we accelerated the rollout of these optimizations via Emergency Maintenance across all TXL topologies and several other data centers yesterday. Post-implementation, we observed a significant positive impact, with OpenSM loads substantially reduced. Furthermore, we have reconfigured management services to reduce the number of multicast forwarding entries, resulting in a significantly lower base load on the control-plane process.

**Immediate Technical Actions:**

* _Resource Management:_ Adjusted control-plane configurations to use less aggressive failover timers and reduced heavy sweep activity, lowering the load on the forwarding table. (DONE)
* _Multicast Load Reduction:_ Implemented measures to decrease the number of multicast group memberships, further reducing pressure on the forwarding table. (DONE)
* _Proactive Monitoring:_ Established continuous monitoring of forwarding table utilization and control-plane memory to trigger early alerts before capacity limits are reached; an illustrative sketch follows this postmortem. (DONE)

**Long-term Structural Improvements:**

* _Control-Plane Isolation:_ We are migrating our control plane to dedicated, high-performance servers. This work, which began in Q4 2025, ensures the separation of network management from data-plane traffic to eliminate resource competition. We are also performing a deep-dive audit of IPv6 configurations and IPoIB driver settings. (Ongoing, ETA: Q2 2026)
* _Network Modernization:_ Our interconnect fabric is undergoing a strategic modernization program covering switches, gateways, and drivers to increase both resiliency and performance. (Ongoing, ETA: Q3 2026)
With these measures now in place, we are confident that the immediate cause of the cluster instability has been mitigated. Our long-term strategy will further enhance network stability and scalability across all data centers while addressing the identified bottlenecks. We recognize that these disruptions have impacted our customers and partners, and we sincerely appreciate your patience regarding the short-notice emergency maintenance announced yesterday. We are continuing to monitor the cluster closely to ensure all deployed fixes remain effective.
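To make the _Proactive Monitoring_ action item more concrete, here is a minimal sketch of such an early-warning check. It is an illustration under stated assumptions, not IONOS's actual tooling: the collector helpers, thresholds, and capacity figures are hypothetical placeholders, and a real deployment would pull these counters from switch or subnet-manager telemetry.

```python
"""Illustrative sketch only: the monitoring stack behind this incident is not public.

Idea: poll forwarding-table utilization and control-plane memory, and alert
well before hard capacity limits are reached, so the table can never fill up
and push the control-plane process into the CPU-overload / OOM regime
described in the root cause. All helpers and numbers below are placeholders.
"""
import time
from dataclasses import dataclass


@dataclass
class Reading:
    forwarding_entries: int      # multicast forwarding entries currently programmed
    forwarding_capacity: int     # hardware table capacity of the switch
    control_plane_rss_mb: float  # resident memory of the control-plane process
    control_plane_mem_limit_mb: float


# Hypothetical early-warning thresholds (assumptions, tune per deployment).
FORWARDING_WARN_RATIO = 0.80
MEMORY_WARN_RATIO = 0.75


def collect_reading() -> Reading:
    """Placeholder collector; a real setup would query switch or subnet-manager
    counters here (e.g. via SNMP or vendor telemetry)."""
    return Reading(
        forwarding_entries=41_200,
        forwarding_capacity=49_152,
        control_plane_rss_mb=2_900.0,
        control_plane_mem_limit_mb=4_096.0,
    )


def send_alert(message: str) -> None:
    """Placeholder alert sink; swap in e-mail, chat, or paging integration."""
    print(f"ALERT: {message}")


def evaluate(reading: Reading) -> None:
    fwd_ratio = reading.forwarding_entries / reading.forwarding_capacity
    mem_ratio = reading.control_plane_rss_mb / reading.control_plane_mem_limit_mb

    if fwd_ratio >= FORWARDING_WARN_RATIO:
        send_alert(
            f"forwarding table at {fwd_ratio:.0%} of capacity "
            f"({reading.forwarding_entries}/{reading.forwarding_capacity} entries)"
        )
    if mem_ratio >= MEMORY_WARN_RATIO:
        send_alert(
            f"control-plane process at {mem_ratio:.0%} of its memory limit "
            f"({reading.control_plane_rss_mb:.0f} MB / "
            f"{reading.control_plane_mem_limit_mb:.0f} MB)"
        )


if __name__ == "__main__":
    # Poll once per minute; the interval and alert routing are deployment choices.
    while True:
        evaluate(collect_reading())
        time.sleep(60)
```

The design point, following the root cause above, is that alerts fire on utilization trends rather than on failure symptoms, leaving time to shed multicast group memberships before the control plane is overwhelmed.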
Looking to track IONOS Cloud downtime and outages?
Pingoru polls IONOS Cloud's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.
- Real-time alerts when IONOS Cloud reports an incident
- Email, Slack, Discord, Microsoft Teams, and webhook notifications
- Track IONOS Cloud alongside 5,000+ providers in one dashboard
- Component-level filtering
- Notification groups + maintenance calendar
5 free monitors · No credit card required
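For teams that prefer to script this themselves, below is a minimal sketch of the poll-and-notify pattern described above. It assumes a Statuspage-style JSON endpoint and a generic JSON webhook receiver; both URLs are placeholders, and this is not Pingoru's actual implementation.

```python
"""Generic poll-and-notify sketch (assumptions only).

Many status pages expose a Statuspage-style `/api/v2/status.json` document;
whether IONOS Cloud's page does is an assumption here, as is the webhook
payload shape expected by the receiver.
"""
import json
import time
import urllib.request

STATUS_URL = "https://status.ionos.cloud/api/v2/status.json"  # assumed endpoint
WEBHOOK_URL = "https://example.com/hooks/status-alerts"        # your receiver
POLL_SECONDS = 300  # poll every 5 minutes, as described above


def fetch_status() -> dict:
    """Fetch and parse the status document."""
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        return json.load(resp)


def notify(text: str) -> None:
    """POST a simple JSON payload to the webhook receiver."""
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)


def main() -> None:
    last_indicator = None
    while True:
        try:
            indicator = fetch_status().get("status", {}).get("indicator", "unknown")
            # Only notify on changes after the first successful poll.
            if last_indicator is not None and indicator != last_indicator:
                notify(f"IONOS Cloud status changed: {last_indicator} -> {indicator}")
            last_indicator = indicator
        except OSError as exc:
            print(f"poll failed: {exc}")  # retry on the next cycle
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```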