IONOS Cloud incident

AI Model Hub - Service Degradations

IONOS Cloud is currently experiencing a minor incident affecting AI Model Hub, which began 1h ago. The vendor's full update timeline is below.

Started
Jun 04, 2026, 02:34 PM UTC
Resolved
Ongoing
Duration
1h 3m
Detected by Pingoru
Jun 04, 2026, 02:34 PM UTC

Affected components

AI Model Hub

Update timeline

  1. identified Jun 04, 2026, 02:34 PM UTC

    The AI Model Hub is currently experiencing high demand. As a shared infrastructure environment, this temporary surge in utilization may result in increased latency and varied performance across the platform.

    Current Impacts

    - Increased Latency (TTFT): High GPU utilization may cause delayed responses (Time to First Token) when initiating model requests. We are continuously managing platform resources to maximize availability across all concurrent workloads.
    - Llama 405B Constraints: Due to its scale and high compute requirements, the Llama 405B model is particularly sensitive to traffic spikes, leading to potential timeouts or higher latency. We recommend switching to alternative models for time-critical workloads requiring faster response times.
    - Collections: Collections can also be negatively affected by high demand and utilization. Please see the notes on new deployments in our documentation: https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/document-collections

    Recommendations

    - To mitigate temporary performance or availability issues, we encourage our customers and partners to implement retry mechanisms with exponential backoff in their projects and pipelines. Usage peaks that lead to timeouts and degraded performance are usually transient.
    - We kindly ask our customers not to create additional Support Tickets for questions and reports related to the performance of models in the Model Hub. Our Product and Tech Teams are aware of the issue and are actively monitoring and working on improving the service to meet demand.

    Outlook

    The following measures are currently underway to improve the performance of the AI Model Hub:

    - Various model-specific optimizations are rolled out on a regular basis
    - Improved cross-GPU load balancing (ETA July)
    - Adjustments to error codes (replacing HTTP 5xx with HTTP 429) to allow clients to better understand and react to overload situations
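
The retry recommendation above can be sketched roughly as follows. This is a minimal illustration, not an official IONOS client: `RetryableError`, `call_with_backoff`, and all parameter values (attempt count, base delay, delay cap) are hypothetical names and defaults chosen for the example, and the notice does not prescribe any of them. The sketch assumes the caller raises `RetryableError` for whatever it considers transient, for example a timeout, an HTTP 5xx, or the HTTP 429 mentioned in the outlook. It uses randomized ("full jitter") delays so that many clients do not retry in lockstep.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeout, HTTP 429, HTTP 5xx)."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts calls have failed.

    Before retry n (0-based), wait a random time in [0, cap), where
    cap = min(max_delay, base_delay * 2**n). The defaults are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # rng() is in [0, 1), so the wait is below cap
```

In use, `request` would be a zero-argument callable that wraps a single model call and raises `RetryableError` on a transient failure; non-retryable errors (for example, authentication failures) should be left to propagate unchanged. The `sleep` and `rng` parameters exist only so the waiting behavior can be replaced in tests.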