DataRobot incident
Widespread intermittent service issues for new workloads in US Production
DataRobot experienced a minor incident on May 8, 2026 affecting AI Apps, MLOps, and 1 more component, lasting 20h 5m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring May 08, 2026, 03:53 AM UTC
We are currently experiencing intermittent service issues in US Production, primarily affecting the launch of new workloads for Notebooks, Custom Models, and Custom Applications. Existing workloads are not impacted. The disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), which is causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.
- identified May 08, 2026, 05:47 AM UTC
We are currently experiencing intermittent service issues in US Production, primarily affecting the launch of new workloads for Custom Models and Custom Applications. Existing workloads are not impacted. The disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), which is causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.
- identified May 08, 2026, 06:13 AM UTC
We are continuing to experience issues launching new workloads for Custom Models and Custom Applications in US Production. This is connected to an ongoing AWS outage. Our team is exploring multiple mitigation options.
- monitoring May 08, 2026, 01:40 PM UTC
Engineering resolved the underlying issue with workload scheduling and is monitoring the cluster.
- resolved May 08, 2026, 08:21 PM UTC
Engineering confirmed the issue is resolved and all services are restored.