DataRobot incident

Network issue related to Kubernetes in US cluster

Major · Resolved

DataRobot experienced a major incident on March 11, 2026, affecting the Predictions, MLOps, and Pipeline components and lasting 2h 5m. The incident has been resolved; the full update timeline is below.

Started
Mar 11, 2026, 02:36 PM UTC
Resolved
Mar 11, 2026, 04:41 PM UTC
Duration
2h 5m
Detected by Pingoru
Mar 11, 2026, 02:36 PM UTC

Affected components

Predictions, MLOps, Pipeline

Update timeline

  1. investigating Mar 11, 2026, 02:36 PM UTC

    DataRobot is experiencing a network issue related to Kubernetes in the US cluster. This will affect model deployment and predictions. Engineering is investigating the root cause.

  2. identified Mar 11, 2026, 03:06 PM UTC

    Engineering has identified the root cause of the problem and has put a mitigation in place.

  3. monitoring Mar 11, 2026, 03:19 PM UTC

    The mitigation implemented by Engineering has reduced the impact of the network issue. The team is continuing to monitor the environment to ensure full recovery.

  4. resolved Mar 11, 2026, 04:41 PM UTC

    The mitigation implemented by Engineering has resolved the Kubernetes network issue, and the incident is now contained.