DataRobot Outage History

DataRobot is up right now

DataRobot had 32 outages in the last 2 years totaling 1085h 33m of downtime, averaging 1.3 incidents per month.

There were 32 DataRobot outages since June 3, 2025, totaling 1085h 33m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.datarobot.com
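The Duration field on each entry appears to be the Resolved timestamp minus the Detected timestamp, truncated to whole minutes (or to whole hours once it exceeds a day), and the 1.3 incidents per month figure matches 32 incidents over a 24-month window. Below is a minimal Python sketch of that arithmetic under those assumptions. The format string and the helper names (`parse_ts`, `format_duration`, `incident_duration`, `incidents_per_month`) are hypothetical and are not part of any DataRobot or Pingoru API. A few listed durations come out one minute longer with this sketch (the April 9 incident gives 15h 8m rather than 15h 7m), presumably because the underlying timestamps carry seconds that the page does not show.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, copied from the entries on this page,
# e.g. "May 28, 2026, 08:30 AM UTC" (hypothetical helper constant).
TS_FORMAT = "%b %d, %Y, %I:%M %p UTC"


def parse_ts(text: str) -> datetime:
    """Parse one Detected/Resolved timestamp as displayed on this page."""
    return datetime.strptime(text, TS_FORMAT)


def format_duration(delta: timedelta) -> str:
    """Render a duration as 'Xd Yh', 'Xh Ym', or 'Ym', dropping leftover seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    if days:
        return f"{days}d {hours}h"
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"


def incident_duration(detected: str, resolved: str) -> str:
    """Duration = Resolved - Detected (an assumption about how the page computes it)."""
    return format_duration(parse_ts(resolved) - parse_ts(detected))


def incidents_per_month(count: int, months: int) -> float:
    """Average incident rate, rounded to one decimal place."""
    return round(count / months, 1)


if __name__ == "__main__":
    print(incident_duration("May 28, 2026, 08:30 AM UTC", "May 28, 2026, 09:47 AM UTC"))  # 1h 17m
    print(incident_duration("Mar 11, 2026, 08:05 PM UTC", "Mar 17, 2026, 06:32 PM UTC"))  # 5d 22h
    print(incidents_per_month(32, 24))  # 1.3
```

Truncating rather than rounding keeps multi-day incidents readable (5d 22h instead of 5d 22h 27m) and matches every multi-day entry listed here.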

Minor May 28, 2026

Delay in Feature Drift Statistics Processing

Detected by Pingoru
May 28, 2026, 08:30 AM UTC
Resolved
May 28, 2026, 09:47 AM UTC
Duration
1h 17m
Affected: MLOps
Timeline · 2 updates
  1. identified May 28, 2026, 09:09 AM UTC

    Feature drift statistics are currently experiencing a processing delay of approximately 1 hour. No data has been lost, and all metrics will reflect accurate values once processing catches up. Our team is actively working to resolve this.

  2. resolved May 28, 2026, 09:47 AM UTC

    Feature drift statistics processing has been restored to normal. All metrics are now up to date. No data was lost.

Minor May 26, 2026

Customers Experiencing Errors with New Custom Model Creation

Detected by Pingoru
May 26, 2026, 05:09 AM UTC
Resolved
May 26, 2026, 06:18 AM UTC
Duration
1h 9m
Affected: API, AutoML, AI Catalog and Data Ingest
Timeline · 4 updates
  1. investigating May 26, 2026, 05:09 AM UTC

    We are experiencing a service interruption with Custom Models functionality in the US SAAS environment. Predictions to existing deployments are working fine, but users cannot create new custom models. Engineering is investigating the issue and will provide updates as we make further progress.

  2. monitoring May 26, 2026, 06:01 AM UTC

    Engineering has applied a fix in the US SAAS environment, which resolved the issue. At the time of the issue, some users might have experienced problems with Custom Apps, data upload, and custom model creation. The issue is contained.

  3. monitoring May 26, 2026, 06:04 AM UTC

    We are continuing to monitor for any further issues.

  4. resolved May 26, 2026, 06:18 AM UTC

    Custom Models, Custom Applications, and data upload services are back to an operational state in the US SAAS environment. The issue is resolved.

Minor May 8, 2026

Widespread intermittent service issues for new workloads in US Production

Detected by Pingoru
May 08, 2026, 12:15 AM UTC
Resolved
May 08, 2026, 08:21 PM UTC
Duration
20h 5m
Affected: AI Apps, MLOps, Notebooks
Timeline · 5 updates
  1. monitoring May 08, 2026, 03:53 AM UTC

    We are currently experiencing intermittent service issues in US Production, which are primarily affecting the launch of new workloads for Notebooks, Custom models, and Custom Applications. This issue does not impact existing workloads. This disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.

  2. identified May 08, 2026, 05:47 AM UTC

    We are currently experiencing intermittent service issues in US Production, which are primarily affecting the launch of new workloads for Custom models and Custom Applications. This issue does not impact existing workloads. This disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.

  3. identified May 08, 2026, 06:13 AM UTC

    We are continuing to experience issues launching new workloads for Custom Models and Custom Applications in US Production. This is connected to an ongoing AWS outage. Our team is exploring multiple mitigation options.

  4. monitoring May 08, 2026, 01:40 PM UTC

    Engineering resolved the underlying issue with workload scheduling and is monitoring the cluster.

  5. resolved May 08, 2026, 08:21 PM UTC

    Engineering confirmed the issue is resolved and all services are restored.

Minor April 10, 2026

Delay in processing actual messages

Detected by Pingoru
Apr 10, 2026, 10:24 AM UTC
Resolved
Apr 10, 2026, 11:01 AM UTC
Duration
36m
Affected: MLOps
Timeline · 2 updates
  1. monitoring Apr 10, 2026, 10:24 AM UTC

    Processing of actual messages on JP MTS is delayed due to an autoscaling malfunction. Engineering scaled up the deployment to alleviate the issue. Root-cause mitigation is in progress.

  2. resolved Apr 10, 2026, 11:01 AM UTC

    Engineering has applied the required infrastructure configuration changes. The service is operating normally and no further user impact is observed. Engineering will continue monitoring cluster health to ensure stability. The incident is now marked as Contained.

Major April 9, 2026

Elevated Errors on Managed AI Cloud

Detected by Pingoru
Apr 09, 2026, 09:56 PM UTC
Resolved
Apr 10, 2026, 01:04 PM UTC
Duration
15h 7m
Affected: AI Apps, Notebooks
Timeline · 4 updates
  1. investigating Apr 09, 2026, 09:56 PM UTC

    We're experiencing an elevated level of errors and are currently looking into the issue.

  2. monitoring Apr 09, 2026, 10:56 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Apr 10, 2026, 11:31 AM UTC

    Engineering has applied changes to mitigate the elevated error rates. Services are now operating normally. We are continuing to monitor the system while investigating the cause of the issue.

  4. resolved Apr 10, 2026, 01:04 PM UTC

    Engineering has implemented the required fixes to resolve the elevated error rates. Services are now operating normally, and no further user impact has been observed. The team will continue to monitor the system to ensure stability. The incident is now considered contained.

Minor March 30, 2026

Degraded Performance on DataRobot MTS due to Quay outage

Detected by Pingoru
Mar 30, 2026, 08:43 PM UTC
Resolved
Mar 31, 2026, 08:53 AM UTC
Duration
12h 10m
Affected: Website, API, Predictions, AutoML, AI Catalog and Data Ingest, AI Apps, MLOps, Pipeline, Generative AI LLM Playground, Notebooks, Generative AI VDB Builder
Timeline · 3 updates
  1. identified Mar 30, 2026, 08:43 PM UTC

    Our engineering team has found that the ongoing Quay outage is causing degraded performance across the DataRobot platform. Engineering is currently monitoring the situation.

  2. identified Mar 30, 2026, 08:44 PM UTC

    We are continuing to work on a fix for this issue.

  3. resolved Mar 31, 2026, 08:53 AM UTC

    Quay.io functionality has been restored and DataRobot environments are fully stabilized.

Minor March 13, 2026

Performance Degradation on Managed AI Cloud

Detected by Pingoru
Mar 13, 2026, 05:49 PM UTC
Resolved
Mar 13, 2026, 06:43 PM UTC
Duration
54m
Affected: API
Timeline · 3 updates
  1. investigating Mar 13, 2026, 05:49 PM UTC

    We are experiencing performance degradation on Managed AI Cloud.

  2. monitoring Mar 13, 2026, 06:31 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Mar 13, 2026, 06:43 PM UTC

    This incident has been resolved.

Minor March 11, 2026

Intermittent UI disruptions on Managed AI Cloud

Detected by Pingoru
Mar 11, 2026, 08:05 PM UTC
Resolved
Mar 17, 2026, 06:32 PM UTC
Duration
5d 22h
Affected: Website, API
Timeline · 4 updates
  1. investigating Mar 11, 2026, 08:05 PM UTC

    We are currently investigating this issue.

  2. monitoring Mar 11, 2026, 08:22 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. monitoring Mar 17, 2026, 06:31 PM UTC

    We are continuing to monitor for any further issues.

  4. resolved Mar 17, 2026, 06:32 PM UTC

    This incident has been resolved.

Major March 11, 2026

Network issue related to Kubernetes in US cluster

Detected by Pingoru
Mar 11, 2026, 02:36 PM UTC
Resolved
Mar 11, 2026, 04:41 PM UTC
Duration
2h 5m
Affected: Predictions, MLOps, Pipeline
Timeline · 4 updates
  1. investigating Mar 11, 2026, 02:36 PM UTC

    DataRobot is experiencing a network issue related to Kubernetes in the US cluster. This will impact model deployment and predictions. Engineering is investigating the root cause.

  2. identified Mar 11, 2026, 03:06 PM UTC

    Engineering has identified the root cause of the problem and put a mitigation in place.

  3. monitoring Mar 11, 2026, 03:19 PM UTC

    The mitigation implemented by Engineering has alleviated the network issue. The team is continuing to monitor the environment to ensure full recovery.

  4. resolved Mar 11, 2026, 04:41 PM UTC

    The mitigation implemented by Engineering has resolved the Kubernetes network issue, and the incident is now contained.

Minor February 18, 2026

Degraded Performance on the DataRobot MTS due to Quay outage

Detected by Pingoru
Feb 18, 2026, 09:04 PM UTC
Resolved
Feb 18, 2026, 09:19 PM UTC
Duration
15m
Affected: Website, API, Predictions, AutoML, AI Catalog and Data Ingest, AI Apps, MLOps, Pipeline, Generative AI LLM Playground, Notebooks, Generative AI VDB Builder
Timeline · 2 updates
  1. investigating Feb 18, 2026, 09:04 PM UTC

    Our engineering team has found that the ongoing Quay outage is causing degraded performance across the DataRobot platform.

  2. resolved Feb 18, 2026, 09:19 PM UTC

    This incident is now resolved.

Major February 17, 2026

LLM blueprint deployments cannot be created

Detected by Pingoru
Feb 17, 2026, 03:48 PM UTC
Resolved
Feb 17, 2026, 03:55 PM UTC
Duration
7m
Affected: Generative AI LLM Playground
Timeline · 2 updates
  1. identified Feb 17, 2026, 03:48 PM UTC

    LLM blueprint deployments cannot be created in the JP MTS environment. Engineering is rolling back the JP cluster to the previous version to mitigate the issue.

  2. resolved Feb 17, 2026, 03:55 PM UTC

    Rollback of the JP cluster to the previous version is complete and the problem has been mitigated.

Minor February 16, 2026

Agent Application Template Impacted After Moderations Library Upgrade

Detected by Pingoru
Feb 16, 2026, 11:35 AM UTC
Resolved
Feb 16, 2026, 12:49 PM UTC
Duration
1h 13m
Affected: AI Apps
Timeline · 2 updates
  1. identified Feb 16, 2026, 11:35 AM UTC

    The agent application template is affected by the recent moderations library upgrade. A fix has been identified and mitigation is in progress.

  2. resolved Feb 16, 2026, 12:49 PM UTC

    A new version of the agentic application template has been released, and the issue is resolved.

Minor February 13, 2026

Degraded Performance on the DataRobot US MTS

Detected by Pingoru
Feb 13, 2026, 08:39 PM UTC
Resolved
Feb 13, 2026, 09:31 PM UTC
Duration
52m
Affected: API, AI Catalog and Data Ingest
Timeline · 2 updates
  1. investigating Feb 13, 2026, 08:39 PM UTC

    We are observing issues on the DataRobot US MTS environment. Users may experience degraded performance using APIs and data ingest services. The engineering team is currently investigating the root cause.

  2. resolved Feb 13, 2026, 09:31 PM UTC

    The incident has now been resolved. All services are now operational.

Critical January 8, 2026

Problem connecting to DataRobot due to client browser caching

Detected by Pingoru
Jan 08, 2026, 06:15 PM UTC
Resolved
Jan 22, 2026, 02:19 PM UTC
Duration
13d 20h
Affected: Website
Timeline · 3 updates
  1. monitoring Jan 08, 2026, 06:15 PM UTC

    Some customers have reported issues connecting to DataRobot. Please do a hard refresh of your browser by clearing the cache; this should fix the problem. As always, let us know if you continue to have issues connecting to DataRobot after clearing the cache.

  2. monitoring Jan 08, 2026, 06:16 PM UTC

    We are continuing to monitor for any further issues.

  3. resolved Jan 22, 2026, 02:19 PM UTC

    This incident has been resolved.

Minor December 17, 2025

Degraded Platform Performance Due to Docker Hub Outage

Detected by Pingoru
Dec 17, 2025, 05:41 PM UTC
Resolved
Dec 17, 2025, 05:00 PM UTC
Duration
19m (16:50–17:09 UTC, per the update)
Timeline · 1 update
  1. resolved Dec 17, 2025, 05:41 PM UTC

    Between 16:50 UTC and 17:09 UTC, one of our external providers (Docker Hub) had an outage that might have caused temporary delays in starting platform workloads. The issue has been resolved and normal operations have resumed. Engineering is continuing to monitor the system.

Minor December 9, 2025

DataRobot LLM Gateway OpenAI Models: Planned Maintenance

Detected by Pingoru
Dec 09, 2025, 01:31 PM UTC
Resolved
Dec 29, 2025, 11:00 AM UTC
Duration
19d 21h
Timeline · 1 update
  1. monitoring Dec 29, 2025, 11:00 AM UTC

    Dec 29, 11:00 UTC Completed - The scheduled maintenance has been completed.
    Dec 29, 09:00 UTC In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
    Dec 9, 13:31 UTC Scheduled - DataRobot is performing infrastructure maintenance on app.datarobot.com, app.eu.datarobot.com and app.jp.datarobot.com. During this time, some users might experience intermittent interruptions with DataRobot LLM Gateway OpenAI models. Please reach out to [email protected] if you have any questions.

Major November 26, 2025

Some Users Facing Issues Accessing Notebooks/Codespaces in US/EU/JP MTS

Detected by Pingoru
Nov 26, 2025, 07:17 AM UTC
Resolved
Nov 27, 2025, 08:26 AM UTC
Duration
1d 1h
Affected: Notebooks
Timeline · 7 updates
  1. investigating Nov 26, 2025, 07:17 AM UTC

    Some users with GenAI licensing are facing issues accessing Notebooks/Codespaces. The engineering team is looking into it.

  2. identified Nov 26, 2025, 10:30 AM UTC

    The engineering team has identified the issue, and a hotfix is being applied to all MTS environments.

  3. identified Nov 26, 2025, 01:28 PM UTC

    We are continuing to work on a fix for this issue.

  4. identified Nov 26, 2025, 01:30 PM UTC

    The engineering team has verified that the fix has been applied to the JP MTS production environment and non-GenAI users are now able to access notebooks in the JP cluster. We are now applying the hotfix to the US and EU clusters.

  5. identified Nov 26, 2025, 02:04 PM UTC

    The engineering team has verified that the fix has been applied to the JP MTS production environment and non-GenAI users are now able to access notebooks in the JP cluster. We are now applying the hotfix to the US and EU clusters. The mitigation for Self-Managed and STS deployments is in progress.

  6. identified Nov 26, 2025, 07:26 PM UTC

    Our team has deployed a fix for the issue. All the MTS environments are currently operational.

  7. resolved Nov 27, 2025, 08:26 AM UTC

    This incident has been resolved.

Major November 20, 2025

Some Users Are Unable to Log In to US, EU & JP MTS Clusters

Detected by Pingoru
Nov 20, 2025, 06:52 AM UTC
Resolved
Nov 20, 2025, 07:23 AM UTC
Duration
30m
Affected: Website
Timeline · 2 updates
  1. investigating Nov 20, 2025, 06:52 AM UTC

    Engineering is investigating reports of some users not being able to log in to the US, EU & JP MTS clusters. Please contact [email protected] if you have any questions.

  2. resolved Nov 20, 2025, 07:23 AM UTC

    The issue is fixed. Access to the US, EU, and JP clusters is restored.

Minor November 7, 2025

The AutoML experiment creation workflow and visibility of certain tabs impacted in MTS environments

Detected by Pingoru
Nov 07, 2025, 07:14 AM UTC
Resolved
Nov 07, 2025, 11:50 AM UTC
Duration
4h 35m
Affected: Website, AutoML
Timeline · 2 updates
  1. investigating Nov 07, 2025, 07:14 AM UTC

    The AutoML experiment creation workflow and the visibility of certain tabs in both the Classic and NextGen UIs are currently impacted for customers with a GenAI license. Engineering is actively investigating the issue.

  2. resolved Nov 07, 2025, 11:50 AM UTC

    The issue has been resolved.

Minor November 6, 2025

Notebooks are timing out occasionally on US Production

Detected by Pingoru
Nov 06, 2025, 12:38 PM UTC
Resolved
Nov 07, 2025, 12:37 PM UTC
Duration
23h 59m
Affected: Notebooks
Timeline · 3 updates
  1. investigating Nov 06, 2025, 12:38 PM UTC

    Notebooks are timing out occasionally on US Production; the engineering team is investigating the issue.

  2. monitoring Nov 06, 2025, 01:02 PM UTC

    The issue of Notebooks timing out is not reproducible on US Prod. Engineering is monitoring the service and investigating the root cause.

  3. resolved Nov 07, 2025, 12:37 PM UTC

    The Engineering team hasn't observed any new occurrences during monitoring. All services remain fully operational.

Minor November 4, 2025

Data Connectors Hanging on Creation

Detected by Pingoru
Nov 04, 2025, 01:57 PM UTC
Resolved
Nov 04, 2025, 02:32 PM UTC
Duration
35m
Affected: AI Catalog and Data Ingest
Timeline · 2 updates
  1. investigating Nov 04, 2025, 01:57 PM UTC

    Users in MTS Production are not able to create new Data Connections: the process times out and the Data Connection cannot be saved. The issue was reported for a Snowflake connection; however, other connections may be affected too. The Engineering team is actively investigating the issue.

  2. resolved Nov 04, 2025, 02:32 PM UTC

    The Engineering team has successfully resolved the connection and processing timeouts affecting data operations in MTS Production. Services are now running stably.

Minor October 22, 2025

Temporary Access Issue for Org Admins to Custom Applications

Detected by Pingoru
Oct 22, 2025, 01:16 PM UTC
Resolved
Oct 22, 2025, 02:40 PM UTC
Duration
1h 24m
Affected: AI Apps
Timeline · 3 updates
  1. identified Oct 22, 2025, 01:16 PM UTC

    After the recent permission update, Org Admins temporarily lost access to Custom Applications. A fix is being deployed and should be completed within 6 hours.

  2. identified Oct 22, 2025, 02:40 PM UTC

    The hotfix has been applied and access for Org Admins to Custom Applications has been restored.

  3. resolved Oct 22, 2025, 02:40 PM UTC

    This incident has been resolved.

Minor October 20, 2025

AWS services outage affects STS and MTS DataRobot environments

Detected by Pingoru
Oct 20, 2025, 09:49 AM UTC
Resolved
Oct 20, 2025, 10:11 PM UTC
Duration
12h 22m
Affected: Website, API, Predictions, AI Catalog and Data Ingest, AutoML, Generative AI LLM Playground, Generative AI VDB Builder, AI Apps, MLOps, Pipeline, Notebooks
Timeline · 7 updates
  1. investigating Oct 20, 2025, 09:49 AM UTC

    Due to an ongoing AWS outage in the us-east-1 region, overall functionality of MTS and STS is affected. Users will experience degraded performance and intermittent connectivity issues. The engineering team is in contact with AWS to find a resolution.

  2. investigating Oct 20, 2025, 10:01 AM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 20, 2025, 10:02 AM UTC

    We are continuing to investigate this issue.

  4. investigating Oct 20, 2025, 10:03 AM UTC

    We are continuing to investigate this issue.

  5. monitoring Oct 20, 2025, 10:03 AM UTC

    Overall functionality of DataRobot is recovering. Engineering team is actively monitoring the services to ensure continued stability.

  6. monitoring Oct 20, 2025, 01:57 PM UTC

    We are continuing to monitor for any further issues.

  7. resolved Oct 20, 2025, 10:11 PM UTC

    This incident has been resolved. AWS is reporting systems as operational. Engineering will continue to monitor the situation.

Minor October 1, 2025

UI Issue Affecting Model Visibility and Project Management

Detected by Pingoru
Oct 01, 2025, 09:40 AM UTC
Resolved
Oct 01, 2025, 06:04 PM UTC
Duration
8h 24m
Affected: AutoML
Timeline · 4 updates
  1. identified Oct 01, 2025, 09:40 AM UTC

    We’ve identified an issue causing limited access to Projects in Classic UI for customers with a GenAI Builder seat license assigned. The Engineering team is working on a fix.

  2. identified Oct 01, 2025, 01:14 PM UTC

    The Engineering team has released and validated the fix on JP AI Cloud. Deployment to EU and US AI Clouds is in progress.

  3. identified Oct 01, 2025, 02:58 PM UTC

    The Engineering team has released and validated the fix on EU AI Cloud. Deployment to US AI Cloud is in progress.

  4. resolved Oct 01, 2025, 06:04 PM UTC

    The engineering team has applied the fix for the UI issue affecting Model Visibility and Project Management. This problem has now been resolved.

Minor September 25, 2025

Issue With Docker Hub Services Impacting MTS & STS DataRobot Clusters

Detected by Pingoru
Sep 25, 2025, 01:19 AM UTC
Resolved
Sep 25, 2025, 01:56 AM UTC
Duration
37m
Affected: API, Predictions, AI Apps, Notebooks
Timeline · 3 updates
  1. identified Sep 25, 2025, 01:19 AM UTC

    At 23:09 UTC / 4:09 PM PDT, an unexpected outage at Docker Hub triggered internal alerts. The engineering team is currently assessing the potential impact across environments. We will share further details and mitigation steps as our investigation progresses.

  2. monitoring Sep 25, 2025, 01:35 AM UTC

    As of 01:09 UTC on Sept 25 (6:09 PM PDT on Sept 24), systems have started recovering from the Docker Hub outage. DataRobot internal services are back to normal. The engineering team will continue to monitor closely.

  3. resolved Sep 25, 2025, 01:56 AM UTC

    The issue is resolved and DataRobot MTS & STS services are back to normal.
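These updates quote times in both UTC and US Pacific time. Below is a minimal Python sketch using the standard-library `zoneinfo` module (Python 3.9+). It assumes the system time-zone database is installed and that the 23:09 UTC alert fell on Sept 24, the day before the entry's detection time. It shows that late September is within US daylight saving time, so the Pacific offset then is UTC-7 (PDT), while in mid-December it is UTC-8 (PST). The helper name `utc_to_pacific` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# US Pacific time; the tz database switches between PST (UTC-8) and PDT (UTC-7) automatically.
PACIFIC = ZoneInfo("America/Los_Angeles")


def utc_to_pacific(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Return the US Pacific wall-clock time for a given UTC time (hypothetical helper)."""
    utc_time = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(PACIFIC)


if __name__ == "__main__":
    # Docker Hub alert (assumed Sept 24), recovery on Sept 25, and the Dec 17 Docker Hub outage start.
    for args in [(2025, 9, 24, 23, 9), (2025, 9, 25, 1, 9), (2025, 12, 17, 16, 50)]:
        local = utc_to_pacific(*args)
        print(local.strftime("%b %d, %I:%M %p"), local.tzname())
```

Labelling a zone-aware time with `tzname()` rather than a hard-coded "PST" avoids mislabelling summer times, because the abbreviation comes from the same rule set that produced the offset.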
