- Detected by Pingoru: May 28, 2026, 08:30 AM UTC
- Resolved: May 28, 2026, 09:47 AM UTC
- Duration: 1h 17m
Affected: MLOps
Timeline · 2 updates
-
identified May 28, 2026, 09:09 AM UTC
Feature drift statistics are currently experiencing a processing delay of approximately 1 hour. No data has been lost, and all metrics will reflect accurate values once processing catches up. Our team is actively working to resolve this.
-
resolved May 28, 2026, 09:47 AM UTC
Feature drift statistics processing has been restored to normal. All metrics are now up to date. No data was lost.
- Detected by Pingoru: May 26, 2026, 05:09 AM UTC
- Resolved: May 26, 2026, 06:18 AM UTC
- Duration: 1h 9m
Affected: API, AutoML, AI Catalog and Data Ingest
Timeline · 4 updates
-
investigating May 26, 2026, 05:09 AM UTC
We are experiencing a service interruption with Custom Models functionality in the US SaaS environment. Predictions to existing deployments are working, but users cannot create new custom models. Engineering is investigating the issue and will provide updates as we make further progress.
-
monitoring May 26, 2026, 06:01 AM UTC
Engineering has applied a fix in the US SaaS environment, which resolved the issue. During the incident, some users may have experienced issues with Custom Apps, data upload, and custom model creation. The issue is contained.
-
monitoring May 26, 2026, 06:04 AM UTC
We are continuing to monitor for any further issues.
-
resolved May 26, 2026, 06:18 AM UTC
Custom Models, Custom Applications, and data upload services are operational again in the US SaaS environment. The issue is resolved.
- Detected by Pingoru: May 08, 2026, 12:15 AM UTC
- Resolved: May 08, 2026, 08:21 PM UTC
- Duration: 20h 5m
Affected: AI Apps, MLOps, Notebooks
Timeline · 5 updates
-
monitoring May 08, 2026, 03:53 AM UTC
We are currently experiencing intermittent service issues in US Production, which are primarily affecting the launch of new workloads for Notebooks, Custom models, and Custom Applications. This issue does not impact existing workloads. This disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.
-
identified May 08, 2026, 05:47 AM UTC
We are currently experiencing intermittent service issues in US Production, which are primarily affecting the launch of new workloads for Custom Models and Custom Applications. This issue does not impact existing workloads. This disruption is strongly correlated with an ongoing AWS Availability Zone outage (https://health.aws.amazon.com/health/status), causing resource allocation failures. The team is actively monitoring the situation and tracking updates from AWS.
-
identified May 08, 2026, 06:13 AM UTC
We are continuing to experience issues launching new workloads for Custom Models and Custom Applications in US Production. This is connected to an ongoing AWS outage. Our team is exploring multiple mitigation options.
-
monitoring May 08, 2026, 01:40 PM UTC
Engineering resolved the underlying issue with workload scheduling and is monitoring the cluster.
-
resolved May 08, 2026, 08:21 PM UTC
Engineering confirmed the issue is resolved and all services are restored.
- Detected by Pingoru: Apr 10, 2026, 10:24 AM UTC
- Resolved: Apr 10, 2026, 11:01 AM UTC
- Duration: 36m
Affected: MLOps
Timeline · 2 updates
-
monitoring Apr 10, 2026, 10:24 AM UTC
Processing of actual messages on JP MTS is delayed due to an autoscaling malfunction. Engineering scaled up the deployment to alleviate the issue. Root cause mitigation is in progress.
-
resolved Apr 10, 2026, 11:01 AM UTC
Engineering has applied the required infrastructure configuration changes. The service is operating normally and no further user impact is observed. Engineering will continue monitoring cluster health to ensure stability. The incident is now marked as Contained.
- Detected by Pingoru: Apr 09, 2026, 09:56 PM UTC
- Resolved: Apr 10, 2026, 01:04 PM UTC
- Duration: 15h 7m
Affected: AI Apps, Notebooks
Timeline · 4 updates
-
investigating Apr 09, 2026, 09:56 PM UTC
We're experiencing an elevated level of errors and are currently looking into the issue.
-
monitoring Apr 09, 2026, 10:56 PM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Apr 10, 2026, 11:31 AM UTC
Engineering has applied changes to mitigate the elevated error rates. Services are now operating normally. We are continuing to monitor the system while investigating the cause of the issue.
-
resolved Apr 10, 2026, 01:04 PM UTC
Engineering has implemented the required fixes to resolve the elevated error rates. Services are now operating normally, and no further user impact has been observed. The team will continue to monitor the system to ensure stability. The incident is now considered contained.
- Detected by Pingoru: Mar 30, 2026, 08:43 PM UTC
- Resolved: Mar 31, 2026, 08:53 AM UTC
- Duration: 12h 10m
Affected: Website, API, Predictions, AutoML, AI Catalog and Data Ingest, AI Apps, MLOps, Pipeline, Generative AI LLM Playground, Notebooks, Generative AI VDB Builder
Timeline · 3 updates
-
identified Mar 30, 2026, 08:43 PM UTC
Our engineering team has found that the ongoing Quay outage is causing degraded performance across the DataRobot platform. Engineering is currently monitoring the situation.
-
identified Mar 30, 2026, 08:44 PM UTC
We are continuing to work on a fix for this issue.
-
resolved Mar 31, 2026, 08:53 AM UTC
Quay.io functionality has been restored and DataRobot environments are fully stabilized.
- Detected by Pingoru: Mar 13, 2026, 05:49 PM UTC
- Resolved: Mar 13, 2026, 06:43 PM UTC
- Duration: 54m
Affected: API
Timeline · 3 updates
-
investigating Mar 13, 2026, 05:49 PM UTC
We are experiencing performance degradation on Managed AI Cloud.
-
monitoring Mar 13, 2026, 06:31 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 13, 2026, 06:43 PM UTC
This incident has been resolved.
- Detected by Pingoru: Mar 11, 2026, 08:05 PM UTC
- Resolved: Mar 17, 2026, 06:32 PM UTC
- Duration: 5d 22h
Affected: Website, API
Timeline · 4 updates
-
investigating Mar 11, 2026, 08:05 PM UTC
We are currently investigating this issue.
-
monitoring Mar 11, 2026, 08:22 PM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Mar 17, 2026, 06:31 PM UTC
We are continuing to monitor for any further issues.
-
resolved Mar 17, 2026, 06:32 PM UTC
This incident has been resolved.
- Detected by Pingoru: Mar 11, 2026, 02:36 PM UTC
- Resolved: Mar 11, 2026, 04:41 PM UTC
- Duration: 2h 5m
Affected: Predictions, MLOps, Pipeline
Timeline · 4 updates
-
investigating Mar 11, 2026, 02:36 PM UTC
DataRobot is experiencing a network issue related to Kubernetes in the US cluster. This will impact model deployment and predictions. Engineering is investigating the root cause.
-
identified Mar 11, 2026, 03:06 PM UTC
Engineering has identified the root cause of the problem and has put a mitigation in place.
-
monitoring Mar 11, 2026, 03:19 PM UTC
The mitigation implemented by Engineering has improved the network issue. The team is continuing to monitor the environment to ensure full recovery.
-
resolved Mar 11, 2026, 04:41 PM UTC
The mitigation implemented by Engineering has resolved the Kubernetes network issue, and the incident is now contained.
- Detected by Pingoru: Feb 18, 2026, 09:04 PM UTC
- Resolved: Feb 18, 2026, 09:19 PM UTC
- Duration: 15m
Affected: Website, API, Predictions, AutoML, AI Catalog and Data Ingest, AI Apps, MLOps, Pipeline, Generative AI LLM Playground, Notebooks, Generative AI VDB Builder
Timeline · 2 updates
-
investigating Feb 18, 2026, 09:04 PM UTC
Our engineering team has found that the ongoing Quay outage is causing degraded performance across the DataRobot platform.
-
resolved Feb 18, 2026, 09:19 PM UTC
This incident is now resolved.
- Detected by Pingoru: Feb 17, 2026, 03:48 PM UTC
- Resolved: Feb 17, 2026, 03:55 PM UTC
- Duration: 7m
Affected: Generative AI LLM Playground
Timeline · 2 updates
-
identified Feb 17, 2026, 03:48 PM UTC
LLM blueprint deployments cannot be created in the JP MTS environment. Engineering is rolling back the JP cluster to the previous version to mitigate the issue.
-
resolved Feb 17, 2026, 03:55 PM UTC
Rollback of the JP cluster to the previous version is complete and the problem has been mitigated.
- Detected by Pingoru: Feb 16, 2026, 11:35 AM UTC
- Resolved: Feb 16, 2026, 12:49 PM UTC
- Duration: 1h 13m
Affected: AI Apps
Timeline · 2 updates
-
identified Feb 16, 2026, 11:35 AM UTC
The Agent application template is affected by the recent moderations library upgrade. A fix has been identified and mitigation is in progress.
-
resolved Feb 16, 2026, 12:49 PM UTC
A new version of the Agentic application template has been released, and the issue is resolved.
- Detected by Pingoru: Feb 13, 2026, 08:39 PM UTC
- Resolved: Feb 13, 2026, 09:31 PM UTC
- Duration: 52m
Affected: API, AI Catalog and Data Ingest
Timeline · 2 updates
-
investigating Feb 13, 2026, 08:39 PM UTC
We are observing issues in the DataRobot US MTS environment. Users may experience degraded performance when using APIs and data ingest services. The engineering team is currently investigating the root cause.
-
resolved Feb 13, 2026, 09:31 PM UTC
The incident has now been resolved. All services are now operational.
- Detected by Pingoru: Jan 08, 2026, 06:15 PM UTC
- Resolved: Jan 22, 2026, 02:19 PM UTC
- Duration: 13d 20h
Affected: Website
Timeline · 3 updates
-
monitoring Jan 08, 2026, 06:15 PM UTC
Some customers have reported issues connecting to DataRobot. Please do a hard refresh of your browser by clearing the cache; this should fix the problem. As always, let us know if you continue to have issues connecting to DataRobot after clearing the cache.
-
monitoring Jan 08, 2026, 06:16 PM UTC
We are continuing to monitor for any further issues.
-
resolved Jan 22, 2026, 02:19 PM UTC
This incident has been resolved.
- Detected by Pingoru: Dec 17, 2025, 05:41 PM UTC
- Resolved: Dec 17, 2025, 05:00 PM UTC
- Duration
- —
Timeline · 1 update
-
resolved Dec 17, 2025, 05:41 PM UTC
Between 16:50 UTC and 17:09 UTC, one of our external providers (DockerHub) had an outage that might have caused temporary delays in starting platform workloads. The issue has been resolved and normal operations have resumed. Engineering is continuing to monitor the system.
- Detected by Pingoru: Dec 09, 2025, 01:31 PM UTC
- Resolved: Dec 29, 2025, 11:00 AM UTC
- Duration: 19d 21h
Timeline · 1 update
-
monitoring Dec 29, 2025, 11:00 AM UTC
Dec 29, 11:00 UTC Completed - The scheduled maintenance has been completed.
Dec 29, 09:00 UTC In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Dec 9, 13:31 UTC Scheduled - DataRobot is performing infrastructure maintenance on app.datarobot.com, app.eu.datarobot.com, and app.jp.datarobot.com. During this time, some users might experience intermittent interruptions with DataRobot LLM Gateway OpenAI models. Please reach out to [email protected] if you have any questions.
- Detected by Pingoru: Nov 26, 2025, 07:17 AM UTC
- Resolved: Nov 27, 2025, 08:26 AM UTC
- Duration: 1d 1h
Affected: Notebooks
Timeline · 7 updates
-
investigating Nov 26, 2025, 07:17 AM UTC
Some users with GenAI licensing are having trouble accessing Notebooks/Codespaces. The Engineering team is looking into it.
-
identified Nov 26, 2025, 10:30 AM UTC
The Engineering team has identified the issue, and a hotfix is being applied to all MTS environments.
-
identified Nov 26, 2025, 01:28 PM UTC
We are continuing to work on a fix for this issue.
-
identified Nov 26, 2025, 01:30 PM UTC
The Engineering team has verified that the fix has been applied to the JP MTS production environment, and non-GenAI users are now able to access notebooks in the JP cluster. We are now applying the hotfix to the US and EU clusters.
-
identified Nov 26, 2025, 02:04 PM UTC
The Engineering team has verified that the fix has been applied to the JP MTS production environment, and non-GenAI users are now able to access notebooks in the JP cluster. We are now applying the hotfix to the US and EU clusters. The mitigation for Self-Managed and STS deployments is in progress.
-
identified Nov 26, 2025, 07:26 PM UTC
Our team has deployed a fix for the issue. All the MTS environments are currently operational.
-
resolved Nov 27, 2025, 08:26 AM UTC
This incident has been resolved.
- Detected by Pingoru: Nov 20, 2025, 06:52 AM UTC
- Resolved: Nov 20, 2025, 07:23 AM UTC
- Duration: 30m
Affected: Website
Timeline · 2 updates
-
investigating Nov 20, 2025, 06:52 AM UTC
Engineering is investigating reports of some users being unable to log in to the US, EU, and JP MTS clusters. Please contact [email protected] if you have any questions.
-
resolved Nov 20, 2025, 07:23 AM UTC
The issue is fixed. Access to the US, EU, and JP clusters is restored.
- Detected by Pingoru: Nov 07, 2025, 07:14 AM UTC
- Resolved: Nov 07, 2025, 11:50 AM UTC
- Duration: 4h 35m
Affected: Website, AutoML
Timeline · 2 updates
-
investigating Nov 07, 2025, 07:14 AM UTC
The AutoML experiment creation workflow and the visibility of certain tabs in both the Classic and NextGen UIs are currently impacted for customers with a GenAI license. Engineering is actively investigating the issue.
-
resolved Nov 07, 2025, 11:50 AM UTC
The issue has been resolved.
- Detected by Pingoru: Nov 06, 2025, 12:38 PM UTC
- Resolved: Nov 07, 2025, 12:37 PM UTC
- Duration: 23h 59m
Affected: Notebooks
Timeline · 3 updates
-
investigating Nov 06, 2025, 12:38 PM UTC
Notebooks are occasionally timing out on US Production. The Engineering team is investigating the issue.
-
monitoring Nov 06, 2025, 01:02 PM UTC
The issue of Notebooks timing out is not reproducible on US Prod. Engineering is monitoring the service and investigating the root cause.
-
resolved Nov 07, 2025, 12:37 PM UTC
The Engineering team hasn't observed any new occurrences during monitoring. All services remain fully operational.
- Detected by Pingoru: Nov 04, 2025, 01:57 PM UTC
- Resolved: Nov 04, 2025, 02:32 PM UTC
- Duration: 35m
Affected: AI Catalog and Data Ingest
Timeline · 2 updates
-
investigating Nov 04, 2025, 01:57 PM UTC
Users in MTS Production are not able to create new Data Connections: the process times out and the Data Connection cannot be saved. The issue was reported for a Snowflake connection; however, other connections may be affected too. The Engineering team is actively investigating the issue.
-
resolved Nov 04, 2025, 02:32 PM UTC
The Engineering team has resolved the connection and processing timeouts affecting data operations in MTS Production. Services are now stable.
- Detected by Pingoru: Oct 22, 2025, 01:16 PM UTC
- Resolved: Oct 22, 2025, 02:40 PM UTC
- Duration: 1h 24m
Affected: AI Apps
Timeline · 3 updates
-
identified Oct 22, 2025, 01:16 PM UTC
After the recent permission update, Org Admins temporarily lost access to Custom Applications. A fix is being deployed and should be completed within 6 hours.
-
identified Oct 22, 2025, 02:40 PM UTC
The hotfix has been applied and access for Org Admins to Custom Applications has been restored.
-
resolved Oct 22, 2025, 02:40 PM UTC
This incident has been resolved.
- Detected by Pingoru: Oct 20, 2025, 09:49 AM UTC
- Resolved: Oct 20, 2025, 10:11 PM UTC
- Duration: 12h 22m
Affected: Website, API, Predictions, AI Catalog and Data Ingest, AutoML, Generative AI LLM Playground, Generative AI VDB Builder, AI Apps, MLOps, Pipeline, Notebooks
Timeline · 7 updates
-
investigating Oct 20, 2025, 09:49 AM UTC
Due to an ongoing AWS outage in the us-east-1 region, overall functionality of MTS and STS is affected. Users will experience degraded performance and intermittent connectivity issues. The Engineering team is in contact with the AWS team to find a resolution.
-
investigating Oct 20, 2025, 10:01 AM UTC
We are continuing to investigate this issue.
-
investigating Oct 20, 2025, 10:02 AM UTC
We are continuing to investigate this issue.
-
investigating Oct 20, 2025, 10:03 AM UTC
We are continuing to investigate this issue.
-
monitoring Oct 20, 2025, 10:03 AM UTC
Overall functionality of DataRobot is recovering. The Engineering team is actively monitoring the services to ensure continued stability.
-
monitoring Oct 20, 2025, 01:57 PM UTC
We are continuing to monitor for any further issues.
-
resolved Oct 20, 2025, 10:11 PM UTC
This incident has been resolved. AWS is reporting systems as operational. Engineering will continue to monitor the situation.
- Detected by Pingoru: Oct 01, 2025, 09:40 AM UTC
- Resolved: Oct 01, 2025, 06:04 PM UTC
- Duration: 8h 24m
Affected: AutoML
Timeline · 4 updates
-
identified Oct 01, 2025, 09:40 AM UTC
We’ve identified an issue causing limited access to Projects in the Classic UI for customers with a GenAI Builder seat license assigned. The Engineering team is working on a fix.
-
identified Oct 01, 2025, 01:14 PM UTC
The Engineering team has released and validated the fix on JP AI Cloud. Deployment to EU and US AI Clouds is in progress.
-
identified Oct 01, 2025, 02:58 PM UTC
The Engineering team has released and validated the fix on EU AI Cloud. Deployment to US AI Cloud is in progress.
-
resolved Oct 01, 2025, 06:04 PM UTC
The Engineering team has applied the fix for the UI issue affecting Model Visibility and Project Management. The problem is now resolved.
- Detected by Pingoru: Sep 25, 2025, 01:19 AM UTC
- Resolved: Sep 25, 2025, 01:56 AM UTC
- Duration: 37m
Affected: API, Predictions, AI Apps, Notebooks
Timeline · 3 updates
-
identified Sep 25, 2025, 01:19 AM UTC
At 23:09 UTC / 4:09 PM PDT, an unexpected outage from DockerHub triggered internal alerts. The engineering team is currently assessing potential impact across environments. We will share further details and mitigation steps as our investigation progresses.
-
monitoring Sep 25, 2025, 01:35 AM UTC
As of 01:09 UTC (Sept 25) / 06:09 PM PDT, systems have started recovering from the DockerHub outage. DataRobot internal services are back to normal. The engineering team will continue to monitor closely.
-
resolved Sep 25, 2025, 01:56 AM UTC
The issue is resolved, and DataRobot MTS and STS services are back to normal.