DataRobot incident

Issues with creating Custom Models from LLM Playground

Minor Resolved

DataRobot experienced a minor incident on January 28, 2025 affecting Generative AI LLM Playground, lasting 3h 21m. The incident has been resolved; the full update timeline is below.

Started
Jan 28, 2025, 02:51 PM UTC
Resolved
Jan 28, 2025, 06:13 PM UTC
Duration
3h 21m
Detected by Pingoru
Jan 28, 2025, 02:51 PM UTC

Affected components

Generative AI LLM Playground

Update timeline

  1. investigating Jan 28, 2025, 02:51 PM UTC

    The Japan MTS cluster is experiencing issues with creating custom models from the LLM Playground. The engineering team is investigating.

  2. monitoring Jan 28, 2025, 03:08 PM UTC

    The issue is mitigated and users are able to create custom models again. The engineering team will continue to monitor the environment and prepare a permanent fix until the incident is contained. The current estimate is about 2 hours.

  3. resolved Jan 28, 2025, 06:13 PM UTC

    This incident has been resolved.