Scale AI incident

ML Pod Startup Failures

Notice Resolved View vendor source →

Scale AI experienced a notice incident on March 30, 2023, lasting —. The incident has been resolved; the full update timeline is below.

Started
Mar 30, 2023, 02:30 PM UTC
Resolved
Mar 30, 2023, 02:30 PM UTC
Duration
Detected by Pingoru
Mar 30, 2023, 02:30 PM UTC

Update timeline

  1. resolved Mar 30, 2023, 05:13 PM UTC

    There was an outage with a service that manages the metadata for models from 7:06am PT to 7:50am PT. It prevented model services from starting up new pods. ML services that received security updates, deployed a new version, or attempted to scale up were impacted. There was an additional 8 to 12 minutes of service degradation after restoring the impacted ML services to catch up on missed requests.