Replicate Outage History


Replicate had 30 outages in the last 2 years, totaling 258h 24m of downtime and averaging 1.2 incidents per month.

Each of the 30 outages, recorded since August 24, 2025, is summarised below with incident details, duration, and resolution information.

Source: https://www.replicatestatus.com
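The summary figures above can be checked with simple arithmetic. The sketch below assumes "the last 2 years" means a 24-month window; the function names are illustrative only and are not part of any Replicate or status-page tooling.

```python
# Hedged sketch: recompute the summary statistics quoted at the top of the page.
# Assumption: the "last 2 years" window is treated as exactly 24 months.
TOTAL_OUTAGES = 30
TOTAL_DOWNTIME_MIN = 258 * 60 + 24  # 258h 24m expressed in minutes
WINDOW_MONTHS = 24


def incidents_per_month(outages: int, months: int) -> float:
    """Average number of incidents per month over the window."""
    return outages / months


def mean_downtime(total_min: int, outages: int) -> tuple[int, float]:
    """Mean downtime per outage, split into (whole hours, remaining minutes)."""
    per_outage = total_min / outages
    hours, minutes = divmod(per_outage, 60)
    return int(hours), minutes


rate = incidents_per_month(TOTAL_OUTAGES, WINDOW_MONTHS)
print(f"{rate:.2f} incidents/month")  # 1.25, which the page rounds to 1.2

h, m = mean_downtime(TOTAL_DOWNTIME_MIN, TOTAL_OUTAGES)
print(f"mean downtime per outage: {h}h {m:.1f}m")
```

Working in minutes avoids mixing hour and minute units until the final display step.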

Minor June 3, 2026

We're seeing long setup times and high contention for models on some L40S and H200 clusters.

Detected by Pingoru: Jun 03, 2026, 07:06 PM UTC
Resolved: Jun 03, 2026, 07:06 PM UTC
Duration: 1h 7m (first update to resolution)
Affected: L40S Hardware, H100 Hardware
Timeline · 3 updates
  1. investigating Jun 03, 2026, 05:59 PM UTC

    Status: Investigating. We're seeing long setup times and high contention for models on some L40S and H200 clusters. Affected components: H100 Hardware (Partial outage), L40S Hardware (Partial outage).

  2. investigating Jun 03, 2026, 07:06 PM UTC

    Status: Investigating. The system is back to operating normally. Affected components: H100 Hardware (Partial outage), L40S Hardware (Partial outage).

  3. resolved Jun 03, 2026, 07:06 PM UTC

    Status: Resolved. The system is back to operating normally. Affected components: H100 Hardware (Operational), L40S Hardware (Operational).


Minor May 28, 2026

Degraded performance on flux-2-klein-4b

Detected by Pingoru: May 28, 2026, 02:17 PM UTC
Resolved: May 28, 2026, 02:17 PM UTC
Duration: 1h 47m (first update to resolution)
Affected: Official Models
Timeline · 2 updates
  1. investigating May 28, 2026, 12:30 PM UTC

    Status: Investigating. Long queue times for black-forest-labs/flux-2-klein-4b are resulting in canceled predictions. Affected components: Official Models (Degraded performance).

  2. resolved May 28, 2026, 02:17 PM UTC

    Status: Resolved. This issue has been resolved and queue times are back to normal. Affected components: Official Models (Operational).


Minor May 21, 2026

Prediction and Training status updates delayed

Detected by Pingoru: May 21, 2026, 11:28 PM UTC
Resolved: May 21, 2026, 11:28 PM UTC
Duration: 1h 50m (first update to resolution)
Affected: Streaming API, HTTP API, CPU Hardware, A100 Hardware, Playground, Home Page, L40S Hardware, H100 Hardware, T4 Hardware
Timeline · 2 updates
  1. identified May 21, 2026, 09:38 PM UTC

    Status: Identified. Our message queues for prediction and training status updates are hitting capacity limits, which is causing connection failures for queue consumers. We are in the process of bringing additional capacity online. Affected components: Playground (Degraded performance), H100 Hardware (Degraded performance), HTTP API (Degraded performance), T4 Hardware (Degraded performance), CPU Hardware (Degraded performance), Home Page (Degraded performance), L40S Hardware (Degraded performance), A100 Hardware (Degraded performance), Streaming API (Degraded performance).

  2. resolved May 21, 2026, 11:28 PM UTC

    Status: Resolved. Message flows are healthy. Affected components: L40S Hardware (Operational), A100 Hardware (Operational), Streaming API (Operational), HTTP API (Operational), T4 Hardware (Operational), CPU Hardware (Operational), Home Page (Operational), Playground (Operational), H100 Hardware (Operational).


Minor May 21, 2026

Constrained H100 capacity

Detected by Pingoru: May 21, 2026, 10:05 PM UTC
Resolved: May 21, 2026, 10:05 PM UTC
Duration: 6h 56m (first update to resolution)
Affected: H100 Hardware
Timeline · 2 updates
  1. identified May 21, 2026, 03:09 PM UTC

    Status: Identified. We are seeing heightened demand for H100 hardware, which is causing severe queue delays. Affected components: H100 Hardware (Partial outage).

  2. resolved May 21, 2026, 10:05 PM UTC

    Status: Resolved. H100 hardware contention has resolved. Thank you for your patience! Affected components: H100 Hardware (Operational).


Minor May 12, 2026

Constrained capacity for H100 hardware

Detected by Pingoru: May 12, 2026, 07:43 PM UTC
Resolved: May 12, 2026, 07:43 PM UTC
Duration: 4h 16m (first update to resolution)
Affected: H100 Hardware
Timeline · 2 updates
  1. identified May 12, 2026, 03:27 PM UTC

    Status: Identified. Demand for constrained H100 hardware is causing scaling delays. This impacts queue size and inference speed for any models running on H100s. Affected components: H100 Hardware (Degraded performance).

  2. resolved May 12, 2026, 07:43 PM UTC

    Status: Resolved. There is no longer any contention for H100 hardware. Thank you for your patience! Affected components: H100 Hardware (Operational).


Minor April 19, 2026

Degraded A100 hardware

Detected by Pingoru: Apr 19, 2026, 03:02 PM UTC
Resolved: Apr 19, 2026, 03:02 PM UTC
Duration: 26m (first update to resolution)
Affected: A100 Hardware
Timeline · 2 updates
  1. monitoring Apr 19, 2026, 02:36 PM UTC

    Status: Monitoring. All predictions and trainings targeting A100 hardware are experiencing degraded performance while control plane nodes restart. Affected components: A100 Hardware (Degraded performance).

  2. resolved Apr 19, 2026, 03:02 PM UTC

    Status: Resolved. All A100 capacity is back. Thanks for your patience! Affected components: A100 Hardware (Operational).


Minor April 9, 2026

A100 capacity unavailable during storage maintenance

Detected by Pingoru: Apr 09, 2026, 05:30 PM UTC
Resolved: Apr 09, 2026, 05:30 PM UTC
Duration: 47m (first update to resolution)
Affected: A100 Hardware
Timeline · 2 updates
  1. investigating Apr 09, 2026, 04:43 PM UTC

    Status: Investigating. The persistent storage for all A100 hardware is under maintenance, and A100 performance is expected to be degraded until the maintenance is complete. Affected components: A100 Hardware (Partial outage).

  2. resolved Apr 09, 2026, 05:30 PM UTC

    Status: Resolved. The maintenance is complete and all systems are reporting healthy. Thank you for your patience! Affected components: A100 Hardware (Operational).


Minor March 24, 2026

Downstream errors for Black Forest Labs models

Detected by Pingoru: Mar 24, 2026, 01:34 AM UTC
Resolved: Mar 24, 2026, 01:34 AM UTC
Duration: 9h 37m (first update to resolution)
Affected: Official Models
Timeline · 2 updates
  1. identified Mar 23, 2026, 03:57 PM UTC

    Status: Identified. Some Black Forest Labs models are failing due to downstream errors from BFL. We are monitoring the situation and working on workarounds (BFL status page: https://status.bfl.ml/). Affected components: Official Models (Degraded performance).

  2. resolved Mar 24, 2026, 01:34 AM UTC

    Status: Resolved. Black Forest Labs has resolved the issue. Affected components: Official Models (Operational).

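An incident's elapsed time can be derived from its first and last timeline timestamps. The Black Forest Labs incident is a useful test case because it crosses midnight UTC. The sketch below is a minimal illustration, assuming every timestamp follows the `Mon DD, YYYY, HH:MM AM/PM UTC` format used in these timelines and that English month abbreviations are available (the default C locale); the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp format used in the incident timelines, e.g. "Mar 23, 2026, 03:57 PM UTC".
# The trailing "UTC" is matched literally; every time on the page is already UTC.
FMT = "%b %d, %Y, %I:%M %p UTC"


def incident_duration(first_update: str, resolved: str) -> timedelta:
    """Elapsed time between the first timeline update and the resolved update."""
    start = datetime.strptime(first_update, FMT)
    end = datetime.strptime(resolved, FMT)
    if end < start:
        raise ValueError("resolved timestamp precedes the first update")
    return end - start


def format_duration(delta: timedelta) -> str:
    """Render a timedelta in the 'Xh Ym' style used in the page summary."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


# Black Forest Labs incident: identified Mar 23, resolved after midnight on Mar 24.
d = incident_duration("Mar 23, 2026, 03:57 PM UTC", "Mar 24, 2026, 01:34 AM UTC")
print(format_duration(d))  # 9h 37m
```

Parsing full dates rather than only clock times is what lets the subtraction handle incidents that span two calendar days.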

Minor March 10, 2026

Degraded performance on Flux Schnell

Detected by Pingoru: Mar 10, 2026, 06:17 PM UTC
Resolved: Mar 10, 2026, 06:17 PM UTC
Duration: 5h 21m (first update to resolution)
Timeline · 3 updates
  1. investigating Mar 10, 2026, 12:56 PM UTC

    Status: Investigating. We are investigating an outage that is affecting only Flux Schnell.

  2. monitoring Mar 10, 2026, 01:11 PM UTC

    Status: Monitoring. A GPU provider is experiencing an outage. Traffic is being rerouted, and we are processing new Flux Schnell requests.

  3. resolved Mar 10, 2026, 06:17 PM UTC

    Status: Resolved. Flux Schnell requests are being served normally.


Minor February 20, 2026

Model Predictions Stuck at "Starting"

Detected by Pingoru: Feb 20, 2026, 01:12 PM UTC
Resolved: Feb 20, 2026, 01:12 PM UTC
Duration: 59m (first update to resolution)
Affected: Streaming API, HTTP API, Official Models
Timeline · 3 updates
  1. investigating Feb 20, 2026, 12:13 PM UTC

    Status: Investigating. We are investigating why a large number of models are not processing requests, with predictions stalled in a "starting" status. Affected components: Streaming API (Partial outage), HTTP API (Partial outage), Official Models (Partial outage).

  2. monitoring Feb 20, 2026, 12:59 PM UTC

    Status: Monitoring. We have identified the root cause and have made an update. We are continuing to monitor as we start to see things improve. Affected components: Streaming API (Degraded performance), HTTP API (Degraded performance), Official Models (Degraded performance).

  3. resolved Feb 20, 2026, 01:12 PM UTC

    Status: Resolved. Models are once again operational. Affected components: Streaming API (Operational), HTTP API (Operational), Official Models (Operational).
