Fly.io Outage History

There have been 52 Fly.io outages since February 5, 2026, totaling 49h 15m of downtime. Each is summarised below with incident details, duration, and resolution information.
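
A downtime total like the one above can be reproduced by converting each "Duration" value to minutes and summing. The following is a minimal sketch, not the aggregator's actual method; `parse_duration` and `format_duration` are hypothetical helper names, and entries whose duration is shown as a dash would have to be skipped before summing.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '5h 7m' or '26m' to whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)m)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

def format_duration(total_minutes: int) -> str:
    """Render minutes in the same 'Xh Ym' style the listing uses."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m"

# Durations copied from three entries in this listing (26m, 57m, 1h 20m).
sample = ["26m", "57m", "1h 20m"]
total = sum(parse_duration(d) for d in sample)
print(format_duration(total))  # prints "2h 43m"
```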

Source: https://status.fly.io

Notice March 5, 2026

Routing issues in NA regions

Detected by Pingoru: Mar 05, 2026, 07:24 PM UTC
Resolved: Mar 05, 2026, 07:50 PM UTC
Duration: 26m

Timeline · 3 updates

investigating Mar 05, 2026, 07:24 PM UTC

We're aware of routing issues affecting some customers in North America regions, and we're actively investigating.
monitoring Mar 05, 2026, 07:38 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Mar 05, 2026, 07:50 PM UTC

This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during this period.
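
The 26m duration of this incident follows directly from its "Detected" and "Resolved" timestamps. The sketch below shows that arithmetic for the timestamp format used in this listing; `parse_stamp` and `outage_minutes` are hypothetical names, and returning `None` when the resolved time precedes the detected time is an assumption about how inconsistent records might be handled.

```python
from datetime import datetime, timezone
from typing import Optional

# Matches stamps such as "Mar 05, 2026, 07:24 PM UTC".
STAMP_FORMAT = "%b %d, %Y, %I:%M %p UTC"

def parse_stamp(text: str) -> datetime:
    """Parse a listing timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(text, STAMP_FORMAT).replace(tzinfo=timezone.utc)

def outage_minutes(detected: str, resolved: str) -> Optional[int]:
    """Return the outage length in minutes, or None if the stamps are inconsistent."""
    delta = parse_stamp(resolved) - parse_stamp(detected)
    minutes = int(delta.total_seconds() // 60)
    return minutes if minutes >= 0 else None

print(outage_minutes("Mar 05, 2026, 07:24 PM UTC",
                     "Mar 05, 2026, 07:50 PM UTC"))  # prints 26
```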

Major March 3, 2026

Elevated GraphQL API errors

Detected by Pingoru: Mar 03, 2026, 08:18 PM UTC
Resolved: Mar 03, 2026, 09:15 PM UTC
Duration: 57m

Affected: Dashboard, Deployments

Timeline · 3 updates

investigating Mar 03, 2026, 08:18 PM UTC

We're investigating elevated GraphQL errors that affect some API endpoints.
monitoring Mar 03, 2026, 08:36 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Mar 03, 2026, 09:15 PM UTC

This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service. We are still investigating the root cause of the failure. In the meantime, all API endpoints now appear to be stable and errors have dropped to baseline level.

Minor March 3, 2026

Cost Explorer fails to load

Detected by Pingoru: Mar 03, 2026, 10:50 AM UTC
Resolved: Mar 03, 2026, 12:10 PM UTC
Duration: 1h 20m

Affected: Dashboard

Timeline · 2 updates

investigating Mar 03, 2026, 10:50 AM UTC

We are currently investigating this issue. The page currently displays: "We’re having trouble loading the cost breakdown."
resolved Mar 03, 2026, 12:10 PM UTC

This incident has been resolved.

Notice March 3, 2026

Certificate issues affecting API and proxy

Detected by Pingoru: Mar 03, 2026, 02:05 AM UTC
Resolved: Mar 03, 2026, 12:54 AM UTC
Duration: —

Timeline · 1 update

resolved Mar 03, 2026, 02:05 AM UTC

Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly operations on certificates but also app creates and IP assignments. Because the failure mode was Vault requests hanging rather than failing immediately, TLS requests through fly-proxy for domains whose certificate was not cached on the local node remained open for a long time while the proxy attempted to fetch the certificate. This caused some connections to fail, as too many connection slots were taken up by requests waiting on Vault. The root cause of this incident was a partially completed update to the Vault cluster. We will be implementing safeguards in the proxy for this failure mode, as well as improving certificate storage longer-term.
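
The failure mode described here (a backend that hangs instead of failing, so waiting requests pile up and exhaust connection slots) is commonly guarded against with a deadline on the lookup. The sketch below illustrates that general idea only; it is not fly-proxy's implementation, and `fetch_cert_with_deadline`, `CertUnavailable`, `hanging_backend`, and the timeout values are all hypothetical.

```python
import asyncio

class CertUnavailable(Exception):
    """Raised when the certificate backend does not answer within the deadline."""

async def fetch_cert_with_deadline(fetch, domain, cache, timeout_s=2.0):
    """Serve from the local cache, otherwise fetch with a hard deadline."""
    if domain in cache:
        return cache[domain]
    try:
        # wait_for cancels the fetch on timeout, freeing the slot it held.
        cert = await asyncio.wait_for(fetch(domain), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise CertUnavailable(domain) from None
    cache[domain] = cert
    return cert

async def hanging_backend(domain):
    """Stand-in for a backend whose requests hang rather than fail."""
    await asyncio.sleep(3600)

async def demo():
    cache = {"cached.example": "CERT-A"}
    # Cached domains are unaffected by the hanging backend.
    print(await fetch_cert_with_deadline(hanging_backend, "cached.example", cache))
    try:
        await fetch_cert_with_deadline(hanging_backend, "new.example", cache, timeout_s=0.05)
    except CertUnavailable as exc:
        print("failed fast for", exc)

asyncio.run(demo())
```

With a deadline in place, an uncached domain fails quickly instead of holding a connection open for as long as the backend hangs.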

Minor March 2, 2026

Issues with the Machines API

Detected by Pingoru: Mar 02, 2026, 09:19 PM UTC
Resolved: Mar 02, 2026, 09:50 PM UTC
Duration: 30m

Affected: Dashboard, Machines API, Deployments

Timeline · 4 updates

investigating Mar 02, 2026, 09:19 PM UTC

We're currently investigating issues with the Machines API. Customer deployments and the Fly dashboard may be affected.
identified Mar 02, 2026, 09:39 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Mar 02, 2026, 09:47 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Mar 02, 2026, 09:50 PM UTC

This incident has been resolved.

Major March 2, 2026

Machines failing to boot in EWR

Detected by Pingoru: Mar 02, 2026, 05:42 PM UTC
Resolved: Mar 02, 2026, 10:49 PM UTC
Duration: 5h 7m

Affected: EWR - Secaucus, NJ (US)

Timeline · 4 updates

investigating Mar 02, 2026, 05:42 PM UTC

We are currently investigating this issue.
identified Mar 02, 2026, 06:21 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Mar 02, 2026, 08:35 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Mar 02, 2026, 10:49 PM UTC

This incident has been resolved.

Major February 27, 2026

Slow API requests

Detected by Pingoru: Feb 27, 2026, 06:50 PM UTC
Resolved: Feb 27, 2026, 08:21 PM UTC
Duration: 1h 31m

Affected: Dashboard, Machines API, Deployments, Remote Builds, Sprites

Timeline · 9 updates

investigating Feb 27, 2026, 06:50 PM UTC

We are investigating an increase in API request latency and timeouts with the main platform API. This is impacting multiple operations, including creating, querying or performing actions against machines, as well as platform-level operations like adding payment methods.
identified Feb 27, 2026, 06:52 PM UTC

We have identified the cause of the increased latency and are working on a fix. The most common errors we are seeing are timeouts when users attempt to perform an action against a newly created app / machine resource. Those may time out or fail with an `app|machine not found` error.
identified Feb 27, 2026, 06:53 PM UTC

We are continuing to work on a fix for this issue.
identified Feb 27, 2026, 06:59 PM UTC

New Sprite creations are also timing out or failing at this time. We are continuing to work on a fix for this issue.
identified Feb 27, 2026, 07:05 PM UTC

We are currently seeing full API failures for requests to our GraphQL API and elevated failures for the Machines API. Direct calls to these APIs may fail, along with many flyctl commands. We have identified the cause of the issue and are continuing to work on a fix. Existing running machines and apps should continue to be reachable, but creates, deploys, or other features relying on platform API calls will fail at this time.
identified Feb 27, 2026, 07:23 PM UTC

An initial fix has been deployed and we are seeing improvements in load and API performance. Some operations that rely on the GraphQL API, such as new app creations and some deployments, will continue to fail at this time. We are continuing to work on restoring full availability.
identified Feb 27, 2026, 07:41 PM UTC

A second fix has been deployed and database load has returned to normal, resulting in API response times beginning to normalize. Most Machines API requests should succeed as normal, and deploys to existing apps should also work. We are working through a backlog of background jobs. New app / organization creations and other operations that use these will continue to see increased latency or failures while we work through these. New MPG cluster and new Sprite creations continue to be impacted.
monitoring Feb 27, 2026, 08:05 PM UTC

API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery. Background jobs are almost fully caught up. Users may still see slightly slower requests creating new apps / orgs, but they should complete successfully. Sprite and MPG cluster creations are processing as normal.
resolved Feb 27, 2026, 08:21 PM UTC

This incident has been resolved. All platform and API operations are working normally.

Minor February 27, 2026

Capacity issues in iad and dfw

Detected by Pingoru: Feb 27, 2026, 03:34 PM UTC
Resolved: Feb 27, 2026, 05:54 PM UTC
Duration: 2h 20m

Affected: Deployments

Timeline · 3 updates

identified Feb 27, 2026, 03:34 PM UTC

These regions (Dallas, TX dfw and Ashburn, VA iad) are currently low on capacity. New machine creates in these regions might fail temporarily, and Depot builders may be unavailable, causing deploys to hang in "Waiting for Depot builder". If you are having issues with Depot builders, consider moving them to a different non-iad, non-dfw region in your Fly.io dashboard's "Settings" page under "App builders", or try `--depot=false`.
monitoring Feb 27, 2026, 05:31 PM UTC

We have provisioned additional capacity in dfw and iad and are monitoring to ensure machine and builder starts are succeeding consistently.
resolved Feb 27, 2026, 05:54 PM UTC

This incident has been resolved.

Notice February 26, 2026

Capacity issues in iad and dfw

Detected by Pingoru: Feb 26, 2026, 05:00 PM UTC
Resolved: Feb 26, 2026, 10:28 PM UTC
Duration: 5h 27m

Affected: Management Plane - IAD; Remote Builds; DFW - Dallas, Texas (US); IAD - Ashburn, Virginia (US)

Timeline · 6 updates

identified Feb 26, 2026, 05:00 PM UTC

We have identified the problem and are working on a fix.
identified Feb 26, 2026, 05:05 PM UTC

New machine creates in these regions might fail temporarily, and Depot builders may be unavailable. If you are having issues with Depot builders, consider moving them to a different region, or try `--depot=false`.
identified Feb 26, 2026, 05:18 PM UTC

We've identified some newly created Managed Postgres clusters are failing to come up healthy in these regions.
identified Feb 26, 2026, 06:57 PM UTC

We have added additional capacity in DFW and IAD regions and are monitoring the impact. New machine creates and deploys without volumes are seeing improved success rates. Deploys using depot builders in those regions are also improving, with much quicker builder start times. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.
monitoring Feb 26, 2026, 08:19 PM UTC

We're continuing to monitor after having added more capacity to our DFW and IAD regions. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.
resolved Feb 26, 2026, 10:28 PM UTC

This incident has been resolved.

Notice February 24, 2026

Degraded Managed Postgres Control Plane

Detected by Pingoru: Feb 24, 2026, 08:30 PM UTC
Resolved: Feb 23, 2026, 08:30 PM UTC
Duration: —

Affected: Management Plane - IAD, Management Plane - LHR, Management Plane - SIN

Timeline · 2 updates

investigating Feb 23, 2026, 03:00 PM UTC

We are currently investigating issues with the MPG control plane. Users may experience delays or hanging when creating or deleting databases via the dashboard or CLI.
resolved Feb 24, 2026, 12:31 AM UTC

This incident has been resolved as of 20:30 UTC.

Notice February 24, 2026

Sprites API degradation

Detected by Pingoru: Feb 24, 2026, 05:23 PM UTC
Resolved: Feb 24, 2026, 05:51 PM UTC
Duration: 28m

Affected: Sprites

Timeline · 3 updates

identified Feb 24, 2026, 05:23 PM UTC

A slow deploy is causing Sprites API degradation. We are implementing a fix.
identified Feb 24, 2026, 05:24 PM UTC

A slow deploy is causing Sprites API degradation. We are implementing a fix.
resolved Feb 24, 2026, 05:51 PM UTC

This incident has been resolved.

Minor February 24, 2026

Sprite creations failing

Detected by Pingoru: Feb 24, 2026, 09:39 AM UTC
Resolved: Feb 24, 2026, 10:44 AM UTC
Duration: 1h 4m

Affected: Sprites

Timeline · 3 updates

investigating Feb 24, 2026, 09:39 AM UTC

We are currently investigating issues creating new Sprites.
monitoring Feb 24, 2026, 10:25 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 24, 2026, 10:44 AM UTC

This incident has been resolved.

Minor February 24, 2026

Metrics are degraded

Detected by Pingoru: Feb 24, 2026, 04:33 AM UTC
Resolved: Feb 24, 2026, 11:06 AM UTC
Duration: 6h 32m

Affected: Metrics

Timeline · 5 updates

identified Feb 24, 2026, 04:33 AM UTC

In some cases data is missing or lagging. We've identified the problem and are working on a fix.
identified Feb 24, 2026, 05:49 AM UTC

We're continuing to work with VictoriaMetrics support on a fix for this issue.
monitoring Feb 24, 2026, 06:46 AM UTC

Metrics are coming back online, but it will take a little time to process what's backed up in the queues.
monitoring Feb 24, 2026, 09:35 AM UTC

Delayed metrics are still being processed.
resolved Feb 24, 2026, 11:06 AM UTC

Metrics processing has caught up, and we don't see any data loss.

Minor February 20, 2026

Deploys hanging at waiting for Depot Builder

Detected by Pingoru: Feb 20, 2026, 04:14 PM UTC
Resolved: Feb 20, 2026, 08:49 PM UTC
Duration: 4h 35m

Affected: Deployments

Timeline · 5 updates

monitoring Feb 20, 2026, 04:14 PM UTC

We have seen elevated latency provisioning Depot builders during deployments over the past hour. This caused some deploys to hang or time out at the "Waiting for Depot Builder" step in this period. Latency has improved and builder provision times are back to normal. We're continuing to monitor to ensure latency remains normal.
identified Feb 20, 2026, 04:39 PM UTC

We are again seeing elevated latency provisioning Depot builders on new deploys. Users may see deploys using Depot builders hang or time out at the "Waiting for Depot Builder" step. We are working on a fix. We are switching all deploys to use the default Fly builders in the meantime. If desired, users can manually switch back to Depot builders using `fly deploy --depot=true`, but may continue to see latency issues at this time.
identified Feb 20, 2026, 05:59 PM UTC

A fix is being rolled out. Fly builders continue to be the default while this is deployed.
monitoring Feb 20, 2026, 07:38 PM UTC

The fix has been rolled out and we are seeing deploys using Depot builders succeeding normally. We continue to monitor to ensure full recovery. Depot builders have been re-enabled as the default option for new deploys.
resolved Feb 20, 2026, 08:49 PM UTC

This incident has been resolved.

Minor February 20, 2026

Networking issues for users connecting through lhr

Detected by Pingoru: Feb 20, 2026, 10:52 AM UTC
Resolved: Feb 20, 2026, 11:57 AM UTC
Duration: 1h 4m

Affected: Customer Applications; Dashboard; LHR - London, United Kingdom

Timeline · 3 updates

investigating Feb 20, 2026, 10:52 AM UTC

We’re currently investigating this issue.
monitoring Feb 20, 2026, 11:21 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 20, 2026, 11:57 AM UTC

Network traffic in LHR has been stable for some time now, and we are not seeing any further issues.

Minor February 19, 2026

Investigating registry issues affecting deploys

Detected by Pingoru: Feb 19, 2026, 09:14 PM UTC
Resolved: Feb 20, 2026, 12:05 AM UTC
Duration: 2h 51m

Affected: Customer Applications, Machines API, Deployments

Timeline · 5 updates

investigating Feb 19, 2026, 09:14 PM UTC

We are currently investigating this issue.
identified Feb 19, 2026, 09:43 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Feb 19, 2026, 09:49 PM UTC

A fix has been implemented and we are monitoring the results.
identified Feb 19, 2026, 10:24 PM UTC

While we have seen some improvement from the previous fix, we are still seeing elevated rates of Registry connection issues. Users may continue to see slower machine creates and deploys due to slow image pulls. Deploys may succeed on a retry. We are continuing to work on restoring normal registry performance.
resolved Feb 20, 2026, 12:05 AM UTC

This incident has been resolved.

Major February 18, 2026

Control plane state delayed on some hosts possibly causing network or deployment disruption

Detected by Pingoru: Feb 18, 2026, 04:22 PM UTC
Resolved: Feb 18, 2026, 04:44 PM UTC
Duration: 22m

Affected: Customer Applications, Machines API, Deployments, Corrosion

Timeline · 4 updates

identified Feb 18, 2026, 04:22 PM UTC

The issue has been identified and a fix is being implemented.
identified Feb 18, 2026, 04:23 PM UTC

We are continuing to work on a fix for this issue.
monitoring Feb 18, 2026, 04:28 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 18, 2026, 04:44 PM UTC

This incident has been resolved.

Major February 17, 2026

flyctl deploy timeouts

Detected by Pingoru: Feb 17, 2026, 01:06 PM UTC
Resolved: Feb 17, 2026, 02:24 PM UTC
Duration: 1h 18m

Affected: Deployments

Timeline · 3 updates

identified Feb 17, 2026, 01:06 PM UTC

We’re investigating elevated 429 errors from flaps causing deployment timeouts. Affected deploys are failing with:

✖ Failed: error waiting for release_command machine XX to finish running: timeout reached waiting for machine's state to change
Your machine never reached the state "destroyed".
monitoring Feb 17, 2026, 01:42 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 17, 2026, 02:24 PM UTC

Earlier today, an issue caused elevated rate limiting and some deployment timeouts. A fix is in place and deployments are back to normal.
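
Clients that call a rate-limited API directly often handle 429 responses by retrying with exponential backoff and jitter. The following is a minimal sketch of that general pattern, not anything taken from flyctl or the Machines API; `RateLimited`, `call_with_backoff`, and the delay values are hypothetical, and the caller is assumed to raise `RateLimited` whenever it sees a 429.

```python
import random
import time

class RateLimited(Exception):
    """Raised by the caller-supplied operation when it receives HTTP 429."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Run operation(), retrying on RateLimited with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))

# Demo: an operation that is rate limited twice, then succeeds.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"

print(call_with_backoff(flaky, sleep=lambda seconds: None))  # prints "ok"
```

The `sleep` parameter is injectable only so the demo runs instantly; real use would keep the default `time.sleep`.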

Major February 14, 2026

Degraded Managed Postgres Control Plane in ORD

Detected by Pingoru: Feb 14, 2026, 11:33 AM UTC
Resolved: Feb 14, 2026, 02:27 PM UTC
Duration: 2h 54m

Affected: Management Plane - ORD

Timeline · 5 updates

investigating Feb 14, 2026, 11:33 AM UTC

We are currently investigating issues with the MPG control plane in ORD. A small number of clusters in the region may be seeing replication lag or PgBouncer connectivity issues at this time.
identified Feb 14, 2026, 11:47 AM UTC

The issue has been identified and we are working on a fix. The majority of MPG clusters in ORD continue to run normally, though some users may still see degraded replicas at this time. Some clusters in the region will have experienced a primary -> replica failover.
identified Feb 14, 2026, 01:47 PM UTC

We are continuing to work on a fix for this issue.
monitoring Feb 14, 2026, 02:07 PM UTC

A fix has been implemented and we are seeing full recovery of the control plane in ORD. With that recovery we are seeing impacted replicas catching up and clusters returning to normal health. We're continuing to monitor for full recovery.
resolved Feb 14, 2026, 02:27 PM UTC

This incident has been resolved.

Minor February 11, 2026

Issues with deploying apps using Depot builders for new accounts

Detected by Pingoru: Feb 11, 2026, 08:44 PM UTC
Resolved: Feb 11, 2026, 09:30 PM UTC
Duration: 46m

Affected: Deployments

Timeline · 4 updates

investigating Feb 11, 2026, 08:44 PM UTC

Some new Fly.io users may encounter an "upgrade your organization" error message when attempting to deploy apps for the first time. We're currently working with Depot to figure out what's causing the issue. In the meantime, you should be able to work around the issue by using Fly builders with `fly deploy --depot=false`.
identified Feb 11, 2026, 08:57 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Feb 11, 2026, 09:24 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 11, 2026, 09:30 PM UTC

This incident has been resolved.

Minor February 11, 2026

Creating new sprites is degraded

Detected by Pingoru: Feb 11, 2026, 06:07 AM UTC
Resolved: Feb 11, 2026, 07:22 AM UTC
Duration: 1h 15m

Affected: Sprites

Timeline · 6 updates

investigating Feb 11, 2026, 06:07 AM UTC

Sprite creation generates an error that the sprite "is not assigned to compute." Eventually the sprite transitions from an unknown state to warm, so there is a delay before the sprite is usable.
investigating Feb 11, 2026, 06:08 AM UTC

We are continuing to investigate this issue.
investigating Feb 11, 2026, 06:09 AM UTC

We are continuing to investigate this issue.
identified Feb 11, 2026, 06:52 AM UTC

We've identified the cause of the delay following creates and we're deploying a fix.
monitoring Feb 11, 2026, 06:57 AM UTC

Sprite creation appears to be back to normal operation now.
resolved Feb 11, 2026, 07:22 AM UTC

This incident has been resolved.

Minor February 10, 2026

Degraded MPG clusters in IAD

Detected by Pingoru: Feb 10, 2026, 07:00 PM UTC
Resolved: Feb 10, 2026, 08:44 PM UTC
Duration: 1h 43m

Affected: Management Plane - IAD

Timeline · 5 updates

investigating Feb 10, 2026, 07:00 PM UTC

We're currently looking into an issue with MPG clusters in the IAD region.
identified Feb 10, 2026, 07:15 PM UTC

We've identified the issue - some MPG clusters in IAD should be seeing improvements, and we're working on rolling out a fix for the remaining impacted clusters.
identified Feb 10, 2026, 07:53 PM UTC

We've rolled out a fix for some additional impacted clusters, and we're continuing to work on the remaining clusters.
monitoring Feb 10, 2026, 08:00 PM UTC

We've rolled out a fix for the remaining impacted clusters, and we're now monitoring the results.
resolved Feb 10, 2026, 08:44 PM UTC

This incident has been resolved.

Minor February 9, 2026

Issue creating new Sprites in IAD

Detected by Pingoru: Feb 09, 2026, 08:29 PM UTC
Resolved: Feb 09, 2026, 09:38 PM UTC
Duration: 1h 8m

Timeline · 4 updates

investigating Feb 09, 2026, 08:29 PM UTC

We're currently looking into an issue that's preventing new Sprites from being created in IAD. Sprite creation in other regions is unaffected.
identified Feb 09, 2026, 08:45 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Feb 09, 2026, 09:19 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 09, 2026, 09:38 PM UTC

This incident has been resolved.

Major February 9, 2026

Degraded network in AMS

Detected by Pingoru: Feb 09, 2026, 07:17 AM UTC
Resolved: Feb 09, 2026, 10:55 AM UTC
Duration: 3h 38m

Affected: AMS - Amsterdam, Netherlands; Management Plane - AMS

Timeline · 6 updates

investigating Feb 09, 2026, 07:17 AM UTC

One of our upstream providers is performing emergency DC maintenance. You may see degraded connectivity on some of your apps in AMS. Most apps in AMS are not affected.
identified Feb 09, 2026, 07:34 AM UTC

One of our upstream providers is experiencing a major power issue in their AMS datacenter. Managed Postgres instances in AMS are experiencing an outage because our control plane for Managed Postgres has been taken down by the incident.
identified Feb 09, 2026, 07:42 AM UTC

Affected hosts are starting to come back online. We are working on restoring affected MPG clusters.
identified Feb 09, 2026, 08:58 AM UTC

We are still working on restoring the MPG clusters. Most of them should be operational already.
monitoring Feb 09, 2026, 09:47 AM UTC

A fix has been implemented and we are monitoring the results.
resolved Feb 09, 2026, 10:55 AM UTC

This incident has been resolved.

Major February 7, 2026

Machines API issues

Detected by Pingoru: Feb 07, 2026, 04:23 PM UTC
Resolved: Feb 07, 2026, 06:13 PM UTC
Duration: 1h 49m

Affected: Machines API

Timeline · 4 updates

investigating Feb 07, 2026, 04:23 PM UTC

We are investigating widespread Machines API issues since 16:00 UTC. You may experience 5xx errors or higher latency at this time.
identified Feb 07, 2026, 04:40 PM UTC

The issue has been identified and we are seeing Machines API performance improve in most regions since ~16:20 UTC. Machines API calls in the SYD, NRT, and SIN regions may continue to see 5xx errors or higher latency at this time. We are continuing to work on restoring full API performance in all regions.
monitoring Feb 07, 2026, 05:17 PM UTC

A fix has been implemented and we are seeing Machines API connectivity improve in APAC regions. We continue monitoring for full recovery.
resolved Feb 07, 2026, 06:13 PM UTC

This incident has been resolved.
