- Detected by Pingoru
- Mar 05, 2026, 07:24 PM UTC
- Resolved
- Mar 05, 2026, 07:50 PM UTC
- Duration
- 26m
Timeline · 3 updates
-
investigating Mar 05, 2026, 07:24 PM UTC
We're aware of routing issues affecting some customers in North American regions, and we're actively investigating.
-
monitoring Mar 05, 2026, 07:38 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 05, 2026, 07:50 PM UTC
This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during this period.
- Detected by Pingoru
- Mar 03, 2026, 08:18 PM UTC
- Resolved
- Mar 03, 2026, 09:15 PM UTC
- Duration
- 57m
Affected: Dashboard, Deployments
Timeline · 3 updates
-
investigating Mar 03, 2026, 08:18 PM UTC
We're investigating elevated GraphQL errors that affect some API endpoints.
-
monitoring Mar 03, 2026, 08:36 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 03, 2026, 09:15 PM UTC
This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service. We are still investigating the root cause of the failure. In the meantime, all API endpoints appear stable and errors have dropped to baseline levels.
- Detected by Pingoru
- Mar 03, 2026, 10:50 AM UTC
- Resolved
- Mar 03, 2026, 12:10 PM UTC
- Duration
- 1h 20m
Affected: Dashboard
Timeline · 2 updates
-
investigating Mar 03, 2026, 10:50 AM UTC
We are currently investigating this issue. The page displays: "We’re having trouble loading the cost breakdown."
-
resolved Mar 03, 2026, 12:10 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Mar 03, 2026, 02:05 AM UTC
- Resolved
- Mar 03, 2026, 12:54 AM UTC
- Duration
- —
Timeline · 1 update
-
resolved Mar 03, 2026, 02:05 AM UTC
Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly certificate operations but also app creates and IP assignments. Because Vault requests hung rather than failing immediately, TLS requests through fly-proxy for domains whose certificate was not cached on the local node stayed open for a long time while the proxy attempted to fetch the certificate. Requests waiting on Vault took up so many connection slots that some other connections failed. The root cause of this incident was a partially completed update to the Vault cluster. We will be implementing safeguards in the proxy for this failure mode, as well as improving certificate storage longer-term.
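A hanging backend lookup that holds a connection slot indefinitely is commonly mitigated by bounding each lookup with a timeout and capping how many lookups may be in flight, so excess requests fail fast instead of piling up. The following is a minimal Python sketch of that general technique, written under stated assumptions. It is not fly-proxy's actual implementation or the safeguard Fly.io plans to ship. `CertCache`, `fetch`, `timeout_s`, `max_inflight`, and `hanging_backend` are hypothetical names, and the backend is simulated by a coroutine that never returns, standing in for a hung Vault request.

```python
import asyncio


class CertFetchError(Exception):
    """Raised when a certificate cannot be obtained within the configured limits."""


class CertCache:
    """Hypothetical per-node certificate cache with a bounded backend lookup.

    fetch: async callable mapping a hostname to a certificate (stands in for Vault).
    timeout_s: maximum time a single backend lookup may take.
    max_inflight: maximum number of backend lookups allowed at once.
    """

    def __init__(self, fetch, timeout_s: float, max_inflight: int):
        self._fetch = fetch
        self._timeout_s = timeout_s
        self._slots = asyncio.Semaphore(max_inflight)
        self._cache: dict[str, str] = {}

    async def get(self, hostname: str) -> str:
        # Cached certificates never touch the backend.
        if hostname in self._cache:
            return self._cache[hostname]
        # Every lookup slot is taken: fail fast instead of queueing behind a stuck backend.
        if self._slots.locked():
            raise CertFetchError(f"too many pending lookups for {hostname}")
        async with self._slots:
            try:
                cert = await asyncio.wait_for(self._fetch(hostname), self._timeout_s)
            except asyncio.TimeoutError:
                # The backend hung; give up so the slot is released.
                raise CertFetchError(f"backend timed out for {hostname}") from None
        self._cache[hostname] = cert
        return cert


async def hanging_backend(hostname: str) -> str:
    # Simulates a Vault request that hangs instead of failing.
    await asyncio.sleep(3600)
    return "unreachable"


async def demo() -> str:
    cache = CertCache(hanging_backend, timeout_s=0.1, max_inflight=2)
    try:
        await cache.get("example.com")
        return "ok"
    except CertFetchError as err:
        return str(err)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints "backend timed out for example.com"
```

With a 0.1-second timeout, the simulated hung lookup is abandoned and `demo()` returns an error instead of holding a slot for an hour. The design choice is to trade a TLS failure for hostnames whose certificates are not cached against keeping connection capacity available for everything else.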
- Detected by Pingoru
- Mar 02, 2026, 09:19 PM UTC
- Resolved
- Mar 02, 2026, 09:50 PM UTC
- Duration
- 30m
Affected: Dashboard, Machines API, Deployments
Timeline · 4 updates
-
investigating Mar 02, 2026, 09:19 PM UTC
We're currently investigating issues with the Machines API. Customer deployments and the Fly dashboard may be affected.
-
identified Mar 02, 2026, 09:39 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Mar 02, 2026, 09:47 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 02, 2026, 09:50 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Mar 02, 2026, 05:42 PM UTC
- Resolved
- Mar 02, 2026, 10:49 PM UTC
- Duration
- 5h 7m
Affected: EWR - Secaucus, NJ (US)
Timeline · 4 updates
-
investigating Mar 02, 2026, 05:42 PM UTC
We are currently investigating this issue.
-
identified Mar 02, 2026, 06:21 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Mar 02, 2026, 08:35 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 02, 2026, 10:49 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 27, 2026, 06:50 PM UTC
- Resolved
- Feb 27, 2026, 08:21 PM UTC
- Duration
- 1h 31m
Affected: Dashboard, Machines API, Deployments, Remote Builds, Sprites
Timeline · 9 updates
-
investigating Feb 27, 2026, 06:50 PM UTC
We are investigating increased API request latency and timeouts with the main platform API. This is impacting multiple operations, including creating, querying, or performing actions against machines, as well as platform-level operations like adding payment methods.
-
identified Feb 27, 2026, 06:52 PM UTC
We have identified the cause of the increased latency and are working on a fix. The most common errors we are seeing are timeouts when users attempt to perform an action against a newly created app or machine resource. These requests may time out or fail with an `app|machine not found` error.
-
identified Feb 27, 2026, 06:53 PM UTC
We are continuing to work on a fix for this issue.
-
identified Feb 27, 2026, 06:59 PM UTC
New Sprite creations are also timing out or failing at this time. We are continuing to work on a fix for this issue.
-
identified Feb 27, 2026, 07:05 PM UTC
We are currently seeing full API failures for requests to our GraphQL API and elevated failures for the Machines API. Direct calls to these APIs may fail, along with many flyctl commands. We have identified the cause of the issue and are continuing to work on a fix. Existing running machines and apps should continue to be reachable, but creates, deploys, or other features relying on platform API calls will fail at this time.
-
identified Feb 27, 2026, 07:23 PM UTC
An initial fix has been deployed and we are seeing improvements in load and API performance. Some operations that rely on the GraphQL API, such as new app creations and some deployments, will continue to fail at this time. We are continuing to work on restoring full availability.
-
identified Feb 27, 2026, 07:41 PM UTC
A second fix has been deployed and database load has returned to normal, so API response times are beginning to normalize. Most Machines API requests should succeed as normal, and deploys to existing apps should also work. We are working through a backlog of background jobs. New app and organization creations, along with other operations that depend on these jobs, will continue to see increased latency or failures while we work through the backlog. New MPG cluster and Sprite creations continue to be impacted.
-
monitoring Feb 27, 2026, 08:05 PM UTC
API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery. Background jobs are almost fully caught up. Users may still see slightly slower requests creating new apps / orgs, but they should complete successfully. Sprite and MPG cluster creations are processing as normal.
-
resolved Feb 27, 2026, 08:21 PM UTC
This incident has been resolved. All platform and API operations are working normally.
- Detected by Pingoru
- Feb 27, 2026, 03:34 PM UTC
- Resolved
- Feb 27, 2026, 05:54 PM UTC
- Duration
- 2h 20m
Affected: Deployments
Timeline · 3 updates
-
identified Feb 27, 2026, 03:34 PM UTC
Two regions, Dallas, TX (dfw) and Ashburn, VA (iad), are currently low on capacity. New machine creates in these regions might fail temporarily, and Depot builders may be unavailable, causing deploys to hang at "Waiting for Depot builder". If you are having issues with Depot builders, consider moving them to a region other than iad or dfw on the "Settings" page of your Fly.io dashboard, under "App builders", or try `--depot=false`.
-
monitoring Feb 27, 2026, 05:31 PM UTC
We have provisioned additional capacity in dfw and iad and are monitoring to ensure machine and builder starts are succeeding consistently.
-
resolved Feb 27, 2026, 05:54 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 26, 2026, 05:00 PM UTC
- Resolved
- Feb 26, 2026, 10:28 PM UTC
- Duration
- 5h 27m
Affected: Management Plane - IAD, Remote Builds, DFW - Dallas, Texas (US), IAD - Ashburn, Virginia (US)
Timeline · 6 updates
-
identified Feb 26, 2026, 05:00 PM UTC
We have identified the problem and are working on a fix.
-
identified Feb 26, 2026, 05:05 PM UTC
New machine creates in these regions might fail temporarily, and Depot builders may be unavailable. If you are having issues with Depot builders, consider moving them to a different region, or try `--depot=false`.
-
identified Feb 26, 2026, 05:18 PM UTC
We've identified that some newly created Managed Postgres clusters are failing to come up healthy in these regions.
-
identified Feb 26, 2026, 06:57 PM UTC
We have added capacity in the DFW and IAD regions and are monitoring the impact. New machine creates and deploys without volumes are seeing improved success rates. Deploys using Depot builders in those regions are also improving, with much quicker builder start times. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.
-
monitoring Feb 26, 2026, 08:19 PM UTC
We're continuing to monitor after having added more capacity to our DFW and IAD regions. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.
-
resolved Feb 26, 2026, 10:28 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 24, 2026, 08:30 PM UTC
- Resolved
- Feb 23, 2026, 08:30 PM UTC
- Duration
- —
Affected: Management Plane - IAD, Management Plane - LHR, Management Plane - SIN
Timeline · 2 updates
-
investigating Feb 23, 2026, 03:00 PM UTC
We are currently investigating issues with the MPG control plane. Users may experience delays or hanging when creating or deleting databases via the dashboard or CLI.
-
resolved Feb 24, 2026, 12:31 AM UTC
This incident has been resolved as of 20:30 UTC.
- Detected by Pingoru
- Feb 24, 2026, 05:23 PM UTC
- Resolved
- Feb 24, 2026, 05:51 PM UTC
- Duration
- 28m
Affected: Sprites
Timeline · 3 updates
-
identified Feb 24, 2026, 05:23 PM UTC
A slow deploy is causing Sprites API degradation. We are implementing a fix.
-
identified Feb 24, 2026, 05:24 PM UTC
A slow deploy is causing Sprites API degradation. We are implementing a fix.
-
resolved Feb 24, 2026, 05:51 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 24, 2026, 09:39 AM UTC
- Resolved
- Feb 24, 2026, 10:44 AM UTC
- Duration
- 1h 4m
Affected: Sprites
Timeline · 3 updates
-
investigating Feb 24, 2026, 09:39 AM UTC
We are currently investigating issues creating new Sprites.
-
monitoring Feb 24, 2026, 10:25 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 24, 2026, 10:44 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 24, 2026, 04:33 AM UTC
- Resolved
- Feb 24, 2026, 11:06 AM UTC
- Duration
- 6h 32m
Affected: Metrics
Timeline · 5 updates
-
identified Feb 24, 2026, 04:33 AM UTC
In some cases data is missing or lagging. We've identified the problem and are working on a fix.
-
identified Feb 24, 2026, 05:49 AM UTC
We're continuing to work with VictoriaMetrics support on a fix for this issue.
-
monitoring Feb 24, 2026, 06:46 AM UTC
Metrics are coming back online, but it will take a little time to process what's backed up in the queues.
-
monitoring Feb 24, 2026, 09:35 AM UTC
Delayed metrics are still being processed.
-
resolved Feb 24, 2026, 11:06 AM UTC
Metrics processing has caught up, and we don't see any data loss.
- Detected by Pingoru
- Feb 20, 2026, 04:14 PM UTC
- Resolved
- Feb 20, 2026, 08:49 PM UTC
- Duration
- 4h 35m
Affected: Deployments
Timeline · 5 updates
-
monitoring Feb 20, 2026, 04:14 PM UTC
We have seen elevated latency provisioning Depot builders during deployments over the past hour. This caused some deploys to hang or time out at the "Waiting for Depot Builder" step during this period. Latency has improved and builder provision times are back to normal. We're continuing to monitor to ensure latency remains normal.
-
identified Feb 20, 2026, 04:39 PM UTC
We are again seeing elevated latency provisioning Depot builders on new deploys. Users may see deploys using Depot builders hang or time out at the "Waiting for Depot Builder" step. We are working on a fix. In the meantime, we are switching all deploys to use the default Fly builders. If desired, users can manually switch back to Depot builders using `fly deploy --depot=true`, but may continue to see latency issues at this time.
-
identified Feb 20, 2026, 05:59 PM UTC
A fix is being rolled out. Fly builders remain the default while it is deployed.
-
monitoring Feb 20, 2026, 07:38 PM UTC
The fix has been rolled out and we are seeing deploys using Depot builders succeed normally. We continue to monitor to ensure full recovery. Depot builders have been re-enabled as the default option for new deploys.
-
resolved Feb 20, 2026, 08:49 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 20, 2026, 10:52 AM UTC
- Resolved
- Feb 20, 2026, 11:57 AM UTC
- Duration
- 1h 4m
Affected: Customer Applications, Dashboard, LHR - London, United Kingdom
Timeline · 3 updates
-
investigating Feb 20, 2026, 10:52 AM UTC
We’re currently investigating this issue.
-
monitoring Feb 20, 2026, 11:21 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 20, 2026, 11:57 AM UTC
Network traffic in LHR has been stable for some time now, and we are not seeing any further issues.
- Detected by Pingoru
- Feb 19, 2026, 09:14 PM UTC
- Resolved
- Feb 20, 2026, 12:05 AM UTC
- Duration
- 2h 51m
Affected: Customer Applications, Machines API, Deployments
Timeline · 5 updates
-
investigating Feb 19, 2026, 09:14 PM UTC
We are currently investigating this issue.
-
identified Feb 19, 2026, 09:43 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Feb 19, 2026, 09:49 PM UTC
A fix has been implemented and we are monitoring the results.
-
identified Feb 19, 2026, 10:24 PM UTC
While we have seen some improvement from the previous fix, we are still seeing elevated rates of registry connection issues. Users may continue to see slower machine creates and deploys due to slow image pulls. Deploys may succeed on a retry. We are continuing to work on restoring normal registry performance.
-
resolved Feb 20, 2026, 12:05 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 18, 2026, 04:22 PM UTC
- Resolved
- Feb 18, 2026, 04:44 PM UTC
- Duration
- 22m
Affected: Customer Applications, Machines API, Deployments, Corrosion
Timeline · 4 updates
-
identified Feb 18, 2026, 04:22 PM UTC
The issue has been identified and a fix is being implemented.
-
identified Feb 18, 2026, 04:23 PM UTC
We are continuing to work on a fix for this issue.
-
monitoring Feb 18, 2026, 04:28 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 18, 2026, 04:44 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 17, 2026, 01:06 PM UTC
- Resolved
- Feb 17, 2026, 02:24 PM UTC
- Duration
- 1h 18m
Affected: Deployments
Timeline · 3 updates
-
identified Feb 17, 2026, 01:06 PM UTC
We’re investigating elevated 429 errors from flaps that are causing deployment timeouts. Affected deploys are failing with: `✖ Failed: error waiting for release_command machine XX to finish running: timeout reached waiting for machine's state to change Your machine never reached the state "destroyed".`
-
monitoring Feb 17, 2026, 01:42 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 17, 2026, 02:24 PM UTC
Earlier today, an issue caused elevated rate limiting and some deployment timeouts. A fix is in place and deployments are back to normal.
- Detected by Pingoru
- Feb 14, 2026, 11:33 AM UTC
- Resolved
- Feb 14, 2026, 02:27 PM UTC
- Duration
- 2h 54m
Affected: Management Plane - ORD
Timeline · 5 updates
-
investigating Feb 14, 2026, 11:33 AM UTC
We are currently investigating issues with the MPG control plane in ORD. A small number of clusters in the region may be seeing replication lag or PgBouncer connectivity issues at this time.
-
identified Feb 14, 2026, 11:47 AM UTC
The issue has been identified and we are working on a fix. The majority of MPG clusters in ORD continue to run normally, though some users may still see degraded replicas at this time. Some clusters in the region will have experienced a primary -> replica failover.
-
identified Feb 14, 2026, 01:47 PM UTC
We are continuing to work on a fix for this issue.
-
monitoring Feb 14, 2026, 02:07 PM UTC
A fix has been implemented and we are seeing full recovery of the control plane in ORD. With that recovery we are seeing impacted replicas catching up and clusters returning to normal health. We're continuing to monitor for full recovery.
-
resolved Feb 14, 2026, 02:27 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 11, 2026, 08:44 PM UTC
- Resolved
- Feb 11, 2026, 09:30 PM UTC
- Duration
- 46m
Affected: Deployments
Timeline · 4 updates
-
investigating Feb 11, 2026, 08:44 PM UTC
Some new Fly.io users may encounter an "upgrade your organization" error message when attempting to deploy apps for the first time. We're currently working with Depot to figure out what's causing the issue. In the meantime, you should be able to work around the issue by using Fly builders with `fly deploy --depot=false`.
-
identified Feb 11, 2026, 08:57 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Feb 11, 2026, 09:24 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 11, 2026, 09:30 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 11, 2026, 06:07 AM UTC
- Resolved
- Feb 11, 2026, 07:22 AM UTC
- Duration
- 1h 15m
Affected: Sprites
Timeline · 6 updates
-
investigating Feb 11, 2026, 06:07 AM UTC
Sprite creation returns an error stating that the sprite "is not assigned to compute." The sprite eventually transitions from an unknown state to warm, so there is a delay before it is usable.
-
investigating Feb 11, 2026, 06:08 AM UTC
We are continuing to investigate this issue.
-
investigating Feb 11, 2026, 06:09 AM UTC
We are continuing to investigate this issue.
-
identified Feb 11, 2026, 06:52 AM UTC
We've identified the cause of the delay following creates and we're deploying a fix.
-
monitoring Feb 11, 2026, 06:57 AM UTC
Sprite creation appears to be back to normal operation now.
-
resolved Feb 11, 2026, 07:22 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 10, 2026, 07:00 PM UTC
- Resolved
- Feb 10, 2026, 08:44 PM UTC
- Duration
- 1h 43m
Affected: Management Plane - IAD
Timeline · 5 updates
-
investigating Feb 10, 2026, 07:00 PM UTC
We're currently looking into an issue with MPG clusters in the IAD region.
-
identified Feb 10, 2026, 07:15 PM UTC
We've identified the issue. Some MPG clusters in IAD should be seeing improvements, and we're working on rolling out a fix for the remaining impacted clusters.
-
identified Feb 10, 2026, 07:53 PM UTC
We've rolled out a fix for some additional impacted clusters, and we're continuing to work on the remaining clusters.
-
monitoring Feb 10, 2026, 08:00 PM UTC
We've rolled out a fix for the remaining impacted clusters, and we're now monitoring the results.
-
resolved Feb 10, 2026, 08:44 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 09, 2026, 08:29 PM UTC
- Resolved
- Feb 09, 2026, 09:38 PM UTC
- Duration
- 1h 8m
Timeline · 4 updates
-
investigating Feb 09, 2026, 08:29 PM UTC
We're currently looking into an issue that's preventing new Sprites from being created in IAD. Sprite creation in other regions is unaffected.
-
identified Feb 09, 2026, 08:45 PM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Feb 09, 2026, 09:19 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 09, 2026, 09:38 PM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 09, 2026, 07:17 AM UTC
- Resolved
- Feb 09, 2026, 10:55 AM UTC
- Duration
- 3h 38m
Affected: AMS - Amsterdam, Netherlands, Management Plane - AMS
Timeline · 6 updates
-
investigating Feb 09, 2026, 07:17 AM UTC
One of our upstream providers is performing emergency datacenter maintenance. You may see degraded connectivity on some of your apps in AMS. Most apps in AMS are not affected.
-
identified Feb 09, 2026, 07:34 AM UTC
One of our upstream providers is experiencing a major power issue in their AMS datacenter. Managed Postgres instances in AMS are experiencing an outage because the incident took down our Managed Postgres control plane.
-
identified Feb 09, 2026, 07:42 AM UTC
Affected hosts are starting to come back online. We are working on restoring affected MPG clusters.
-
identified Feb 09, 2026, 08:58 AM UTC
We are still working on restoring the MPG clusters. Most of them should be operational already.
-
monitoring Feb 09, 2026, 09:47 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 09, 2026, 10:55 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Feb 07, 2026, 04:23 PM UTC
- Resolved
- Feb 07, 2026, 06:13 PM UTC
- Duration
- 1h 49m
Affected: Machines API
Timeline · 4 updates
-
investigating Feb 07, 2026, 04:23 PM UTC
We are investigating widespread Machines API issues that began around 16:00 UTC. You may experience 5xx errors or higher latency at this time.
-
identified Feb 07, 2026, 04:40 PM UTC
The issue has been identified and we have seen Machines API performance improve in most regions since ~16:20 UTC. Machines API calls in the SYD, NRT, and SIN regions may continue to see 5xx errors or higher latency at this time. We are continuing to work on restoring full API performance in all regions.
-
monitoring Feb 07, 2026, 05:17 PM UTC
A fix has been implemented and we are seeing Machines API connectivity improve in APAC regions. We continue monitoring for full recovery.
-
resolved Feb 07, 2026, 06:13 PM UTC
This incident has been resolved.