- Detected by Pingoru
- Apr 28, 2026, 11:50 PM UTC
- Resolved
- Apr 29, 2026, 12:40 AM UTC
- Duration
- 49m
Affected: Deployments
Timeline · 4 updates
-
investigating Apr 28, 2026, 11:50 PM UTC
We're investigating an issue where `fly deploy` creates new Fly Machine instances rather than updating existing ones, leaving apps in a mixed state. As a workaround, please try removing the `processes = ["app"]` line from your fly.toml configuration file and redeploying; this should resolve the issue in the meantime.
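For context, the `processes` line referenced in this workaround typically appears inside a service section of fly.toml. A minimal sketch is below; the app name, region, and port are illustrative, not taken from any affected app:

```toml
# fly.toml (illustrative sketch; app name, region, and port are hypothetical)
app = "my-app"
primary_region = "iad"

[http_service]
  internal_port = 8080
  force_https = true
  # Workaround during this incident: delete the line below and redeploy
  processes = ["app"]
```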
-
identified Apr 29, 2026, 12:07 AM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Apr 29, 2026, 12:31 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 29, 2026, 12:40 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 24, 2026, 10:45 PM UTC
- Resolved
- Apr 24, 2026, 11:31 PM UTC
- Duration
- 45m
Affected: IAD - Ashburn, Virginia (US)
Timeline · 5 updates
-
investigating Apr 24, 2026, 10:45 PM UTC
We are currently investigating the issue. Only a portion of machines within the region are impacted.
-
investigating Apr 24, 2026, 10:58 PM UTC
We are deploying a partial mitigation while we continue investigating.
-
investigating Apr 24, 2026, 11:18 PM UTC
We are continuing to investigate this issue.
-
monitoring Apr 24, 2026, 11:19 PM UTC
Network packet loss has returned to normal levels. We are monitoring the Machines API for stability.
-
resolved Apr 24, 2026, 11:31 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 23, 2026, 03:05 PM UTC
- Resolved
- Apr 23, 2026, 04:26 PM UTC
- Duration
- 1h 20m
Affected: Dashboard
Timeline · 5 updates
-
investigating Apr 23, 2026, 03:05 PM UTC
We're investigating reports of HTTP 500 errors when adding a new GitHub integration or editing an existing GitHub integration in the Fly.io dashboard. This only affects "Launch an app from GitHub" and changing settings for an app set up this way. Existing integrations continue to work normally. It does not affect deploys done with `flyctl` or existing, running apps.
-
identified Apr 23, 2026, 03:22 PM UTC
The issue has been identified and a fix is being implemented.
-
identified Apr 23, 2026, 03:22 PM UTC
We are continuing to work on a fix for this issue.
-
monitoring Apr 23, 2026, 03:39 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 23, 2026, 04:26 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 23, 2026, 11:17 AM UTC
- Resolved
- Apr 23, 2026, 11:50 AM UTC
- Duration
- 33m
Affected: Dashboard
Timeline · 4 updates
-
investigating Apr 23, 2026, 11:17 AM UTC
We are investigating issues with the web dashboard.
-
identified Apr 23, 2026, 11:35 AM UTC
The issue has been identified and a fix is being implemented.
-
monitoring Apr 23, 2026, 11:45 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 23, 2026, 11:50 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 20, 2026, 02:29 PM UTC
- Resolved
- Apr 20, 2026, 05:38 PM UTC
- Duration
- 3h 9m
Affected: SIN - Singapore
Timeline · 2 updates
-
identified Apr 20, 2026, 03:29 PM UTC
We are currently working on resolving increased latencies in our Singapore region.
-
resolved Apr 20, 2026, 05:38 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 17, 2026, 01:06 PM UTC
- Resolved
- Apr 18, 2026, 08:42 PM UTC
- Duration
- 1d 7h
Affected: SSL/TLS Certificate Provisioning
Timeline · 3 updates
-
investigating Apr 17, 2026, 01:06 PM UTC
We are investigating an issue with the Vault server that stores TLS certificates. Provisioning new TLS certificates may fail, and connecting to domains whose existing certificate has not yet been cached may fail.
-
monitoring Apr 17, 2026, 03:34 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 18, 2026, 08:42 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 15, 2026, 11:08 AM UTC
- Resolved
- Apr 16, 2026, 10:59 AM UTC
- Duration
- 23h 51m
Timeline · 3 updates
-
investigating Apr 15, 2026, 11:08 AM UTC
We're currently investigating some networking issues in SYD. This is affecting a number of our central services.
-
monitoring Apr 15, 2026, 11:40 AM UTC
We've identified the issue and applied a fix. All services should be working as normal.
-
resolved Apr 16, 2026, 10:59 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 12, 2026, 06:50 PM UTC
- Resolved
- Apr 12, 2026, 11:03 PM UTC
- Duration
- 4h 12m
Timeline · 3 updates
-
investigating Apr 12, 2026, 06:50 PM UTC
We are currently investigating heightened network latency in ORD.
-
monitoring Apr 12, 2026, 07:26 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 12, 2026, 11:03 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 10, 2026, 06:42 PM UTC
- Resolved
- Apr 10, 2026, 09:48 PM UTC
- Duration
- 3h 5m
Affected: Management Plane - NRT
Timeline · 4 updates
-
investigating Apr 10, 2026, 06:42 PM UTC
We are investigating instability in the MPG control plane in the NRT (Tokyo, Japan) region causing unexpected cluster failovers. Clusters return to health shortly after, but some users with clusters in NRT may see dropped connections or degraded performance at this time.
-
identified Apr 10, 2026, 08:13 PM UTC
The issue has been identified and a fix is being implemented. Users with clusters in NRT may continue to see instability at this time.
-
monitoring Apr 10, 2026, 08:32 PM UTC
A fix has been implemented and we are seeing MPG performance in NRT normalize. We are continuing to monitor to ensure a stable recovery.
-
resolved Apr 10, 2026, 09:48 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 09, 2026, 07:29 PM UTC
- Resolved
- Apr 09, 2026, 08:14 PM UTC
- Duration
- 45m
Affected: ORD - Chicago, Illinois (US)
Timeline · 2 updates
-
investigating Apr 09, 2026, 07:29 PM UTC
Some hosts in our Chicago (ORD) region are currently inaccessible. We are working with our provider to resolve this issue. To see if you are affected, please visit the personalized status page at https://fly.io/status. A small number of Managed Postgres clusters may also be inaccessible at this time.
-
resolved Apr 09, 2026, 08:14 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 09, 2026, 03:50 AM UTC
- Resolved
- Apr 09, 2026, 05:30 AM UTC
- Duration
- 1h 39m
Affected: Management Plane - SYD
Timeline · 4 updates
-
investigating Apr 09, 2026, 03:50 AM UTC
We are investigating elevated control plane issues for Managed Postgres clusters in SYD. The majority of clusters appear to be running fine, but new creates, backup restores, and upgrades may show errors or take longer than usual to complete. Some clusters will have seen a failover event from primary to standby.
-
identified Apr 09, 2026, 04:12 AM UTC
We are seeing an improvement in control plane performance in the SYD region. Some clusters in the region are currently showing degraded standby nodes, and we are working to bring those back to full health.
-
monitoring Apr 09, 2026, 05:20 AM UTC
Control plane operations in SYD have returned to normal and all clusters are healthy at this time. We're continuing to monitor to ensure stable recovery.
-
resolved Apr 09, 2026, 05:30 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 08, 2026, 08:34 AM UTC
- Resolved
- Apr 08, 2026, 12:23 PM UTC
- Duration
- 3h 49m
Affected: Metrics
Timeline · 4 updates
-
investigating Apr 08, 2026, 08:34 AM UTC
We are currently investigating an issue with our metrics cluster.
-
monitoring Apr 08, 2026, 11:00 AM UTC
We have implemented a fix. We're monitoring the cluster for further issues.
-
monitoring Apr 08, 2026, 11:02 AM UTC
We are continuing to monitor for any further issues.
-
resolved Apr 08, 2026, 12:23 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Apr 07, 2026, 03:08 PM UTC
- Resolved
- Apr 07, 2026, 06:17 PM UTC
- Duration
- 3h 8m
Affected: Dashboard
Timeline · 4 updates
-
investigating Apr 07, 2026, 03:08 PM UTC
We are investigating issues with our GraphQL API and web dashboard.
-
identified Apr 07, 2026, 03:17 PM UTC
We have restored GraphQL and dashboard availability, but some actions (e.g. app state updates) may still be delayed.
-
monitoring Apr 07, 2026, 03:39 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Apr 07, 2026, 06:17 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 29, 2026, 03:00 PM UTC
- Resolved
- Mar 29, 2026, 04:01 PM UTC
- Duration
- 1h 1m
Affected: AMS - Amsterdam, Netherlands · Sprites · SIN - Singapore
Timeline · 6 updates
-
identified Mar 29, 2026, 03:00 PM UTC
We are currently investigating elevated errors when creating and starting machines in the SIN and AMS regions. Choosing other regions to create or deploy may help in the meantime.
-
identified Mar 29, 2026, 03:13 PM UTC
This may also affect:
- Remote builders in the AMS and SIN regions, which may currently be experiencing degraded performance or failures.
- Sprites starting from a cold state, which may fail to start.
-
identified Mar 29, 2026, 03:19 PM UTC
We are currently investigating capacity issues in the SIN and AMS regions that are affecting:
- Machine Create and Start events
- Deployments, due to degraded Remote Builders
- Sprite startup from a cold state
-
monitoring Mar 29, 2026, 03:33 PM UTC
We've freed up additional room in the SIN and AMS regions and are monitoring capacity.
-
resolved Mar 29, 2026, 04:01 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 27, 2026, 06:08 PM UTC
- Resolved
- Mar 27, 2026, 09:51 PM UTC
- Duration
- 3h 42m
Affected: Deployments · IAD - Ashburn, Virginia (US)
Timeline · 5 updates
-
investigating Mar 27, 2026, 06:08 PM UTC
We're currently investigating capacity issues in IAD that are preventing machine starts (machine creates are currently unaffected). This may result in deploys failing to complete (even for apps outside of the IAD region). As a workaround, using legacy Fly builders explicitly located in another region (e.g., `FLY_REMOTE_BUILDER_REGION=lhr fly deploy --depot=false --recreate-builder`) may help in the meantime.
-
investigating Mar 27, 2026, 06:47 PM UTC
We're continuing to evaluate our options for increasing short-term capacity in the IAD region.
-
identified Mar 27, 2026, 07:21 PM UTC
We've brought some additional capacity online in IAD and are seeing improvements, and we're continuing to work on adding more and freeing up additional room.
-
monitoring Mar 27, 2026, 09:09 PM UTC
With the additional capacity we've brought online, machine start failure rates in IAD have now recovered. We'll continue to monitor IAD capacity.
-
resolved Mar 27, 2026, 09:51 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 26, 2026, 03:21 PM UTC
- Resolved
- Mar 26, 2026, 05:54 PM UTC
- Duration
- 2h 33m
Affected: ORD - Chicago, Illinois (US)
Timeline · 5 updates
-
investigating Mar 26, 2026, 03:21 PM UTC
We are currently investigating elevated errors creating machines in the ORD (Chicago, Illinois) region. Users may see `failed to launch VM: request returned non-2xx status: 408` errors when creating, updating, or scaling machines in ORD. Existing, already running machines in the ORD region continue to run as normal.
-
investigating Mar 26, 2026, 04:08 PM UTC
We are continuing to investigate this issue. We are seeing 408 errors decreasing in ORD, though still above baseline.
-
identified Mar 26, 2026, 04:50 PM UTC
We've identified the cause of this increased failure rate and a fix is in progress. We are seeing most creates in ORD succeed at this time, though failure rate is still above baseline.
-
monitoring Mar 26, 2026, 05:28 PM UTC
We've implemented a fix and have seen error rates for machine creates in ORD drop off. We're continuing to monitor the results.
-
resolved Mar 26, 2026, 05:54 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 26, 2026, 12:37 PM UTC
- Resolved
- Mar 26, 2026, 02:19 PM UTC
- Duration
- 1h 42m
Affected: Management Plane - FRA · FRA - Frankfurt, Germany
Timeline · 4 updates
-
investigating Mar 26, 2026, 12:37 PM UTC
We are investigating network issues in the FRA region. Apps and/or Managed Postgres clusters in the region may be inaccessible at this time.
-
monitoring Mar 26, 2026, 01:14 PM UTC
Apps and Managed Postgres clusters in the FRA region should be back online at this time. We are monitoring for any further issues.
-
identified Mar 26, 2026, 01:16 PM UTC
Some Managed Postgres clusters in the FRA region are still unreachable; we are investigating this issue.
-
resolved Mar 26, 2026, 02:19 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 23, 2026, 03:18 PM UTC
- Resolved
- Mar 23, 2026, 04:27 PM UTC
- Duration
- 1h 9m
Affected: Logs
Timeline · 4 updates
-
investigating Mar 23, 2026, 03:18 PM UTC
Using the Logs panel in Grafana at https://fly-metrics.net/ will show a 502 error from the backend and won't show any logs. You can use `fly logs` or the live log viewer directly on https://fly.io/dashboard to view streaming logs for the time being.
-
identified Mar 23, 2026, 03:41 PM UTC
Using the Logs panel in Grafana at https://fly-metrics.net/ will show a 502 error from the backend and won't show any logs. You can use `fly logs` or the live log viewer directly on https://fly.io/dashboard to view streaming logs for the time being.
-
monitoring Mar 23, 2026, 03:55 PM UTC
We've deployed a fix and are monitoring the results. Logs are now visible in Grafana.
-
resolved Mar 23, 2026, 04:27 PM UTC
This incident has been resolved; Grafana logs are now working properly.
Read the full incident report →
- Detected by Pingoru
- Mar 20, 2026, 07:26 AM UTC
- Resolved
- Mar 23, 2026, 01:19 PM UTC
- Duration
- 3d 5h
Affected: DFW - Dallas, Texas (US)
Timeline · 5 updates
-
investigating Mar 20, 2026, 07:26 AM UTC
The Machines start failure rate is elevated in DFW.
-
monitoring Mar 20, 2026, 08:08 AM UTC
We freed up some capacity on our workers to allow for successful Machine starts.
-
monitoring Mar 20, 2026, 12:45 PM UTC
In addition to freeing up existing capacity, the team has provisioned new capacity in DFW and we are monitoring the results.
-
monitoring Mar 21, 2026, 08:26 AM UTC
Machine start success rates in DFW have improved but we are continuing to monitor and make further adjustments. We will provide updates as the situation progresses.
-
resolved Mar 23, 2026, 01:19 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 18, 2026, 04:12 PM UTC
- Resolved
- Mar 18, 2026, 05:02 PM UTC
- Duration
- 49m
Affected: SJC - San Jose, California (US)
Timeline · 3 updates
-
investigating Mar 18, 2026, 04:12 PM UTC
We are investigating intermittent network issues in the SJC region impacting outbound public IPv6 access from Machines. Connecting to IPv6 internet resources from apps hosted in the SJC region may be slow or fail at this time. IPv4 access, as well as 6PN private networking, is unaffected.
-
monitoring Mar 18, 2026, 04:31 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 18, 2026, 05:02 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 18, 2026, 02:12 PM UTC
- Resolved
- Mar 18, 2026, 02:18 PM UTC
- Duration
- 5m
Affected: Machines API
Timeline · 3 updates
-
identified Mar 18, 2026, 02:12 PM UTC
We have identified an issue causing new `fly ssh console` connections to fail with 500 errors. A fix is in progress.
-
monitoring Mar 18, 2026, 02:17 PM UTC
A fix has been implemented and we are seeing `ssh console` commands succeed as normal.
-
resolved Mar 18, 2026, 02:18 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 18, 2026, 02:07 PM UTC
- Resolved
- Mar 18, 2026, 02:18 PM UTC
- Duration
- 11m
Affected: Management Plane - SJC · SJC - San Jose, California (US)
Timeline · 2 updates
-
monitoring Mar 18, 2026, 02:07 PM UTC
Between 13:55 and 14:03 UTC machines and MPG clusters hosted in the SJC region saw elevated connection errors. Users may have seen errors connecting to or from most machines in the region, as well as with deployments or updates to machines in the region. Networking has returned to normal in the region, and we are continuing to monitor closely to ensure stable recovery.
-
resolved Mar 18, 2026, 02:18 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 14, 2026, 04:20 AM UTC
- Resolved
- Mar 14, 2026, 02:05 PM UTC
- Duration
- 9h 44m
Affected: Sprites
Timeline · 2 updates
-
monitoring Mar 14, 2026, 01:55 PM UTC
Organizations whose names begin with numerical digits may experience 401 errors. Affected operations include actions such as Sprite creation and listing. A fix has been in place since 2026-03-14 12:30 UTC and we are monitoring the results.
-
resolved Mar 14, 2026, 02:05 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 11, 2026, 11:03 AM UTC
- Resolved
- Mar 11, 2026, 11:37 AM UTC
- Duration
- 33m
Affected: Machines API · Deployments
Timeline · 4 updates
-
identified Mar 11, 2026, 09:19 AM UTC
An ongoing data migration in our secret storage service is causing degraded Machines API functionality.
-
monitoring Mar 11, 2026, 10:14 AM UTC
A fix has been implemented and we are monitoring the results.
-
monitoring Mar 11, 2026, 11:03 AM UTC
While the secret storage service was in a read-only state, app creation requests queued up due to retry logic and insufficient request concurrency limits in our GraphQL API. This prevented our GraphQL API from serving any other requests. We have scaled up the GraphQL API and are continuing to monitor the situation.
-
resolved Mar 11, 2026, 11:37 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 07, 2026, 02:42 PM UTC
- Resolved
- Mar 07, 2026, 03:56 PM UTC
- Duration
- 1h 14m
Affected: SYD - Sydney, Australia
Timeline · 3 updates
-
investigating Mar 07, 2026, 02:42 PM UTC
We are investigating a private networking failure between SYD and other regions. Apps continue to run, and private networking within SYD is unaffected.
-
monitoring Mar 07, 2026, 03:10 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 07, 2026, 03:56 PM UTC
This incident has been resolved.
Read the full incident report →