Nango Outage History

Nango is up right now

There have been 3 Nango outages since March 5, 2026, totaling 93h 19m of downtime. Each is summarized below with incident details, duration, and resolution information.

Source: https://status.nango.dev

Minor April 27, 2026

Increase in 502s

Detected by Pingoru
Apr 27, 2026, 12:16 PM UTC
Resolved
Apr 29, 2026, 08:25 AM UTC
Duration
1d 20h
Affected: Nango Cloud Health
Timeline · 3 updates
  1. investigating Apr 27, 2026, 12:16 PM UTC

    We are experiencing an increase in 502s on our public API. We believe we have found the cause and are preparing a release to resolve it.

  2. resolved Apr 27, 2026, 02:59 PM UTC

    This has now been resolved.

  3. resolved Apr 29, 2026, 08:25 AM UTC

    Post-Incident Summary

    Date: 29 April 2026

    Summary

    A gradual increase in Records API response payload size eventually exceeded the memory available to the pods serving those requests, causing them to be terminated. As pods were terminated and replaced, the load balancer returned 502 errors for requests routed to them. The growth had been building over several days and was not caught by internal monitoring before a customer reported the issue.

    Timeline (UTC)

    Issue began: 24 April, ~12:00
    Reported by customer: 25 April, ~00:00
    Investigation started: 27 April, ~07:00
    Mitigated: 27 April, 15:00
    Resolved: 27 April, 15:00

    Root Cause

    The Records API response payload size had been growing gradually for several days. Once the payloads crossed the available pod memory headroom, the pods serving those requests began being terminated for exceeding memory limits. As terminated pods were replaced, the load balancer returned 502 errors for requests routed to pods that were shutting down or starting up. The growth was gradual rather than sudden, which meant our internal monitoring did not flag it before user-visible errors occurred. Diagnosis was also slowed because the default per-pod memory view smoothed out the underlying spikes, masking the memory-pressure pattern until the smoothing was removed and pod termination state was inspected directly.

    Resolution

    Per-pod memory limits on the affected service were increased, restoring headroom for the larger payloads and stopping the terminations. Once the new limits were in place, the load balancer stopped returning 502s and the service returned to normal.

    Follow-Up Actions

    Detection
    A monitor now pages the on-call team when load-balancer 5XX rates rise, so future user-impacting issues are caught immediately rather than relying on customer reports.

    System safeguards
    Bound peak memory per Records API request so response sizes cannot pressure pod memory.
    Improve telemetry around per-request response size so unusual growth surfaces in monitoring before it impacts users.

Read the full incident report →
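
The "bound peak memory per Records API request" follow-up is concrete enough to sketch. The code below is not Nango's implementation; the function names, the 5 MiB budget, and the record shape are all assumptions. It shows the general pattern: measure each record as it is serialized and cut the page off with a cursor once a byte budget is hit, so no single response can grow without bound.

```typescript
// Hypothetical sketch: cap the bytes a single records page may occupy in memory.
// Names (RecordRow, buildPage, MAX_PAGE_BYTES) are illustrative, not Nango's API.

interface RecordRow {
  id: string;
  payload: unknown;
}

interface Page {
  records: RecordRow[];
  nextCursor: string | null; // set when the byte budget cut the page short
}

const MAX_PAGE_BYTES = 5 * 1024 * 1024; // 5 MiB budget per response (assumed value)

function buildPage(rows: RecordRow[], maxBytes: number = MAX_PAGE_BYTES): Page {
  const records: RecordRow[] = [];
  let used = 0;

  for (const row of rows) {
    // Measure the serialized size of this row before admitting it.
    const size = Buffer.byteLength(JSON.stringify(row), "utf8");

    // Budget exhausted: stop here and hand back a cursor so the client fetches
    // the rest in follow-up requests instead of one unbounded payload. The
    // records.length guard still admits a single oversized row so paging
    // always makes progress.
    if (used + size > maxBytes && records.length > 0) {
      return { records, nextCursor: row.id };
    }
    records.push(row);
    used += size;
  }
  return { records, nextCursor: null };
}
```

The same `used` counter doubles as the per-request response-size telemetry named in the follow-ups: emitted as a metric, gradual payload growth becomes visible long before pods are terminated for exceeding memory limits.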

Minor April 12, 2026

Degradation in sync executions

Detected by Pingoru
Apr 12, 2026, 08:00 AM UTC
Resolved
Apr 14, 2026, 08:00 AM UTC
Duration
2d
Affected: Nango Cloud Health
Timeline · 2 updates
  1. investigating Apr 12, 2026, 08:00 AM UTC

    Syncs are still delayed, while actions appear unaffected. The issue seems to be related to how the database is handling sync schedules. Once sync processing recovers, synced data will catch up automatically.

  2. resolved Apr 14, 2026, 08:00 AM UTC

    Post-Incident Summary

    Date: 12 April 2026
    Impact: Degraded sync execution and delayed actions and webhook processing
    Status: Resolved

    Summary

    A webhook flood originating from a single customer environment caused one of our databases to saturate, resulting in broad degradation of asynchronous job processing. Sync execution dropped to near zero, and a large portion of actions and webhook-driven work were delayed or unable to run. A secondary bug in the scheduling system amplified the incident and blocked two consecutive recovery attempts before a fix was deployed.

    Timeline (UTC)

    Issue began: 07:00
    Detected by monitoring: 07:00
    Status page updated: 07:00
    Mitigated: 15:30
    Resolved: 15:55

    Root Cause

    A single customer environment generated a sustained webhook flood, well above the typical baseline. Each incoming webhook triggered a database query to check the current queue depth for that customer's group before deciding whether to admit a new task. Under flood conditions, this query saturated the CPU of one of our databases, preventing other work, including syncs and actions from all customers, from being scheduled or processed. Once the per-group queue cap was reached, new work could no longer be enqueued, and the system remained effectively stalled.

    Recovery was complicated by a separate bug in the recurring schedule path. When the scheduler encountered a group that had already hit the queue cap, an error in the code caused the exception to be swallowed silently. As a result, affected schedules were never marked as processed and were repeatedly retried on each scheduler tick, adding further load to an already saturated database. This caused two consecutive recovery attempts to fail.

    Resolution

    A fix was deployed to correct the scheduling bug, ensuring that capped groups are handled correctly and schedules are properly advanced after each pass. Task execution times were shifted forward in bulk to drain pressure from the database, then restored in batches. Once the backlog cleared, the system returned to a healthy state and full processing resumed by 15:55 UTC.

    Follow-Up Actions

    System safeguards
    Improve the current per-enqueue queue-depth admission-control mechanism to reduce database load under flood conditions.
    Define a rate limiting and load shedding strategy for webhook ingestion to protect the platform when a single customer generates sustained enqueue pressure.
    Fix the scheduling bug to correctly handle capped groups without silent failures (completed).

Read the full incident report →
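
Two pieces of this post-mortem are worth sketching. First, the per-enqueue queue-depth check: as described above, every incoming webhook triggered its own database query, so a flood translated directly into database CPU saturation. One common mitigation (an assumption here; the post-mortem does not say what Nango actually shipped) is to cache the depth per group for a short TTL, turning one query per webhook into at most one query per group per interval.

```typescript
// Hypothetical sketch: TTL-cached queue-depth admission control.
// queryQueueDepth stands in for the real database call; all names are illustrative.

type DepthLookup = (groupId: string) => Promise<number>;

interface CacheEntry {
  depth: number;
  expiresAt: number;
}

class AdmissionController {
  private cache = new Map<string, CacheEntry>();

  constructor(
    private queryQueueDepth: DepthLookup,
    private maxDepth: number, // per-group queue cap
    private ttlMs: number = 1_000 // refresh depth at most once per second per group
  ) {}

  async admit(groupId: string): Promise<boolean> {
    const now = Date.now();
    let entry = this.cache.get(groupId);

    // Only hit the database when the cached depth has gone stale.
    if (!entry || entry.expiresAt <= now) {
      const depth = await this.queryQueueDepth(groupId);
      entry = { depth, expiresAt: now + this.ttlMs };
      this.cache.set(groupId, entry);
    }

    if (entry.depth >= this.maxDepth) {
      return false; // shed load instead of querying again on every webhook
    }
    entry.depth += 1; // track admissions optimistically between refreshes
    return true;
  }
}
```

The trade-off is that a group can overshoot its cap by however many tasks are admitted inside one TTL window; under flood conditions that slack is usually far cheaper than the saturated database it prevents.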
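
Second, the scheduling bug is a textbook silent-failure pattern, reconstructed hypothetically below (illustrative names, not Nango's code). The buggy shape swallows the queue-cap exception, so the schedule is never advanced and every scheduler tick retries it against the saturated database; the fix treats a capped group as an expected outcome and advances the schedule either way.

```typescript
// Hypothetical reconstruction of the bug pattern; all names are illustrative.

class QueueCapReachedError extends Error {}

interface Schedule {
  id: string;
  groupId: string;
}

interface SchedulerDeps {
  // Throws QueueCapReachedError when the group's queue is already full.
  enqueueTask(schedule: Schedule): Promise<void>;
  // Marks the schedule as processed so the next tick skips it.
  advanceSchedule(schedule: Schedule): Promise<void>;
}

// BUGGY shape: the catch swallows every error, so advanceSchedule never runs
// for capped groups. The schedule stays due, and each scheduler tick retries
// it, adding load to an already saturated database.
async function processScheduleBuggy(schedule: Schedule, deps: SchedulerDeps): Promise<void> {
  try {
    await deps.enqueueTask(schedule);
    await deps.advanceSchedule(schedule);
  } catch {
    // silently ignored
  }
}

// FIXED shape: hitting the cap is an expected outcome, not a failure. The
// schedule is advanced either way, so capped groups are not retried every
// tick, and unexpected errors are rethrown instead of vanishing.
async function processScheduleFixed(schedule: Schedule, deps: SchedulerDeps): Promise<void> {
  try {
    await deps.enqueueTask(schedule);
  } catch (err) {
    if (!(err instanceof QueueCapReachedError)) {
      throw err;
    }
    // Group at cap: skip this run; pending work is admitted once capacity frees up.
  }
  await deps.advanceSchedule(schedule);
}
```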

Minor March 5, 2026

Function executions are cur...

Detected by Pingoru
Mar 05, 2026, 03:05 PM UTC
Resolved
Mar 05, 2026, 04:15 PM UTC
Duration
1h 10m
Affected: Nango Cloud Health
Timeline · 2 updates
  1. investigating Mar 05, 2026, 03:05 PM UTC

    We are experiencing an issue executing functions on Nango (actions, syncs, and webhooks). We are currently investigating and will provide updates here.

  2. resolved Mar 05, 2026, 04:15 PM UTC

    Functions recovered.

    Post-Incident Summary

    Date: 6 March 2026
    Impact: Degraded function execution for actions and webhooks
    Status: Resolved

    Summary

    A bug in the task scheduling system disabled per-environment concurrency limits, allowing a single tenant to generate an unbounded burst of invocations. At the same time, the tenant’s functions were significantly longer running than typical workloads (120–150 seconds), which caused execution environments to remain occupied for extended periods. As the burst of tasks exceeded the rate at which execution environments became available, a backlog formed in the asynchronous invocation queue. This backlog increased the age of queued events and introduced elevated action latency. Eventually, queued tasks began exceeding the expiration limits enforced by the task system and expired before execution. Monitoring detected the issue through elevated latency metrics, after which the workload was identified and mitigated.

    Timeline (CET)

    Issue began: 16:00
    Detected by monitoring / on-call paged: 16:20
    Mitigated: 17:15

    Root Cause

    A bug in the task system disabled per-environment concurrency limits, allowing a single tenant environment to generate an unbounded burst of invocations. The tenant’s functions were also significantly longer running than typical workloads (120–150 seconds), which meant execution environments remained occupied for extended periods and did not recycle quickly.

    Because provisioned concurrency was configured with a low maximum, most of the burst traffic was handled by on-demand capacity. While the function runtime continued scaling additional execution environments, the combination of burst traffic and long-running executions caused capacity to ramp more slowly than the incoming workload required. This created a backlog in the asynchronous invocation queue, which increased async event age and action latency. As the backlog grew, queued tasks eventually exceeded the expiration limits enforced by the task system and expired before they could be executed. Detection was delayed because alerting relies on latency averaged over 15-minute windows.

    Resolution

    The source tenant generating the burst workload was identified and incoming traffic was halted. Task system throttling logic was fixed to restore per-environment concurrency limits. After the workload was stopped and the backlog drained, function processing returned to normal. All systems were fully operational by 17:15 CET.

    Follow-Up Actions

    System safeguards
    Fix the task system throttling bug to ensure per-environment concurrency limits are always enforced (completed).

    Capacity & scaling
    Increase provisioned concurrency autoscaling limits to better absorb bursts and reduce cold-start spillover.

    Monitoring & alerting
    Add alerts for AsyncEventAge to detect queue backlogs earlier.
    Add alerts for ProvisionedConcurrencySpilloverInvocations.
    Add anomaly detection on actions executed per minute / throughput drops.

Read the full incident report →
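
The per-environment concurrency limit that the bug disabled is, at its core, a keyed semaphore. Below is a minimal sketch (illustrative names and cap, not Nango's task system) of how each environment gets an independent cap so one tenant's burst cannot occupy every execution slot.

```typescript
// Hypothetical sketch: a per-environment concurrency limiter (keyed semaphore).
// The cap value and names are illustrative, not Nango's actual configuration.

class EnvironmentLimiter {
  private running = new Map<string, number>();
  private waiters = new Map<string, Array<() => void>>();

  constructor(private maxPerEnv: number) {}

  async run<T>(envId: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(envId);
    try {
      return await task();
    } finally {
      this.release(envId);
    }
  }

  private acquire(envId: string): Promise<void> {
    const inFlight = this.running.get(envId) ?? 0;
    if (inFlight < this.maxPerEnv) {
      this.running.set(envId, inFlight + 1);
      return Promise.resolve();
    }
    // Cap reached: queue this invocation until a slot in this env frees up.
    return new Promise((resolve) => {
      const queue = this.waiters.get(envId) ?? [];
      queue.push(() => {
        this.running.set(envId, (this.running.get(envId) ?? 0) + 1);
        resolve();
      });
      this.waiters.set(envId, queue);
    });
  }

  private release(envId: string): void {
    this.running.set(envId, (this.running.get(envId) ?? 1) - 1);
    const next = this.waiters.get(envId)?.shift();
    if (next) next(); // hand the freed slot to the oldest waiter in this env
  }
}

// Usage: every function execution for an environment goes through the limiter,
// so a tenant bursting thousands of invocations only ever holds maxPerEnv slots.
const limiter = new EnvironmentLimiter(25); // assumed per-environment cap
async function execute(envId: string, fn: () => Promise<void>): Promise<void> {
  await limiter.run(envId, fn);
}
```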
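
The monitoring follow-ups reference standard AWS Lambda CloudWatch metrics (AsyncEventAge, ProvisionedConcurrencySpilloverInvocations). Assuming the alerts were defined with AWS CDK, which the post-mortem does not state, an AsyncEventAge alarm might look like the following; the function name and thresholds are placeholders.

```typescript
// Hypothetical sketch: CloudWatch alarm on Lambda's AsyncEventAge metric,
// which measures how long async invocations sit queued before execution.
import { App, Stack, Duration } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";

const app = new App();
const stack = new Stack(app, "MonitoringStack");

const asyncEventAge = new cloudwatch.Metric({
  namespace: "AWS/Lambda",
  metricName: "AsyncEventAge",
  dimensionsMap: { FunctionName: "runner" }, // hypothetical function name
  statistic: "Maximum", // max, not average: 15-minute averages delayed detection here
  period: Duration.minutes(1),
});

new cloudwatch.Alarm(stack, "AsyncEventAgeAlarm", {
  metric: asyncEventAge,
  threshold: 60_000, // alarm when events wait over 60s (metric is in ms; assumed value)
  evaluationPeriods: 3,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
});

app.synth();
```

The spillover alarm follows the same shape with metricName "ProvisionedConcurrencySpilloverInvocations" and a "Sum" statistic, catching bursts that overflow provisioned capacity onto on-demand execution environments.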

Looking to track Nango downtime and outages?

Pingoru polls Nango's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when Nango reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Nango alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Nango for free

5 free monitors · No credit card required