Nango incident

Function executions are cur...

Nango experienced a minor incident on March 5, 2026 affecting Nango Cloud Health, lasting 1h 10m. The incident has been resolved; the full update timeline is below.

Started: Mar 05, 2026, 03:05 PM UTC
Resolved: Mar 05, 2026, 04:15 PM UTC
Duration: 1h 10m
Detected by Pingoru: Mar 05, 2026, 03:05 PM UTC

Affected components

Nango Cloud Health

Update timeline

investigating Mar 05, 2026, 03:05 PM UTC

We are experiencing an issue executing functions on Nango (actions, syncs, and webhooks). We are currently investigating and will provide updates here.

resolved Mar 05, 2026, 04:15 PM UTC

Functions recovered. Post mortem:

Post-Incident Summary

Date: 6 March 2026
Impact: Degraded function execution for actions and webhooks
Status: Resolved

Summary

A bug in the task scheduling system disabled per-environment concurrency limits, allowing a single tenant to generate an unbounded burst of invocations. At the same time, the tenant’s functions were significantly longer running than typical workloads (120–150 seconds), which caused execution environments to remain occupied for extended periods.

As the burst of tasks exceeded the rate at which execution environments became available, a backlog formed in the asynchronous invocation queue. This backlog increased the age of queued events and introduced elevated action latency. Eventually, queued tasks began exceeding the expiration limits enforced by the task system and expired before execution.

Monitoring detected the issue through elevated latency metrics, after which the workload was identified and mitigated.

Timeline (CET)

Issue began: 16:00
Detected by monitoring / on-call paged: 16:20
Mitigated: 17:15

Root Cause

A bug in the task system disabled per-environment concurrency limits, allowing a single tenant environment to generate an unbounded burst of invocations. The tenant’s functions were also significantly longer running than typical workloads (120–150 seconds), which meant execution environments remained occupied for extended periods and did not recycle quickly.

Because provisioned concurrency was configured with a low maximum, most of the burst traffic was handled by on-demand capacity. While the function runtime continued scaling additional execution environments, the combination of burst traffic and long-running executions caused capacity to ramp more slowly than the incoming workload required. This created a backlog in the asynchronous invocation queue, which increased async event age and action latency.

As the backlog grew, queued tasks eventually exceeded the expiration limits enforced by the task system and expired before they could be executed. Detection was delayed because alerting relies on latency averaged over 15-minute windows.

Resolution

The source tenant generating the burst workload was identified and incoming traffic was halted. Task system throttling logic was fixed to restore per-environment concurrency limits. After the workload was stopped and the backlog drained, function processing returned to normal. All systems were fully operational by 17:15 CET.

Follow-Up Actions

System safeguards

Fix the task system throttling bug to ensure per-environment concurrency limits are always enforced (completed).

Capacity & scaling

Increase provisioned concurrency autoscaling limits to better absorb bursts and reduce cold-start spillover.

Monitoring & alerting

Add alerts for AsyncEventAge to detect queue backlogs earlier.
Add alerts for ProvisionedConcurrencySpilloverInvocations.
Add anomaly detection on actions executed per minute / throughput drops.
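
For readers unfamiliar with per-environment concurrency limits, the following is a minimal Python sketch of the general idea. It is not Nango's task system: the class name `PerEnvironmentLimiter`, the limit of 3, and the environment names are all hypothetical and chosen only for illustration. Each environment gets its own semaphore, so a burst from one environment queues behind its own cap instead of taking every execution slot.

```python
import asyncio
from collections import defaultdict


class PerEnvironmentLimiter:
    """Hypothetical limiter: caps concurrent executions per environment."""

    def __init__(self, max_concurrent_per_env: int) -> None:
        self._limit = max_concurrent_per_env
        self._semaphores: dict[str, asyncio.Semaphore] = {}
        self.running: dict[str, int] = defaultdict(int)  # in flight right now
        self.peak: dict[str, int] = defaultdict(int)     # highest in-flight count seen

    def _semaphore_for(self, env_id: str) -> asyncio.Semaphore:
        # One semaphore per environment, created on first use.
        if env_id not in self._semaphores:
            self._semaphores[env_id] = asyncio.Semaphore(self._limit)
        return self._semaphores[env_id]

    async def run(self, env_id: str, task):
        # Tasks beyond the cap wait here instead of starting immediately.
        async with self._semaphore_for(env_id):
            self.running[env_id] += 1
            self.peak[env_id] = max(self.peak[env_id], self.running[env_id])
            try:
                return await task()
            finally:
                self.running[env_id] -= 1


async def demo() -> dict[str, int]:
    limiter = PerEnvironmentLimiter(max_concurrent_per_env=3)

    async def slow_function() -> None:
        # Stand-in for a long-running function (scaled down to 10 ms).
        await asyncio.sleep(0.01)

    # One environment submits a burst of 50 tasks, another submits 2.
    burst = [limiter.run("noisy-env", slow_function) for _ in range(50)]
    normal = [limiter.run("quiet-env", slow_function) for _ in range(2)]
    await asyncio.gather(*burst, *normal)
    return dict(limiter.peak)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the bursting environment never has more than 3 tasks in flight, while the other environment's tasks start without waiting behind the burst. With the cap removed, all 50 tasks would start at once, which is the behavior the root cause describes.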