Trigger.dev Outage History

Trigger.dev is up right now

There have been 7 Trigger.dev outages since March 1, 2026, totaling 61h 15m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.trigger.dev

Minor · April 21, 2026

DNS in us-east-1 is degraded

Detected by Pingoru
Apr 21, 2026, 03:40 PM UTC
Resolved
Apr 23, 2026, 05:15 PM UTC
Duration
2d 1h
Affected: Trigger.dev cloud, Trigger.dev OpenTelemetry
Timeline · 2 updates
  1. investigating Apr 21, 2026, 03:40 PM UTC

    We are seeing some DNS resolution issues in us-east-1. OTel spans in the web dashboard are also affected. We are investigating the root cause. In the meantime, please consider switching to the eu-central-1 region while we investigate.

  2. resolved Apr 23, 2026, 05:15 PM UTC

    Postmortem: Intermittent DNS Resolution Failures (US region)
    Date: 2026-04-21
    Duration: ~1 hour
    Impact: Intermittent DNS resolution failures for tasks running in our US region

    Summary
    Between approximately 15:50 and 16:50 UTC on 2026-04-21, some tasks running in our US region experienced intermittent DNS resolution failures when connecting to external services such as databases, object storage, and third-party APIs. The errors typically surfaced as EAI_AGAIN or transient name-resolution timeouts. Our EU region was unaffected. The platform itself remained available throughout, and the incident self-resolved before any manual intervention.

    What customers experienced
    Tasks in the US region that made outbound connections to external hostnames during the affected window could see intermittent resolution failures. Cluster-internal communication and task scheduling were not impacted. We did not observe degradation of the dashboard, API, or engine.

    Root cause
    Under elevated workload, an internal telemetry service hit its memory limit and restarted. When it came back up, a large number of running tasks simultaneously re-established their connections to it, triggering a burst of DNS lookups for a single internal hostname. A default DNS configuration on our pods amplified that burst by roughly 10x at the wire level, because each hostname resolution attempted several search-path variants, and each of those was duplicated across the IPv4 and IPv6 query families. The telemetry service then went into a short restart loop, so the reconnection surge repeated, keeping DNS resolution degraded across the window. Some external name-resolution attempts from customer task code ran long enough to hit their own timeouts and surfaced as EAI_AGAIN.
    During the incident, our cluster DNS resolver remained healthy by its own metrics. The failure was silent at the node-network layer and was only fully reconstructable from after-the-fact packet capture. This is the main reason we could not attribute the degradation in real time, and it is the primary gap we have closed in the days since.

    What we've done
      • Further scaled the telemetry service horizontally, so there's more capacity headroom under workload spikes, and a simultaneous all-replicas restart is less likely.
      • Deployed a per-node DNS cache across our worker fleet to absorb redundant lookup patterns at source, before they reach the cluster DNS resolver.
      • Reduced DNS amplification at source on platform-managed services, so a single hostname lookup produces one wire query instead of ten.
      • Extended our monitoring for network-level silent drops, per-upstream DNS resolver latency, and telemetry pipeline health.
      • Shipped a suite of production alerts calibrated against this incident's actual signature. A future incident of this class should be detected in minutes rather than reconstructed after the fact.

    Ongoing
    A separate observation about how our cluster DNS resolver forwards queries under load is under continued investigation. It is not currently known to cause user-facing impact, but it is a gap in our understanding that we are closing. We are re-evaluating DNS defaults on customer-facing tasks to remove the amplification pattern entirely, rather than just absorbing it with caching.

    We take incidents of this class seriously even when they self-resolve. The multi-day investigation was a deliberate choice to reconstruct the event with enough fidelity to fix the underlying observability gap, not just the immediate symptoms. We apologise for the disruption.

Read the full incident report →
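
The roughly 10x wire-level amplification described in the postmortem is the product of search-path variants and query families. The sketch below works through that arithmetic in TypeScript under assumed Kubernetes-style pod defaults; the search domains, the ndots threshold, and the hostname are illustrative assumptions, not values taken from the incident report.

```typescript
// Illustration of the ~10x DNS amplification described in the postmortem,
// assuming typical Kubernetes-style pod defaults (ndots:5 plus a handful of
// search domains). None of these values come from the incident report.
const searchDomains = [
  "prod.svc.cluster.local",
  "svc.cluster.local",
  "cluster.local",
  "us-east-1.compute.internal",
];
const ndots = 5;

// A name with fewer dots than `ndots` is tried against every search-domain
// variant as well as the bare name.
function wireQueriesFor(hostname: string): string[] {
  const dots = hostname.split(".").length - 1;
  const variants =
    dots >= ndots
      ? [hostname]
      : [...searchDomains.map((domain) => `${hostname}.${domain}`), hostname];
  // Each variant is resolved for both IPv4 (A) and IPv6 (AAAA) records.
  return variants.flatMap((name) => [`A ${name}`, `AAAA ${name}`]);
}

// One lookup of a single short internal hostname becomes ~10 wire queries,
// which is how a reconnection surge turns into a DNS query storm.
console.log(wireQueriesFor("telemetry-collector").length); // 10
```

A per-node DNS cache, as the postmortem describes, absorbs most of these duplicate queries before they ever reach the cluster resolver.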

Minor · April 1, 2026

Realtime is behind

Detected by Pingoru
Apr 01, 2026, 12:00 PM UTC
Resolved
Apr 01, 2026, 06:57 PM UTC
Duration
6h 57m
Affected: Realtime
Timeline · 2 updates
  1. investigating Apr 01, 2026, 12:00 PM UTC

    Realtime metadata updates and streaming v1 are not live; they've fallen behind. We're working to remediate this.

  2. resolved Apr 01, 2026, 06:57 PM UTC

    Realtime is live again. We're really sorry for this extended period of significant delays. The service couldn't keep up with the number of runs being processed and was falling further behind. We have made some configuration changes and upgraded it so it can cope with a higher throughput of runs. If you were only using our React hooks for streaming, they were unaffected by this.

Read the full incident report →
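
For context on the distinction drawn in the resolution note, here is a hedged sketch of consuming run updates from a React client. The `useRealtimeRun` hook, its option shape, and its return values are assumptions based on the `@trigger.dev/react-hooks` package and may differ across SDK versions; the metadata path it subscribes to is what fell behind during this incident, while streaming-only hooks were reported unaffected.

```tsx
// Hedged sketch: subscribing to live run updates from a React client.
// The hook name, option shape, and return values are assumptions based on
// the @trigger.dev/react-hooks package and may differ across SDK versions.
"use client";

import { useRealtimeRun } from "@trigger.dev/react-hooks";

export function RunStatus({
  runId,
  publicAccessToken,
}: {
  runId: string;
  publicAccessToken: string; // token scoped to reading this run
}) {
  // Run status and metadata updates arrive over Realtime; this is the path
  // that lagged during the incident above.
  const { run, error } = useRealtimeRun(runId, {
    accessToken: publicAccessToken,
  });

  if (error) return <p>Failed to subscribe: {error.message}</p>;
  if (!run) return <p>Connecting…</p>;
  return (
    <p>
      Run {run.id} is {run.status}
    </p>
  );
}
```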

Minor · March 16, 2026

Intermittent DNS issues in ...

Detected by Pingoru
Mar 16, 2026, 09:20 PM UTC
Resolved
Mar 17, 2026, 12:07 AM UTC
Duration
2h 47m
Affected: Trigger.dev cloud
Timeline · 2 updates
  1. investigating Mar 16, 2026, 09:20 PM UTC

    From user runs we're seeing an increase in DNS-related errors such as:
      • Error: getaddrinfo ENOTFOUND
      • Error: getaddrinfo EAI_AGAIN
    We're investigating why this is happening.

  2. resolved Mar 17, 2026, 12:07 AM UTC

    DNS service is now back to fully operational. Increased traffic combined with a routine infrastructure rollout caused intermittent DNS resolution failures. We've tuned our DNS configuration to resolve the issue and are working on longer-term improvements to prevent recurrence.

Read the full incident report →
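
Both error codes quoted above come from Node's resolver: EAI_AGAIN is a transient resolver failure that is usually safe to retry, while ENOTFOUND means the name did not resolve at all. The wrapper below is a minimal defensive sketch for task code making outbound requests, assuming Node 18+ where a failed fetch carries the underlying error on `cause`; the retry counts and example URL are placeholders.

```typescript
// Minimal retry wrapper for outbound requests made from task code.
// Assumes Node 18+, where a failed fetch throws an error whose `cause`
// carries the underlying code (e.g. EAI_AGAIN). Retry counts and delays
// are illustrative placeholders.
async function fetchWithDnsRetry(
  url: string,
  attempts = 3,
  baseDelayMs = 500,
): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fetch(url);
    } catch (err) {
      const code = (err as { cause?: { code?: string } }).cause?.code;
      // EAI_AGAIN is transient and worth retrying; ENOTFOUND usually is not.
      if (code !== "EAI_AGAIN" || attempt >= attempts) throw err;
      // Exponential backoff before the next attempt.
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)),
      );
    }
  }
}

// Usage inside a task:
//   const res = await fetchWithDnsRetry("https://api.example.com/data");
```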

Minor · March 6, 2026

Dashboard and telemetry deg...

Detected by Pingoru
Mar 06, 2026, 02:16 PM UTC
Resolved
Mar 06, 2026, 03:15 PM UTC
Duration
59m
Affected: Trigger.dev cloud, Trigger.dev OpenTelemetry
Timeline · 2 updates
  1. investigating Mar 06, 2026, 02:16 PM UTC

    The runs list and detail pages in the dashboard are currently degraded due to an ongoing issue with our ClickHouse DB. We're also observing some log and span ingestion failures. We're currently investigating. Run executions are not impacted.

  2. resolved Mar 06, 2026, 03:15 PM UTC

    The issue has been resolved. Dashboard and telemetry are now fully operational.

Read the full incident report →

Minor · March 1, 2026

Elevated dequeue times in u...

Detected by Pingoru
Mar 01, 2026, 01:39 AM UTC
Resolved
Mar 01, 2026, 02:36 AM UTC
Duration
57m
Affected: Trigger.dev cloud
Timeline · 2 updates
  1. investigating Mar 01, 2026, 01:39 AM UTC

    Dequeues are slower than normal in us-east-1. Runs are still executing, but they are slower to start. We’re investigating the issue.

  2. resolved Mar 01, 2026, 02:36 AM UTC

    The issue is now resolved and dequeue times are back to normal. Mainly free-tier runs were affected. This was caused by a spike in the free-tier run volume.

Read the full incident report →

Looking to track Trigger.dev downtime and outages?

Pingoru polls Trigger.dev's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when Trigger.dev reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Trigger.dev alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Trigger.dev for free

5 free monitors · No credit card required