Twingate incident

Relays443 Down

Twingate experienced a major incident on May 8, 2026 affecting Americas Relays and EMEA Relays and 1 more component, lasting 2h 10m. The incident has been resolved; the full update timeline is below.

Started: May 08, 2026, 08:44 AM UTC
Resolved: May 08, 2026, 10:55 AM UTC
Duration: 2h 10m
Detected by Pingoru: May 08, 2026, 08:44 AM UTC

Affected components

Americas RelaysEMEA RelaysAPAC Relays

Update timeline

investigating May 08, 2026, 08:44 AM UTC

We are seeing an issue with relay443. We are investigating. Note: These are special relays that offer connectivity over port 443 and are used by a subset of customers. So majority of customers shouldn't be seeing an issue,
identified May 08, 2026, 10:28 AM UTC

The issue has been identified and a fix is being implemented.
resolved May 08, 2026, 10:55 AM UTC

The fix has been rolled out and relays443 are now fully functional.
postmortem May 14, 2026, 09:19 AM UTC

**Components impacted** * Data Plane - Relay 443 flows only **Summary** On May 9, 2026, a subset of Twingate customers experienced an interruption to network access affecting Relay 443 connectivity. The disruption lasted approximately 2 hours and 10 minutes and was caused by a configuration error introduced during a routine software update. We have resolved the issue and are taking concrete steps to prevent a recurrence. **Root Cause** Twingate regularly releases software updates to our relay infrastructure, deploying them in stages across groups to minimize risk. As part of this update cycle, we also migrated our deployment tooling to a newer version of our infrastructure management system. The newer tooling applies stricter configuration validation rules than its predecessor. While preparing the update, our team identified and corrected a configuration conflict this stricter validation had surfaced. However, the fix was shipped in the same release bundle as a second change that was designed to serve as a prerequisite — specifically, a change that prevents the stricter validation from modifying existing, live deployments. The result was that a critical network configuration — the definition that tells our relay nodes to accept connections on port 443 — was silently removed from existing cluster deployments. **Why wasn't the issue detected earlier with the initial rollout?** Our continuous smoke testing did not include end-to-end coverage for Relay 443 specifically, so no automated alert fired when previous groups were deployed. Because those early upgrade groups carried relatively low Relay 443 traffic, no customer-visible impact surfaced either, leaving the issue undetected for approximately 48 hours until the higher-traffic groups were rolled out. **Why wasn't the issue caught in lower environments?** The issue was identified in a lower environment and a fix was prepared. However, the fix was deployed in the same release as the problematic change rather than as a prerequisite, which prevented it from taking full effect on existing cluster upgrades. Subsequent deployments confirmed the issue was fully resolved. **Remediation** Once we confirmed the root cause, we re-deployed the corrected relay configuration — explicitly restoring the port 443 host mapping — across all cluster groups in sequence. We validated each group before proceeding to the next. Full recovery was confirmed at 11:55 UTC on May 9. **Corrective actions** Short-term: * **Completed:** Redeployed to production to confirm the fix is stable and the issue will not recur. * We are adding regional smoke tests for Relay 443 to match the coverage already in place for other relay deployments, ensuring failures like this are caught before impacting customers.