Twingate experienced a minor incident on October 22, 2020 affecting Americas Relays, lasting 3h 55m. The incident has been resolved; the full update timeline is below.
Affected components: Americas Relays
Update timeline
- investigating Oct 22, 2020, 06:31 AM UTC
We are currently investigating the issue.
- monitoring Oct 22, 2020, 06:32 AM UTC
A fix has been implemented and we are monitoring the results.
- monitoring Oct 22, 2020, 06:33 AM UTC
We are monitoring as our Relay cluster is coming back online. Any affected Connectors that did not automatically reconnect may require a restart in order to resolve any connectivity issues.
- resolved Oct 22, 2020, 06:56 AM UTC
The affected Relay cluster is now fully operational.
- postmortem Jul 29, 2021, 06:39 PM UTC
**Components impacted**
Relay, Connector

**Summary**
On this date we had an outage during routine maintenance of our relay infrastructure. The issue started at 4 AM UTC and was resolved within 2 hours. Some customers had to restart their connectors to re-establish connectivity to our relay infrastructure.

**Root cause**
Our investigation determined that the connector received a malformed response from the relay cluster during its maintenance cycle. This response normally contains the address of the specific relay node to which the connector is instructed to connect. Because the response was malformed, the connector kept retrying a nonexistent relay node instead of failing over to another relay cluster.

**Corrective actions**
After correcting the specific issue that caused the malformed response, we modified both the relay and connector logic so that failover now happens automatically whenever a malformed response is received. We also added health checks to our maintenance procedures to prevent malformed responses. Finally, we enhanced the failover logic to incorporate multiple levels of relay redundancy in the initial configuration that the connector receives after authentication.
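
The failover behavior described in the corrective actions can be illustrated with a minimal sketch. This is not Twingate's actual connector code: the response fields (`host`, `port`), the cluster names, and the helpers `parse_assignment` and `connect_with_failover` are all hypothetical, chosen only to show the idea of treating a malformed relay assignment as a signal to move to the next cluster in an ordered redundancy list rather than retrying the same target.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RelayAssignment:
    """Hypothetical shape of the relay node address sent to a connector."""
    host: str
    port: int


def parse_assignment(response: dict) -> Optional[RelayAssignment]:
    """Return a validated assignment, or None if the response is malformed."""
    host = response.get("host")
    port = response.get("port")
    if not isinstance(host, str) or not host:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
        return None
    return RelayAssignment(host, port)


def connect_with_failover(
    clusters: Sequence[str],
    request_assignment: Callable[[str], dict],
    try_connect: Callable[[RelayAssignment], bool],
    attempts_per_cluster: int = 2,
) -> Optional[Tuple[str, RelayAssignment]]:
    """Walk an ordered list of relay clusters until one yields a working node.

    A malformed response triggers immediate failover to the next cluster.
    A failed connection is retried a bounded number of times before failover.
    """
    for cluster in clusters:
        for _ in range(attempts_per_cluster):
            assignment = parse_assignment(request_assignment(cluster))
            if assignment is None:
                break  # malformed response: do not retry this cluster
            if try_connect(assignment):
                return cluster, assignment
    return None  # every cluster in the redundancy list was exhausted
```

In this sketch the ordered `clusters` list stands in for the multiple levels of relay redundancy delivered in the connector's initial configuration. Bounding the retries per cluster is the design choice that prevents a connector from looping indefinitely on a node that does not exist.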