INKY incident

Inky outbound mail delivery disrupted

Major Resolved

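INKY reported a major incident on December 16, 2021. According to the postmortem, the outage itself occurred on December 10, 2021, from 14:33 to 14:52 UTC, lasting 19 minutes. The incident has been resolved; the full update timeline is below.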

Started
Dec 10, 2021, 02:33 PM UTC
Resolved
Dec 10, 2021, 02:52 PM UTC
Duration
19 minutes
Detected by Pingoru
Dec 16, 2021, 03:57 PM UTC

Update timeline

  1. resolved Dec 16, 2021, 03:57 PM UTC

    Outbound email sent to Inky servers during the outage period received a 530 status reply from Inky. Depending on the configuration of the connecting mail server, this resulted in either a rejected message with an NDR or a deferral. Clients whose servers treated the response as a deferral had their messages delayed for the duration of the outage. Clients whose servers treated the 530 response as a rejection received an NDR for each message they attempted to send.

  2. postmortem Dec 16, 2021, 03:58 PM UTC

    Post incident report:

    Start: 10-December-2021 14:33 UTC
    End: 10-December-2021 14:52 UTC
    Duration: 19 minutes

    Summary: At 14:33 UTC

    Root Cause: A configuration that could not be parsed was loaded into the Inky outbound processing mail servers. This configuration change caused the processes on those servers to stop.

    Customer Impact: Outbound email sent to Inky servers during the outage period received a 530 status reply from Inky. Depending on the configuration of the connecting mail server, this resulted in either a rejected message with an NDR or a deferral. Clients whose servers treated the response as a deferral had their messages delayed for the duration of the outage. Clients whose servers treated the 530 response as a rejection received an NDR for each message they attempted to send.

    Mitigation Action: The configuration that caused the server failure was removed, allowing all services to restart.

    Follow-up Items and Preventative Measures:

      1. Inky Operations has changed the process for config pushes so that they first land on the Inky dev environment and test environment.
      2. Inky Operations is changing the behavior of the outbound MTAs so that a temporary disruption on processing systems returns a deferral code, not a rejection code, to upstream systems.
      3. A patch has been created to change the load behavior of these configurations so that an invalid entry creates an alert and is skipped instead of preventing a host from starting.
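Illustrative sketch of the ideas behind follow-up items 2 and 3. This is not Inky's code. The report does not describe the real configuration format or the MTA software, so the `key=value` entry format and the names `reply_class`, `parse_entry`, and `load_entries` are assumptions made for illustration only. The first function applies the standard SMTP reply classes, where the first digit separates transient (4yz) from permanent (5yz) failures. The loader shows one way an invalid entry could be logged and skipped rather than aborting startup.

```python
import logging
from typing import Dict, Iterable, List, Tuple

log = logging.getLogger("config_loader")  # hypothetical logger name


def reply_class(code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    if 200 <= code < 400:
        return "positive"   # 2yz completion / 3yz intermediate
    if 400 <= code < 500:
        return "deferral"   # 4yz transient failure: sender is expected to retry
    if 500 <= code < 600:
        return "reject"     # 5yz permanent failure: sender usually returns an NDR
    raise ValueError(f"not an SMTP reply code: {code}")


def parse_entry(line: str) -> Tuple[str, str]:
    """Parse one 'key=value' entry (assumed format); raise ValueError if malformed."""
    key, sep, value = line.partition("=")
    key, value = key.strip(), value.strip()
    if not sep or not key or not value:
        raise ValueError(f"cannot parse entry: {line!r}")
    return key, value


def load_entries(lines: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Load every valid entry; log and skip invalid ones instead of failing."""
    entries: Dict[str, str] = {}
    skipped: List[str] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        try:
            key, value = parse_entry(line)
        except ValueError as exc:
            # The warning stands in for the alert mentioned in item 3.
            log.warning("skipping invalid config entry on line %d: %s", number, exc)
            skipped.append(line)
            continue
        entries[key] = value
    return entries, skipped
```

Under these assumptions, `reply_class(530)` returns `"reject"` and `reply_class(451)` returns `"deferral"`, which is the difference item 2 aims for. Returning the skipped entries alongside the valid ones lets the caller decide whether to start with a partial configuration or escalate.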