Cadmium incident

EthosCE Service Disruption

Critical · Resolved

Cadmium experienced a critical incident on April 18, 2024 affecting EthosCE, lasting 17h 33m. The incident has been resolved; the full update timeline is below.

Started
Apr 18, 2024, 06:30 PM UTC
Resolved
Apr 19, 2024, 12:04 PM UTC
Duration
17h 33m
Detected by Pingoru
Apr 18, 2024, 06:30 PM UTC

Affected components

EthosCE

Update timeline

  1. investigating Apr 18, 2024, 06:30 PM UTC

    Our team is aware there is a service disruption and is working swiftly to identify the root cause.

  2. monitoring Apr 18, 2024, 06:40 PM UTC

    A fix has been implemented and we are monitoring the results.

  3. resolved Apr 19, 2024, 12:04 PM UTC

    This incident has been resolved.

  4. postmortem Apr 29, 2024, 08:57 PM UTC

    During this outage, the router pods responsible for directing traffic from the internet to customer EthosCE sites failed to route traffic normally. The system attempted to restart the router pods automatically, but the restarts did not succeed and eventually “backed off” to avoid a restart loop. An engineer manually deleted the router pods; they respawned, and the system returned to normal operation.
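
The “backed off” behavior in the postmortem can be illustrated with a minimal sketch of capped exponential backoff, where the wait before each restart attempt doubles until it reaches a ceiling. The postmortem does not state the actual delays or the orchestrator settings involved, so the function name `restart_delays` and the values `base_s=10.0` and `cap_s=300.0` are assumptions chosen for illustration, not figures from this incident.

```python
from typing import List


def restart_delays(attempts: int, base_s: float = 10.0, cap_s: float = 300.0) -> List[float]:
    """Return the wait (in seconds) before each restart attempt.

    The delay doubles after every failed attempt and is clamped to cap_s.
    base_s and cap_s are illustrative assumptions, not values from the incident.
    """
    delays: List[float] = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))  # never wait longer than the cap
        delay *= 2                        # double the wait after each failure
    return delays


if __name__ == "__main__":
    for attempt, wait in enumerate(restart_delays(8), start=1):
        print(f"restart {attempt}: wait {wait:.0f}s")
```

Under a scheme like this, a pod that keeps failing is retried less and less often instead of looping tightly. Deleting the pod by hand, as the engineer did, creates a fresh pod without waiting for the next backoff interval.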