Cadmium incident

EthosCE Service Disruption

Major · Resolved

Cadmium experienced a major incident on November 24, 2024 affecting EthosCE, lasting 1d 6h. The incident has been resolved; the full update timeline is below.

Started: Nov 24, 2024, 02:14 PM UTC
Resolved: Nov 25, 2024, 08:35 PM UTC
Duration: 1d 6h
Detected by Pingoru: Nov 24, 2024, 02:14 PM UTC

Affected components

EthosCE

Update timeline

  1. investigating Nov 24, 2024, 02:14 PM UTC

    Scheduled jobs for EthosCE are currently not running as expected. This includes scheduled emails, credit reporters, data warehouse updates, and other tasks. We have identified the root cause of this issue and are working to restore this service as quickly as possible.

  2. investigating Nov 24, 2024, 06:57 PM UTC

    Our team is aware there is a service disruption and is working swiftly to identify the root cause.

  3. identified Nov 24, 2024, 10:12 PM UTC

    Our team is still working on restoring EthosCE service. We will provide an update as soon as we have more information.

  4. identified Nov 25, 2024, 01:31 AM UTC

    Our team is still working on restoring EthosCE service. We will provide an update as soon as we have more information.

  5. identified Nov 25, 2024, 04:53 AM UTC

    We are beginning the process of restoring customer sites to the hosting infrastructure. You can expect to see EthosCE sites coming back online overnight. However, please note that while your sites will be accessible, file uploads may be limited at this time. This functionality will be fully restored once the process is complete. We kindly ask that you refrain from opening any support tickets until our next update confirms that the restore is complete. Your patience and understanding during this time are greatly appreciated.

  6. monitoring Nov 25, 2024, 03:26 PM UTC

    We are actively working to restore normal file system access to EthosCE. Currently, users may experience issues with uploading and downloading files, as well as running reports, due to this disruption. We expect most functionality to be fully restored within the next few hours. We appreciate your patience during this process and will keep you updated on our progress.

  7. resolved Nov 25, 2024, 08:35 PM UTC

    We are happy to announce that all EthosCE services have been fully restored and are now operating normally. If you encounter any issues, please open a support ticket. We appreciate your patience throughout this incident, and a postmortem report will be shared here as soon as it is available.

  8. postmortem Nov 26, 2024, 09:42 PM UTC

    On November 23, 2024, at approximately 3:50 AM Eastern Time, the EthosCE platform experienced a service disruption due to a technical issue within our system infrastructure. The incident was fully resolved by November 25, 2024, at 3:35 PM Eastern Time. The disruption was caused by corruption in the database that manages the state of the EthosCE hosting cluster. This corruption led to synchronization issues. Customer sites remained operational during the initial phase of the incident, but all of them went offline during the recovery process because of these synchronization issues. We appreciate your understanding and patience during this incident. Providing reliable service remains our top priority, and we are dedicated to learning from this experience to improve our services. If you have any questions or need further assistance, please do not hesitate to reach out through our support channels.
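The postmortem gives its times in Eastern Time, while the timeline above uses UTC. The following is a minimal sketch for putting both on the same clock. It assumes "Eastern Time" here means EST (UTC-5), which holds for late November 2024 because US daylight saving time ended on November 3, 2024. The helper name `to_utc` and the variable names are illustrative only and do not come from the vendor.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Eastern Time" on these dates is EST, a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")


def to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time as EST and return the same instant in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=EST)
    return local.astimezone(timezone.utc)


# Times quoted in the postmortem (Eastern Time).
postmortem_start = to_utc(2024, 11, 23, 3, 50)
postmortem_end = to_utc(2024, 11, 25, 15, 35)

# Times quoted in the status timeline (already UTC).
timeline_start = datetime(2024, 11, 24, 14, 14, tzinfo=timezone.utc)
timeline_end = datetime(2024, 11, 25, 20, 35, tzinfo=timezone.utc)

print(postmortem_start.isoformat())        # 2024-11-23T08:50:00+00:00
print(postmortem_end == timeline_end)      # True
print(timeline_end - timeline_start)       # 1 day, 6:21:00
print(timeline_start - postmortem_start)   # 1 day, 5:24:00
```

Under that assumption, the postmortem's resolution time matches the timeline's 08:35 PM UTC entry, and the listed duration of 1d 6h corresponds to 30 hours 21 minutes. The postmortem's stated start falls about 29 hours before the first timeline entry; the source does not explain that gap.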