Instaclustr incident

Increased Failure Rate of Apache Cassandra Backups

Notice · Resolved

Instaclustr reported a notice-level incident on April 17, 2025 affecting the Management Console and Cluster Management API, lasting 4d 5h. The incident has been resolved; the full update timeline is below.

Started
Apr 17, 2025, 01:29 AM UTC
Resolved
Apr 21, 2025, 07:15 AM UTC
Duration
4d 5h
Detected by Pingoru
Apr 17, 2025, 01:29 AM UTC

Affected components

Management Console, Cluster Management API

Update timeline

  1. investigating Apr 17, 2025, 01:29 AM UTC

    We are currently seeing an elevated rate of backup failures on AWS nodes in our Apache Cassandra offering. We currently expect these backups to retry continuously and eventually succeed; however, the failures will be visible in the Instaclustr console and APIs as failed backup events. We are actively monitoring and working on a solution, and will provide more updates as the investigation continues. If you have any questions or concerns, please reach out via [email protected]

  2. identified Apr 17, 2025, 10:58 PM UTC

    The issue has been identified and we are preparing to roll out a fix.

  3. monitoring Apr 18, 2025, 12:01 AM UTC

    We have started rolling out a fix. It is expected to take a few hours to be applied to all Apache Cassandra nodes, and we will be monitoring the progress and effectiveness of the rollout closely.

  4. monitoring Apr 18, 2025, 02:38 AM UTC

    A fix has been deployed to all Apache Cassandra nodes. Initial results indicate the problem has been resolved; however, we will continue to monitor closely. We expect to provide another update in the next 24 hours.

  5. monitoring Apr 18, 2025, 10:29 PM UTC

    We are continuing to observe a reduction in error rates from Apache Cassandra backup events; however, we will continue to monitor closely. We expect to provide another update in the next 24 hours.

  6. monitoring Apr 19, 2025, 09:49 PM UTC

    We are continuing to observe a reduction in error rates from Apache Cassandra backup events; however, we will continue to monitor closely. We expect to provide another update in the next 24 hours.

  7. monitoring Apr 20, 2025, 09:47 PM UTC

    We have seen a consistent reduction in error rates from Apache Cassandra backup events over the past 3 days; however, we will continue to monitor closely. We expect to provide another update in the next 24 hours.

  8. resolved Apr 21, 2025, 07:15 AM UTC

    This incident has been resolved.