Clockwork incident

Redis Feature Flag Error

Minor Resolved

Clockwork experienced a minor incident on October 20, 2022 affecting Clockwork Production Application, lasting 56m. The incident has been resolved; the full update timeline is below.

Started
Oct 20, 2022, 04:09 PM UTC
Resolved
Oct 20, 2022, 05:05 PM UTC
Duration
56m
Detected by Pingoru
Oct 20, 2022, 04:09 PM UTC

Affected components

Clockwork Production Application

Update timeline

  1. investigating Oct 20, 2022, 04:09 PM UTC

    We’re currently experiencing a service disruption. Our DevOps team is working to identify the root cause and implement a solution. Users may be experiencing errors in page views and feature flags.

  2. identified Oct 20, 2022, 04:56 PM UTC

    The issue has been identified. Redis feature flags were removed; we are restoring them from backup. Restoration should be complete within 30 minutes.

  3. resolved Oct 20, 2022, 05:05 PM UTC

    Feature flags have been restored. All page views are rendered as expected.

  4. postmortem Oct 21, 2022, 05:28 PM UTC

    _On the morning of October 20, Clockwork views and application feature settings reverted to an unset state for approximately 2 hours. This is the first time we’ve had any feature flag reversion. We immediately discovered the issue and remediated it as quickly as possible. We understand this was a significant inconvenience, and we take it very seriously. Below is our postmortem and the actions we are taking to prevent a recurrence._

    * **Incident Details** All the feature flags stored in Redis were wiped, so users saw old Clockwork views and could not find certain functionality.
    * **Investigation Summary** The team reviewed the code and the Redis server configuration and identified that the same Redis instance is shared between the Demo and Production environments. The team then restored the data from backup Redis servers.
    * **Data Exposure Summary** No data was exposed. UI settings data stored in Redis was deleted but has been recovered from backup.
    * **Remediation Summary** Review the feature flagging system and use a separate Redis namespace for Demo. Upgrade from Redis 5.0 (the current version) to Redis 6.x to use its improved security features and ACLs. Set up a Redis cluster with multi-AZ replication.
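
The "separate Redis namespace for Demo" remediation can be illustrated with a minimal sketch. The postmortem does not say which operation removed the flags or how Clockwork's flag code is structured, so everything here is an assumption: the `NamespacedFlags` class, the `<env>:flag:<name>` key layout, and the flag name are hypothetical, and a plain dictionary stands in for the shared Redis instance so the example runs without a server.

```python
class NamespacedFlags:
    """Hypothetical feature-flag store that confines every key to one environment prefix."""

    def __init__(self, store: dict, env: str):
        if not env or ":" in env:
            raise ValueError("env must be a non-empty name without ':'")
        self._store = store               # stand-in for a shared Redis instance
        self._prefix = f"{env}:flag:"     # assumed key layout: <env>:flag:<name>

    def set(self, name: str, enabled: bool) -> None:
        self._store[self._prefix + name] = enabled

    def get(self, name: str, default: bool = False) -> bool:
        return self._store.get(self._prefix + name, default)

    def clear(self) -> int:
        """Delete only this environment's flags; return how many were removed."""
        doomed = [key for key in self._store if key.startswith(self._prefix)]
        for key in doomed:
            del self._store[key]
        return len(doomed)


# One backing store used by both environments, as in the shared-instance setup.
shared = {}
prod = NamespacedFlags(shared, "production")
demo = NamespacedFlags(shared, "demo")

prod.set("new_dashboard", True)
demo.set("new_dashboard", False)

# Resetting Demo removes only keys under the "demo:flag:" prefix.
removed = demo.clear()
```

A prefix alone only protects against code that goes through the wrapper. On Redis 6.x, ACL key patterns can additionally restrict a Demo user to keys under its own prefix, which is presumably part of the motivation for the planned upgrade.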