Tyk incident

Control Plane Redis storage instability in us-east-1

Major Resolved

Tyk experienced a major incident on January 3, 2024 affecting Controller at aws-us-east-1, lasting 1d 3h. The incident has been resolved; the full update timeline is below.

Started: Jan 03, 2024, 12:37 PM UTC
Resolved: Jan 04, 2024, 04:14 PM UTC
Duration: 1d 3h
Detected by Pingoru: Jan 03, 2024, 12:37 PM UTC

Affected components

Controller at aws-us-east-1

Update timeline

  1. investigating Jan 03, 2024, 12:37 PM UTC

    Our monitoring has alerted us to an increase in storage-related errors in the Redis clusters in the aws-us-east-1 zone. The errors were caused by a rolling of the storage nodes; a mitigation procedure is being identified and put into action. Please avoid redeploying or restarting deployments in this zone until the incident is resolved.

  2. identified Jan 03, 2024, 05:24 PM UTC

    The cause of the storage instability has been identified, and there is currently no risk of recurrence. Meanwhile, mitigation procedures to prevent future data loss in the remaining deployments are in progress.

  3. identified Jan 03, 2024, 10:55 PM UTC

    Initial mitigation is now complete. The SRE team will keep working to make sure every deployment has full redundancy. We will post an update once this is complete.

  4. resolved Jan 04, 2024, 04:14 PM UTC

    Functionality and normal redundancy are now restored.