linkerd expired anchors led to cluster downtime
Timeline · 1 update
- resolved Jan 25, 2026, 03:45 PM UTC
From 4:40am to 7:40am PST on Jan 25, 2026, the cluster experienced downtime due to expired trust anchors. New certs have been generated.
Sapling had 9 outages in the last 2 years totaling 213h 13m of downtime — averaging 0.4 incidents per month.
There were 9 Sapling outages since September 27, 2025 totaling 213h 13m of downtime. Each is summarised below — incident details, duration, and resolution information.
From 4:40am to 7:40am PST on Jan 25, 2026, the cluster experienced downtime due to expired trust anchors. New certs have been generated.
We are currently patching this issue -- the extension will not work in Chrome v144
We are continuing to investigate this issue.
A fix has been implemented and we are monitoring the results.
We are continuing to monitor for any further issues.
This has been resolved for over 6 days with the new integration. We left the message so those with the older integration would update.
This incident has been resolved.
AWS cluster upgrades led to insufficient AI detection pod assignment, leading to latency and then downtime starting from 10:30am PST.
Limited resources in AWS region have led to insufficient AI detector pods since 11/5
This incident has been resolved.
We have seen a surge in spellcheck usage in the past day resulting in some scaling of the cluster components needed.
This incident has been resolved.
Subset of pods failing to connect to caching system after caching system upgrade led to degraded services
Delays when writing to S3 resulted in latency spikes across endpoints.