Update timeline
- resolved Mar 09, 2026, 09:45 PM UTC
Envoy pod q7fjq tripped its default ext_authz circuit breaker (max_connections: 1,024). The pod had absorbed 57% of traffic because a rollout ran without readiness probes, and it was then hit by a reconnection storm triggered by a GKE node removal. The circuit breaker stayed tripped for 17 minutes, during which ~62% of auth requests on that pod returned HTTP 500. Overall, 9.4% of user-facing requests failed (5,203 of 55,449). The system recovered on its own once the auth connection latency tail cleared. The other two Envoy pods remained healthy throughout. A skynet-frontend restart at 17:41:15 was coincidental: the old pods were still alive when the circuit breaker cleared at 17:42:00.
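As a quick sanity check on the reported impact, the overall failure rate can be recomputed from the two request counts in the summary. This is only a minimal sketch: the variable names are illustrative and are not taken from any incident tooling.

```python
# Request counts as stated in the incident summary.
failed_requests = 5_203
total_requests = 55_449

# Fraction of all user-facing requests that failed.
failure_rate = failed_requests / total_requests

# Format as a percentage with one decimal place, matching the summary's precision.
print(f"{failure_rate:.1%}")
```

The result agrees with the 9.4% figure quoted in the update.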