postmortem Mar 24, 2026, 06:00 PM UTC
## **Clarified Incident Summary \(Refined Draft\)** We’ve completed an initial review of the recent DNS disruption and wanted to share a brief update. During the event, **core backend DNS services** — including client assignment, policy calculation, cache, and internal resolution — **remained healthy and within normal operating thresholds**. There was no indication of a backend-wide outage or systemic DNS resolver failure. We did observe a **short-lived infrastructure event** in which compute instances were cycled as part of standard cloud lifecycle behavior. While this activity is expected and normally transparent, it appears to have intersected with **client-side behavior** in a limited set of scenarios. Based on current analysis, **impact was limited to a small subset of users**, primarily those running the **desktop DNS client**. In these cases, some clients may not have fully re-established state following upstream infrastructure changes, resulting in **intermittent DNS resolution failures despite backend availability**. To address this class of issue, an **upcoming desktop client release** includes improvements to client-side recovery logic — specifically around **session rehydration and resolver failover handling** — to ensure more reliable recovery during transient infrastructure events. In parallel, we are continuing work to **optimize backend failover behavior**, including reducing switching time and improving cross‑datacenter traffic re‑routing to further harden the platform during short-lived infrastructure transitions. A deeper review is ongoing to fully correlate **client behavior, infrastructure state, and network conditions** during the incident window. We will share additional findings as they become available. We appreciate your patience and take reliability very seriously. ## **Approximate Timeline \(High Confidence\)** > _Times are approximate and based on internal observations and message timestamps._ * **~12:45–1:00 PM ET** First internal reports of DNS resolution issues begin to surface, primarily affecting desktop clients. * **~1:00–1:20 PM ET** Engineering confirms backend DNS components are operational; infrastructure instance cycling observed during this window. * **~1:20–1:45 PM ET** Additional instances added as a precaution; backend traffic and resolver responses confirmed healthy. Focus shifts toward client-side behavior. * **~2:00 PM ET onward** Impact appears limited to a subset of desktop clients. Clients recover as state is re-established or DNS settings are reset. Deeper investigation initiated.