Intermittent DNS Resolution Issues Affecting a Subset of Desktop Clients

Incident Report for CyberFOX

Postmortem

Incident Summary

We’ve completed an initial review of the recent DNS disruption and wanted to share a brief update.

During the event, core backend DNS services — including client assignment, policy calculation, cache, and internal resolution — remained healthy and within normal operating thresholds. There was no indication of a backend-wide outage or systemic DNS resolver failure.

We did observe a short-lived infrastructure event in which compute instances were cycled as part of standard cloud lifecycle behavior. This activity is expected and normally transparent, but in a limited set of scenarios it appears to have interacted with how desktop clients reconnect.

Based on current analysis, impact was limited to a small subset of users, primarily those running the desktop DNS client. In these cases, some clients may not have fully re-established state following upstream infrastructure changes, resulting in intermittent DNS resolution failures despite backend availability.

To address this class of issue, an upcoming desktop client release includes improvements to client-side recovery logic, specifically around session rehydration and resolver failover handling, so that clients recover more reliably during transient infrastructure events.
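
For readers who want a concrete picture of what resolver failover handling can mean, the following is a minimal Python sketch and not the desktop client's actual code. Every name in it (`resolve_with_failover`, `query_fn`, `max_rounds`, `base_delay`) is hypothetical, and the retry counts and delays are illustrative assumptions. It tries each configured resolver in order and, if all of them fail, waits with exponential backoff before trying the list again.

```python
import time
from typing import Callable, Optional, Sequence


class ResolutionError(Exception):
    """Raised when every resolver has failed in every retry round."""


def resolve_with_failover(
    name: str,
    resolvers: Sequence[str],
    query_fn: Callable[[str, str], str],  # hypothetical: (resolver, name) -> answer
    max_rounds: int = 3,                  # illustrative default, not a real setting
    base_delay: float = 0.2,              # illustrative default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each resolver in order; back off and retry the whole list on failure."""
    last_error: Optional[Exception] = None
    for round_index in range(max_rounds):
        for resolver in resolvers:
            try:
                return query_fn(resolver, name)
            except OSError as exc:
                # TimeoutError and ConnectionError are subclasses of OSError,
                # so a recycled upstream instance lands here and we move on.
                last_error = exc
        # Every resolver failed this round; wait before the next round,
        # doubling the delay each time. No wait after the final round.
        if round_index < max_rounds - 1:
            sleep(base_delay * (2 ** round_index))
    raise ResolutionError(f"all resolvers failed for {name!r}") from last_error
```

Passing the query function and the sleep function as parameters keeps the sketch independent of any particular DNS library and makes the failover path easy to exercise in isolation.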

In parallel, we are continuing work to optimize backend failover behavior, including reducing switching time and improving cross‑datacenter traffic re‑routing to further harden the platform during short-lived infrastructure transitions.

A deeper review is ongoing to fully correlate client behavior, infrastructure state, and network conditions during the incident window. We will share additional findings as they become available.

We appreciate your patience and take reliability very seriously.

Approximate Timeline (High Confidence)

Times are approximate and based on internal observations and message timestamps.

  • ~12:45–1:00 PM ET
    First internal reports of DNS resolution issues begin to surface, primarily affecting desktop clients.
  • ~1:00–1:20 PM ET
    Engineering confirms backend DNS components are operational; infrastructure instance cycling observed during this window.
  • ~1:20–1:45 PM ET
    Additional instances added as a precaution; backend traffic and resolver responses confirmed healthy. Focus shifts toward client-side behavior.
  • ~2:00 PM ET onward
    Impact appears limited to a subset of desktop clients. Clients recover as state is re-established or DNS settings are reset. Deeper investigation initiated.
Posted Mar 24, 2026 - 14:02 EDT

Resolved

Update on Brief DNS Disruption on 3/23/26, ~12:45–1:15 PM ET

We wanted to share a brief update regarding a short‑lived DNS issue that occurred on Monday, 3/23/26.

During this time, our core DNS platform remained online and fully operational. We observed a routine infrastructure event in our cloud environment. While this activity is normally transparent, a small number of desktop clients did not immediately reconnect as expected.

As a result, a limited subset of users experienced temporary DNS resolution issues. This was not a system‑wide outage, and the impact was isolated.

The issue was resolved as affected clients re‑established connectivity, and no backend services required intervention.

To further reduce the likelihood of this scenario in the future:
* We have client‑side improvements coming in an upcoming desktop client release to enhance automatic recovery.
* We are continuing to strengthen backend failover and routing behavior during brief infrastructure transitions.

We understand how critical DNS reliability is and appreciate your patience. Please don’t hesitate to reach out if you have questions or would like help discussing this with your customers.
Posted Mar 23, 2026 - 13:00 EDT