August 15, 2012 - The AT&T DNS outage demonstrates the importance of real-time root cause diagnostics when monitoring Internet services. Intermittent AT&T DNS errors were first detected by Dotcom-Monitor at 5:23 AM PST, a full hour before AT&T reported the issue. The Dotcom-Monitor Minnesota node noted the issue and captured a diagnostic DNS trace at the time of the error. Non-clients of Dotcom-Monitor can use a free DNS trace tool to test whether their domain is affected by selecting Trace Style “DNS”.
The trace was sent immediately to Dotcom-Monitor clients whose services were affected by the AT&T DNS outage. It pinpointed the root cause of the issue without the need for additional troubleshooting. Dotcom-Monitor clients using AT&T DNS were able to make fast, informed decisions, such as moving their DNS to another provider or taking alternative measures to re-route traffic.
The DNS trace taken at 5:23 AM PST clearly shows the AT&T servers timing out on DNS queries:
- A.ROOT-SERVERS.NET [198.41.0.4]: Type=NS [time 62 ms]
- L.GTLD-SERVERS.NET [192.41.162.30]: Type=NS [time 31 ms]
- cmtu.mt.ns.els-gms.att.net [12.127.16.69]: Type=NS [time 17628 ms] error Receive timeout.
- cbru.br.ns.els-gms.att.net [199.191.128.105]: Type=NS [time 17628 ms] error Receive timeout.
- A.ROOT-SERVERS.NET [198.41.0.4]: Type=NS [time 62 ms]
- E.GTLD-SERVERS.NET [192.12.94.30]: Type=NS [time 109 ms]
- cmtu.mt.ns.els-gms.att.net [12.127.16.69]: Type=NS [time 17628 ms] error Receive timeout.
- cbru.br.ns.els-gms.att.net [199.191.128.105]: Type=NS [time 17628 ms] error Receive timeout.
Trace complete.
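The two AT&T secondary DNS servers, cmtu.mt.ns.els-gms.att.net and cbru.br.ns.els-gms.att.net, show the timeout: each took 17628 ms and ended in a receive timeout, while the root and gTLD servers answered in 109 ms or less. AT&T DNS server info is based on: https://dpt.ip.att.net/dpt_helphome/dns_seczones.htm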
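As a rough illustration of how a trace like the one above can be read automatically, the following Python sketch parses the hop lines and lists the servers that reported an error. The line format is taken from the trace shown here. The names `HOP_RE`, `parse_trace`, and `failing_servers` are hypothetical helpers written for this example, not part of any Dotcom-Monitor tool.

```python
import re

# One hop line as printed in the trace above, for example:
# "- A.ROOT-SERVERS.NET [198.41.0.4]: Type=NS [time 62 ms]"
# The trailing "error ..." part is optional.
HOP_RE = re.compile(
    r"^-\s*(?P<host>\S+)\s+\[(?P<ip>[\d.]+)\]:\s+Type=(?P<rtype>\w+)\s+"
    r"\[time\s+(?P<ms>\d+)\s+ms\](?:\s+error\s+(?P<error>.+))?$"
)


def parse_trace(text):
    """Parse trace text into a list of hop dictionaries (hypothetical helper)."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line.strip())
        if match:  # lines such as "Trace complete." are skipped
            hops.append({
                "host": match.group("host"),
                "ip": match.group("ip"),
                "type": match.group("rtype"),
                "ms": int(match.group("ms")),
                "error": match.group("error"),  # None when the hop succeeded
            })
    return hops


def failing_servers(hops):
    """Return the sorted, de-duplicated hostnames of hops that reported an error."""
    return sorted({hop["host"] for hop in hops if hop["error"]})


SAMPLE_TRACE = """\
- A.ROOT-SERVERS.NET [198.41.0.4]: Type=NS [time 62 ms]
- L.GTLD-SERVERS.NET [192.41.162.30]: Type=NS [time 31 ms]
- cmtu.mt.ns.els-gms.att.net [12.127.16.69]: Type=NS [time 17628 ms] error Receive timeout.
- cbru.br.ns.els-gms.att.net [199.191.128.105]: Type=NS [time 17628 ms] error Receive timeout.
Trace complete.
"""

if __name__ == "__main__":
    for host in failing_servers(parse_trace(SAMPLE_TRACE)):
        print(host)
```

Run against the sample, the script prints only the two att.net name servers, which matches what the trace shows by eye.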
The AT&T DNS outage highlights the importance of not caching DNS while monitoring. Dotcom-Monitor uses a non-cache model, which means each monitoring check is “fresh” and completes a full DNS resolution, from the root servers down to the domain's own name servers. A monitoring service that caches DNS keeps using the address it resolved earlier, so in many cases it will not detect a DNS outage like the AT&T DNS error. Dotcom-Monitor also conducts automatic traceroutes at the time of a DNS error to uncover the specifics of the failure. This type of automatic DNS diagnostic is critical for pinpointing a DNS outage and shortening the time-to-repair, which reduces the cost of downtime.
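To make the non-cached idea concrete, here is a minimal Python sketch, using only the standard library, that sends a DNS query straight to a given name server over UDP and reports either a reply or a receive timeout. It is an illustration under stated assumptions, not Dotcom-Monitor's implementation: `build_query` and `probe` are hypothetical names, the 5-second timeout is arbitrary, and the reply is not parsed or validated. Because the query goes directly to the server instead of through the operating system's resolver, no locally cached answer can hide a failure.

```python
import socket
import struct
import time


def build_query(name, qtype=2, txid=0x1234):
    """Build a minimal DNS query packet (qtype 2 = NS, class IN)."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server_ip, name, timeout=5.0):
    """Query one name server directly; return (status, elapsed milliseconds)."""
    packet = build_query(name)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server_ip, 53))
            sock.recvfrom(512)  # any reply counts as "ok" in this sketch
            status = "ok"
        except socket.timeout:
            status = "Receive timeout"
        except OSError as exc:
            status = f"error: {exc}"
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return status, elapsed_ms


if __name__ == "__main__":
    # 198.41.0.4 is A.ROOT-SERVERS.NET, as listed in the trace above.
    print(probe("198.41.0.4", "example.com"))
```

A real monitor would repeat this step at each level of the delegation chain and check that the reply matches the query, but even this single probe distinguishes a server that answers from one that times out.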