Networking


Many are the times we’ve run into DNS configuration problems with Microsoft AD.  After being asked for advice a few more times than normal this year, I’ve pulled together several emails for this list of “Troubleshooting Microsoft AD-integrated DNS” highlights below.  We’ll first cover the generic topics of checking the configuration of your server configuration,  then the configuration of the zones themselves. For each topic, we’ll do a checklist followed by an explanation.

Server configuration:

Checklist

  1. Is the server (Windows 2003 or higher) pointing to itself for primary DNS in the network configuration?
  2. If a standalone DC: Does the server have *no* secondary DNS in the network configuration?
  3. If there are multiple DCs: Does the server list only other DCs in the secondary DNS server list in the advanced network configuration?
  4. Does the server have proper forwarders in the DNS server configuration (to the parent domain or to the ISP, but not both)?
  5. In a command prompt, run the following:
    ipconfig /registerdns
    net stop netlogon
    net start netlogon
  6. Read DNS and System logs to make sure there are no issues being reported.
  7. wait 20 minutes

Explanation

One of the major problems we run into is that customers will put the ISP DNS servers in the network configuration on the DC, not in the DNS Forwarders list in the DNS Server configuration.  The DC *is* a DNS server.  It needs to talk to itself, so that it can register crucial DNS settings in its own database.  If its own database can’t find the information requested (such as www.google.com), then the DNS Server service is responsible for looking that data up, and then caching it so that it’s readily available for other clients, too.  This misconfiguration also has the problem of generating DDNS update requests back to the ISP DNS servers, which are ignored at best, and a security leak at worst (like for military/government installations).

I like to tell my Unix customers “the first rule of administering Active Directory is to go get another cup of coffee.” This forces them to take their hands off the keyboard and wait for cross-site replication (hopefully) before making another change.  It’s a good reminder for the seasoned Windows admins, as well.

Zone Configuration

Reverse Lookup Zones

We’ll cover reverse lookup zones before forward lookup zones, for two reasons: 1) customers screw up reverse lookup configuration much more often than forward lookup configuration ; 2) no SRV records in Reverse zones (normally).

Checklist

If you have non-Microsoft DNS servers or multiple AD domains in your environment

  1. Does the server have reverse DNS zones defined?
  2. Does any *other* server (in the DNS Forwarders configuration list) have the same reverse DNS zone defined?
  3. Do the defined reverse zones allow “unsecured dynamic updates”?
  4. Are all IP subnets in your network defined as reverse DNS zones on the primary DNS servers (the last forwarders in the network before the ISP)?
  5. Do you have aging and scavenging turned on in the server settings?  If so (you should), do you have all clients automatically renewing their records (Windows clients will by default)?

If you only have a single AD domain, or no non-Microsoft DNS servers

  1. Does the server have reverse DNS zones defined for all IP subnets (including IPv6) in your network?
  2. Do those reverse DNS zones allow dynamic updates?
  3. Is aging of old records enabled with sane no-refresh and refresh values  in the reverse zones?

Explanation

Each DNS Zone is a database.  There can only be one authoritative owner of the database, defined by the SOA record on the Zone.  Any other DNS servers get their information from this SOA, either by normal queries, or by zone transfer (AD replication does a kind of zone transfer).  If two servers are set up with the same zone (create 0.168.192.in-addr.arpa reverse DNS zone in dns1.contoso.com and ns1.worldwidetoys.com, for example), then there is no mechanism to transfer the information between those two servers.

For example: any individual client will only talk to the DNS server it’s configured to talk to (client1.contoso.com gets its DNS info from dns1.contoso.com and winxp1.worldwidetoys.com gets its information from ns1.worldwidetoys.com). Each client will also send updates only to its own DNS server.  This means that client1.contoso.com will register its IP 192.168.0.10 with dns1.contoso.com, and winxp1.worldwidetoys.com will register its IP 192.168.0.20 with ns1.worldwidetoys.com.  These two records will never be synched between dns1.contoso.com and ns1.worldwidetoys.com.  Therefore, when winxp1.worldwidetoys.com asks ns1.worldwidetoys.com “who has 192.168.0.10?”, ns1.worldwidetoys.com will answer “nobody!”.

The DNS admin must fix this problem by manually registering all of the records from ns1.worldwidetoys.com in the zone stored in dns1.contoso.com, deleting the 0.168.192.in-addr.arpa zone from ns1.worldwidetoys.com, and then setting up a forwarder or conditional forwarder to dns1.contoso.com.  Now, that same query results in ns1.worldwidetoys.com looking in its own database, finding no answer, and reaching out to its forwarders to ask, “who has 192.168.0.10?”.  Similarly, when winxp1.worldwidetoys.com goes to register 192.168.0.20, it is directed, via the SOA record, to send that registration to dns1.contoso.com.  This is why reverse zones often need to allow unsecured dynamic updates.

Forward Lookup Zones

I have a customer who needs this much data now – I’ll follow up with the Forward Lookup zones in a separate post later this week.

I know it’s probably an unusual situation, but in the lab we have Jumbo frames turned on for all the servers and test boxes. It makes a huge difference copying ISOs between hosts, and doing network backups. However, my Kubuntu laptop isn’t always in the lab network. This means that I almost never remember to change the MTU when I’m back in the office, OR I remember in the middle of a transfer, when it’s already too late to gain the benefits.

So I wrote a little script, and put it in /etc/network/if-up.d/ [edit: under NetworkManager, use /etc/NetworkManager/dispatcher.d isntead] named “jumbo-frames.sh”. The if-*.d/ structure is designed for exactly this purpose: run a script when an interface comes up. The basic premise is: If I’m plugged into a wired network (eth0) in the lab (domain or IP address match certain parameters), then set the MTU to 9000 (jumbo frame support), otherwise assume the network has a normal MTU (1500). This allows the system to reconfigure on the fly if I put it to sleep and go visit a customer.

Here’s the code:

#!/bin/sh
# Set support for jumbo frames when at home on wired network, else do not.
# Determine home network based on IP address and DNS-determined name.
# $IFACE should be set by the caller.

PATH=/sbin:/bin:/usr/sbin:/usr/bin

IFC=/sbin/ifconfig
INT=”eth0″
MTU=9000
DEFMTU=1500
#name of the DNS domain to assume as “home”
HOMED=”totalnetsolutions.net”
#IP Subnet to assume as “home” if DNS test fails
HOMEN=”10.0.0.”

test -x $IFC || exit 0

# Don’t make changes to the wireless (wlan) or loopback (lo) interfaces
if [ “$IFACE” != “$INT” ]; then
exit 0
fi

# if dhcpd is still working on writing our resolv.conf, just wait a while (it’s a hack, but it works).
test -f /etc/resolv.conf || sleep 15

DOM=`awk ‘/search/ { print $2 }’ /etc/resolv.conf`
NET=`ip addr show dev $IFACE | awk ‘/inet / { print $2 }’ | awk -F. ‘{ print $1 “.” $2 “.” $3 “.” }’`

if [ “$DOM” = “$HOMED” ]; then
$IFC $IFACE mtu $MTU
elif [ “$NET” = “$HOMEN” ]; then
$IFC $IFACE mtu $MTU
else
$IFC $IFACE mtu $DEFMTU
fi

We’ve been having some server uptime/stability issues, and aren’t getting alerts from HP Systems Insight Manager (HP SIM) that the services are down (cause they’re not, they’re just not answering on HTTP).  So I took a copy of “responder.pl” and put it into something I wrote for totalnetsolutions.net.  What came out is actually pretty nice, easily configurable, and so far this week, very stable.

We haev this running ever 3 minutes from 3 systems: 1 Windows 2003, 1 Fedora Core 8, and 1 Kubuntu Gutsy Gibbon.  Requires Net::SMTP, Config::INIFiles, LWP::UserAgent, and HTTP::Request.  The only one that I’ve needed to download and install is Config::INIFiles on any of those 3 systems.  But I do have LWP::Simple on all systems, so I’m not sure if you’ll need the last 2.  This is my first published code other than 3 line bash scripts, so be kind in comments.

Feel free to take and use / improve / update this – I’d just appreciate if you’d let me know so I can update this version here.  The parseIni() function checks that all “URL”s are in http://www.google.com format or http://64.233.167.99 format (it checks for http:// followed by text followed by what appears to be a valid TLD format, or it checks for http:// followed by an IP address).  I have yet to add in the regex to look for a valid full URI, because I didn’t need that yet.

This is upgraded over responser.pl in that:

  1. It will send to any number of SMTP recipients (comma-separated)
  2. It will silence its alerting if *all* checked addresses are down.  If the monitoring system gets unplugged from the network, it won’t attempt to send hundreds of alerts upon regaining access.  Or if you’re testing from a DSL line, you won’t get alerts because the DSL line went down, but the actual target was up.  The next version will have this as an option in the INI file.
  3. It uses standard INI file formatting, rather than a parsed text file.
  4. it runs out of the box (so to speak) on Windows (ActivePerl) or Linux (Fedora and Ubuntu both tested).
  5. It has better inline documentation.

The major problem is that a minimum of 2 URLs are needed in the INI file for the full logic to work.  You can get around this for small networks by adding in the DNS domain for one, and the IP address for the other. 

Thanks, and please share any concerns or problems.

chk-web.pl

I’ve been very busy with clients over the past 2 weeks, troubleshooting Clustering problems, Exchange issues, and planning a new trust relationship, on top of normal maintenance and design. As I solve each issue, I’ll be posting what I can about them. This week we were able to solve the odd clustering problem…

We’ve seen some issues over the past approximately 2 months, particularly with MS SQL 2000 clusters (1 Exchange 2003 cluster), where the cluster group fails on one node, and the other node (or nodes) fails to pick up the group, leaving the complete cluster group offline. In each of the cases (on both HP and Dell hardware) the first striking piece of evidence in the logs is that all nodes that fail to bring up the cluster report that the Cluster IP Address resource couldn’t be brought online, because of an IP address conflict on the network

Making this issue particularly fun is that most of the information we used to solve the problem, is a lack of information.  In particular, there is absolutely nothing interesting at all in any nodes’ cluster.log file.You see the disks negotiate from node to node, but nothing that makes the failover look any different than if you had right-clicked the group and chosen “Move Group” from Cluster Administrator.

What starts the problem off is Event ID 1228 from source “ClusNet”, which says that the “ClusNet driver couldn’t communicate with the ClusSvc for 60 seconds, the Cluster service is being terminated.” Most of the time, you might even miss that this event is there, because it causes so many Event Source Tcpip, ID 4199; Source ftdisk, ID 57; and Source ntfs event ID 50 events, that it’s easy to look over 1 little error. Especially when monitoring systems like Microsoft Operations Manager (MOM), or Idera SQLDiagnostics Manager (SQLDiag) or HP Systems Insight Manager (SIM) all report the cluster as having issues 30-60 seconds after the CluNet 1228 event is written (timing which corresponds exactly to the Tcpip 4199 events (IP address conflict) or the ftdisk 57 events (failed to flush transaction data). So, here’s what happens, based on conversations with Microsoft, training with Microsoft and HP, and a LOT of reading. (more…)

First, reference back to my first post on Domain Controller IP/Subnet changes. The nice thing about changing IP addresses on DCs in a larger environment, is that it’s actually easier. I have to keep this one quick for now, but will expand based on comments, which you all seem pretty good at leaving (and thank you!). Please, PLEASE refer back to the first post – this one is only an expansion on that one.

  1. Same as before: why are you changing IPs? In larger environments, I do this because of a physical move of just one site. If the networking team doesn’t have the new subnet up and routing, don’t start!
  2. Make sure the new site (if required) is set up in AD. If I’m moving DCs from one physical location to another, I will build a new site, rather than re-using the old one, because the new site often has better connectivity, so the site link costs are changing.
  3. Add the new IP to the DC you’re moving (DC01 for this). Same as before: don’t remove the old one, just add the new.
  4. On DC01, do the following to verify registration worked:
    ipconfig /registerdns
    Wait a few minutes.
    nslookup
    server DC01
    set type=A
    DC01.foobar.local
    foobar.local
    server DC02
    DC01.foobar.local
    foobar.local

    The answers from DC01 and DC02 should be the same, with possibly different orders. The important thing is that the new IP address and the old IP address show up for both queries on both servers.
  5. Shut down DC01, pack it up and move it. (Or just plug it into the new network.)
  6. Boot up, verify that DC01 has network connectivity, and that other systems can see that it has the new IP.
  7. If you haven’t, make the new IP primary (change order in Network settings), make sure the DNS and WINS servers are correct and reachable (Remember that Windows 2003 DNS should point to itself).
  8. Once verifying that AD is replicating across sites properly (up to 15 minutes in my experience), remove the old IP, ipconfig /registerdns, and reboot.
  9. When it comes back up re-verify that AD is still replicating, and you should be set.

I would point out that when doing a change this big to your environment, reviewing your AD replication, DNS forwarding, and WINS topology is a good idea.

Next Page »