Last night I stumbled across a little issue I hadn’t noticed before. We’re in the process of deploying a new connection to our office for improved bandwidth and reliability. So far I have moved half of our servers to the new line, and a handful are still using the old line. Last night at about 0200, our main line (the old one) went offline. I was alerted to the problem by Nagios, which I have running on my personal server. I was a little baffled as to why the internal monitoring server wasn’t notifying me of any issues, seeing as it had been moved to the new line.
It turned out to be quite simple: as soon as I got the line back up (at about 0300), the emails poured out of the server. I managed to catch one of the log messages. It basically said:
Looking at the resolv.conf file for the server, it pointed to our two internal domain controllers, both of which run DNS services… So why did it fail? Simple. They were using the old, down, line. This meant they couldn’t find the root servers, and in turn were not able to resolve the domain name, or find the MX records for my mail services.
This needed resolving (no pun intended). It defeats the whole point of email notifications if the mail cannot get out. So I decided to drop BIND on the server. This is where I hit a stumbling block. I didn’t need the server to do complete resolving; I just needed it to do resolution if it couldn’t get an answer from the other two servers. I believe you’re supposed to be able to put multiple servers in the /etc/resolv.conf file, but I’ve never had any luck past 3 servers, so I didn’t want to run that risk. So BIND it was. I then had to figure out how to make it do resolution most of the time based on data from the two DCs. This turned out to be easier than it seemed. I stumbled across this. It’s basically a list of the configuration items (which oddly enough seems to be missing from the BIND documentation online). Two were very important: forward and forwarders. These two options, combined, solved my issues. My named.conf now has the following in the options section at the top:
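Something along these lines, as a minimal sketch. The two addresses are placeholders from the 192.0.2.0/24 documentation range standing in for the two DCs, not our real addresses, and the rest of the options section is left out:

```
options {
    // Placeholder addresses standing in for the two internal DCs (assumption)
    forwarders {
        192.0.2.10;
        192.0.2.11;
    };
    // Ask the forwarders first; if they give no answer, resolve it ourselves
    forward first;
};
```

For the fallback to work, the server also needs recursion allowed for itself and a root hints zone, which most distribution packages of BIND ship by default.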
This does two things.
forwarders tells BIND to look at those servers for resolution.
forward first tells BIND to try the forwarders first; if they fail, it tries to find the host information itself, which means falling back to the root servers.
I’m waiting for my next workable maintenance window, which will be when I get back from my vacation, to test how effective this really is.