The Usual Tech Ramblings

Damn servers

Late Monday afternoon, we were notified by AT&T that our fiber connection was active, and they were ready to turn it over to us. Unfortunately, because I’ve been snowed under with about 50 million projects (not much of an exaggeration, honest), I hadn’t had time to rack mount the PIX and the router. So late Monday afternoon, I put the router up, and planned on doing the PIX later that night. I arrived at the office at about 2100, and set to rearranging stuff to get the PIX in. My other plan while there was to move all the servers from one of the switches to a new 48-port switch, leaving internet connectivity on one switch and everything else on the other two we had there. I didn’t think it’d take me more than about 45 minutes.

More horrors follow…

2 hours and 35 minutes later, I finally finished, updated a few configs, and started shutting down. Then my phone started alerting me to an issue, and I began to ponder what I’d screwed up. I looked into it, only to find the current link had gone down. 25 minutes and a call to the tech center later, the line came up on its own, so I went home.

Tuesday rolls around, and something isn’t right. The 48-port switch I’d dropped all the servers’ secondary NICs onto seemed a little “blink happy”. The activity lights were racing faster than I’ve seen before, which hinted at a possible packet storm. I also got a report from the QA department that their test servers were down. Most odd; the changes I’d made shouldn’t have had that effect. So I started to take a look. I could ping the web box, I could ping the SQL box, but the web server was generating an error that suggested it could not see the SQL server. A little odd: they’re on the same subnet, even on the same server (VMware), and yet they could not talk. So I logged on to the servers, and sure enough, they could not ping each other. I rebooted both, and they came back to life, for a while. This has been going on for the last day and a bit now, and I’m totally stumped.
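Ping only proves that ICMP gets through, so a quick way to narrow down this sort of “can ping it, but the app can’t see it” problem is to test the actual TCP port from each box. The snippet below is only a rough sketch of that kind of check, not something taken from the setup described here: the function name `tcp_reachable`, the address, and the port are all made up for illustration (1433 is the default port for Microsoft SQL Server, and whether that applies here is an assumption).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result means the
    connection was refused, timed out, or the host could not be resolved.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder address and assumed port, run from the web box):
#   tcp_reachable("192.0.2.10", 1433)
```

Running it from the web box against the SQL box, and again from a machine on a different switch, would at least show whether the failure is between those two hosts specifically or more widespread.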

So this afternoon, I decided that it must have something to do with what I’d been doing. I reverted the switch configs to a backup I took before I started (yes, I’m s-m-r-t) and moved all the cables back, after tidying them up a bit, and it seemed to have worked… for about an hour. Then it broke again, and much cursing and swearing has followed since. So I’m about to break the server, pull out the redundant NIC configuration, and run everything off a single NIC instead of an LACP-controlled team, to see if that helps. We shall see...

In the meantime, tonight is the third conference call this week held after midnight, so I’m pretty sure I’m not going to bed before 3am again. With all the other work chaos right now, and many people asking for stuff to be done, I’m feeling a little “worn”.