TheGeekery

The Usual Tech Ramblings

DL585, Hardware Monitors, and Progress

A couple of weeks ago, I tweeted about a server that had a blinking health light. After some poking around, I discovered that WBEM was reporting everything was A-OK with the server, but switching the management tools over to SNMP reported a memory failure.

After scheduling some downtime to replace the memory, I went to slide the server out on its rails and discovered that the HP ProLiant DL585 G1 series servers have the health status board on the top of the chassis. This confused me because all of our other servers have either a pop-out tray with the information on it, or some kind of LCD up front that reports the status.

What caught me by surprise was the fact that the server had the full hardware layout instructions printed on the top of the chassis. It’s been a long time since I’ve seen this. Granted, it is an old server (G1-class servers are getting a little long in the tooth), but it was nice to see the full details right where I needed them, rather than having to go hunting through HP’s terrible support site for them.

What’s even more interesting to me was the hardware health status board itself. On newer servers, this is usually smaller than a credit card, and about as thick as three quarters stacked on top of each other. In this DL585, the board is the size of a large network card, with multiple LEDs feeding clear plastic shafts that pipe the light to another set on the underside of the chassis top.

I find it interesting to see how hardware configurations have grown and changed over time, even in the very small things. I don’t see many servers with instructions on them any more, probably a money saver for the vendors. I also don’t see much in the way of big health boards any more; vendors now manage to squeeze so much onto a tiny board. The advancement in chips and technologies has made the server realm quite interesting to poke around in.

Move-VM and Explicit Destination

Due to a weird BIOS error, most of our ESX hosts have thrown a memory warning. This is a known issue, and HP wants you to update the BIOS before doing any further troubleshooting, so I scheduled a change window to upgrade all 12 hosts in our cluster. While working on the upgrades, I stumbled across a weird issue with Move-VM.

Our clusters run Distributed Resource Scheduler (DRS), which allows the cluster to migrate guests seamlessly between hosts when resources become constrained. One of the handy features is that when you put a host into maintenance mode, DRS automatically moves all the guests off of that host. However, I wanted to make sure all the guests migrated, and to manage the alerts in our monitoring system, so I threw together a quick script using PowerCLI. That way I could move all the dev/test/stage machines first (lower monitoring thresholds), and then move the production servers afterwards, so I could see which boxes I had to mark as “unmanaged” in our monitoring software.

Moving the VMs is really easy with PowerCLI. For example, here are the basics of my script:

$creds = Get-Credential
$conn = Connect-VIServer -Server vcenter.domain.tld -Credential $creds
Move-VM -VM guestname -Destination newhost

Pretty simple. However, I noticed an oddity: when the guests moved, they were losing the resource pools they were part of. Looking at the documentation for Move-VM, the destination supports a folder, cluster, resource pool, or host. Because I wasn’t explicitly specifying that the destination was a host, it looks like all the other placement options were being set to null as the guest moved. So for the next batch of guests to move, I explicitly defined the type:

$creds = Get-Credential
$conn = Connect-VIServer -Server vcenter.domain.tld -Credential $creds
$dst_host = Get-VMHost -Name newhost
Move-VM -VM guestname -Destination $dst_host

After explicitly defining the type of destination, the guest moved host, but retained its resource pool allocations. Much better! Obviously this only moves one guest, and each of our hosts has quite a few guests on it, so I used a Get-VM combination, some pipes, and such.

$creds = Get-Credential
$conn = Connect-VIServer -Server vcenter.domain.tld -Credential $creds
$dst_host = Get-VMHost -Name newhost

# Grab every guest on the old host whose name matches our dev/stage/test
# naming convention, and move each one to the new host.
Get-VM -Location (Get-VMHost -Name oldhost) | ?{ $_.Name -match "DAL[DST]+.*" } | %{
	Move-VM -VM $_ -Destination $dst_host
}

The above code happily moves all our dev, stage, and test machines off of oldhost onto newhost. From there, it was a case of finding the hosts in the monitoring software, unmanaging them all, and repeating the same code without the conditional name check, as sketched below.
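For the production pass, the same pipeline minus the name filter is all that’s needed. A minimal sketch, reusing $dst_host from above (the host names are the same placeholders as before):

# Everything still sitting on the old host at this point is production,
# so move it all to the new host.
Get-VM -Location (Get-VMHost -Name oldhost) | %{
	Move-VM -VM $_ -Destination $dst_host
}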

So, lesson learned: if a function accepts multiple input types, always explicitly define the type, as you cannot tell what might happen otherwise.

Downloading Files with PowerShell

While messing around trying to diagnose an issue with IIS and compression yesterday, I had a need to download a whole bunch of files all at once (or at least in quick succession).

Calling in the libraries from the .NET Framework, this task is actually really easy.

# Source URL to request, and a destination path template with a {0} placeholder
$src = "http://localhost/test.cmp"
$dst = "F:\Downloads\junk\test_{0}.cmp"
$web = New-Object System.Net.WebClient

# Ask the server for gzip-compressed content
$web.Headers.Add([System.Net.HttpRequestHeader]::AcceptEncoding, "gzip")

# Download the same file 100 times, to a numbered destination each time
1..100 | %{
	$web.DownloadFile($src, $dst -f $_ )
}

The script is relatively self-explanatory. $src and $dst are the source and destination files. For the destination I’ve used a formatted string, allowing me to inject values into the string using C#-style formatting, so $dst -f 3 becomes F:\Downloads\junk\test_3.cmp. I create a new object of the System.Net.WebClient type, then loop 100 times and download the same file, saving it to a different destination name each time.

The above code includes setting an “Accept-Encoding” header on the request. Because I was testing a gzip compression issue on IIS I needed this header, otherwise WebClient just requests the raw data. The caveat is that the file written out to the $dst path is actually a gzip-compressed file and not the actual data. This was fine for my testing, because it made it quick and easy to see whether compression had worked: I was downloading a file containing 300KB of Lorem Ipsum text, so if the IIS compression worked, the file would be much smaller. I’ll do another blog post soon about handling the gzip data and turning it back into the original file.
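In the meantime, here’s a rough sketch of inflating one of those gzip’d downloads back to the original text (the file names are just examples, and this assumes .NET 4 is available for Stream.CopyTo):

# Open the compressed download, wrap it in a GZipStream, and write the
# decompressed bytes out to a new file (paths are placeholders).
$in  = [System.IO.File]::OpenRead("F:\Downloads\junk\test_1.cmp")
$gz  = New-Object System.IO.Compression.GZipStream($in, [System.IO.Compression.CompressionMode]::Decompress)
$out = [System.IO.File]::Create("F:\Downloads\junk\test_1.txt")
$gz.CopyTo($out)
$gz.Close(); $in.Close(); $out.Close()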

I could have done the same using the BITS service, but BITS would probably have handled the gzip’d file for me, so I’d have had to do more work to determine whether compression had actually worked (examining logs, a network trace, an IIS trace, etc.). This worked out nicely.

As a side note, the “1..100” code is a shortcut for generating a sequence of numbers from 1 to 100. If you opened a PowerShell prompt, typed “1..100”, and hit enter, it’d spew out all the numbers from 1 through 100. Passing this on to ForEach-Object turns it into a loop. The other ways to do this would be to use a for, do, or while loop.

for( $i=0; $i -lt 100; $i++) {
  $web.DownloadFile($src, $dst -f $i)
}

$i=0
do {
  $web.DownloadFile($src,$dst -f $i)
  $i++
} until ($i -eq 100)

$i = 0
while ($i -lt 100) {
  $web.DownloadFile($src, $dst -f $i)
  $i++
}

DL365 G5 battery location

Sometimes it’s hard to understand hardware design (no pun intended). I can understand there are complex requirements, depending on what’s being designed. Today I stumbled on what I’d call a pretty odd design flaw, though I can see how it came about through poor foresight.

In this case, I’m referring to the design of HP’s DL365 G5 servers, particularly the decision on where to locate the storage controller’s battery. I understand that 1U servers have some very specific and complex requirements, such as:

  • Must fit all the stuff in a small space
  • Must have good air flow
  • Must provide easy access to hot-swap parts

Earlier this month we got an alarm on a server that the battery had gone bad, and was no longer holding a charge. I scheduled downtime, made sure we had a battery on hand, put in a change ticket, and set to work.

After powering down the server and removing the top cover, I set out to find the battery. The storage controller was next to the memory banks, and had three cables coming from it. Two were obviously for the HDD backplanes; the third, a little thinner, disappeared behind some fans and into the front of the case.

If you take a look at the picture to the right, and look above the center fan bank, you can see a thin black cable disappearing under the chassis. A closer look is in this picture. From the front, it’s hiding right above the middle drive, behind this grill.

Removing the fan bank is a project on its own. Dell uses nice quick-release carriers: squeeze some colored tabs and stuff just pops apart. To get these out, I had to use a flathead screwdriver to pry the metal tabs out of the way. Well, they’re not really tabs; they were about two inches long, about the same deep, and not flexible in the slightest.

Once removed, it should be a simple case of pulling on a small black tab. This is where the design flaw comes in (as if the fact it’s hidden in there to begin with isn’t bad enough). These storage batteries have a tendency to swell as they get older, even more so as they reach the end of their life. If you look at the picture to the left, you can see the bulge already. Why is this a problem? Well, two reasons, the first being fairly obvious: if it’s bulging, there is a chance for it to explode, or leak, all over the inside of the server, in particular onto the hard drives below it. The second is the design flaw: the battery is nicely wedged between two sheets of riveted metal, and when the battery swells, it does NOT want to come out.

It can take some ‘gentle’ persuasion to evict the battery from its sanctuary without puncturing the first cell on the little board. In this case, I had to use a flathead screwdriver and go through the grill at the front of the chassis, above the drives, to push it, while I carefully pulled from the other side at the same time. There are a few more pictures of the swollen battery here, but this one shows it pretty well.

The battery was probably put in that location because it was the only free space left. Little consideration was given to maintenance, or to hardware faults such as swelling batteries. I believe the G7 series has this issue resolved, though.

Still Alive...

It has been a while since I’ve posted. The last post was my PowerShell vs Bash string-processing match-up. It got a fair bit of attention, with some great feedback, so I learnt some new stuff. I’m going to try getting back to posting more regularly, even if the posts aren’t always tech-related1. For now, here’s what has been going on…

  • New job
  • More volunteer work with CERT
  • More volunteer work at school

Generally as busy as before, but at the same time, a different kind of busy.

Job

The transition to a new job was probably one of the harder decisions I’ve had to make in a long time, but now that I’ve settled in, I don’t regret it at all. I’m getting my hands on a lot more stuff I didn’t have access to before, as well as taking the lead on projects that were originally left to HQ folk.

The environment is a lot larger. I’ve gone from a handful of racks of gear to 20+ full racks, not counting the volume of VM guests we have running.

Documentation is pretty limited, so I’m working on updating it all, starting with a full data center audit. I’m also working on a Lync pilot, as well as updating monitoring and other projects.

CERT

I’ll have a complete post on this one later. In the meantime, you can read Christopher Webber’s post on CERT and Sys Admins.

With tornadoes striking the DFW metroplex recently, we were asked to help with some cleanup operations. Post and pictures on that later.

So lots of new exciting stuff floating around, learning new stuff, getting my fingers into new toys and such. I’ll post more stuff as it comes up.

  1. I do have some in the works on CERT and Emergency management. 

PowerShell vs Bash... String processing face off

While working on a file cleanup project today, I had to work with a text file containing close to 2 million lines. I had to extract file names that contained a specific string. I figured I could do it relatively quickly in PowerShell, but also realized it’d be a good opportunity to flex my Bash skills. To say the results were interesting is an understatement (to me at least). I picked up some tuning tricks while tinkering, which made some massive improvements.
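As a rough illustration of the kind of task involved (the input file and search string here are placeholders, not the real ones from the project):

# Scan every line and keep the ones containing the string.
# Select-String tends to be much faster than Get-Content | Where-Object for this.
Select-String -Path .\filelist.txt -Pattern "somestring" |
	%{ $_.Line } |
	Set-Content .\matches.txt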

eclipse, TFS Everywhere, and new files

I work in a mostly Windows environment; that changed relatively recently, when we acquired a new company. The new company was mostly Mac, so I acquired myself a Mac Air. With that, I decided to see how much of my Windows life I could transport over there. I use Microsoft’s Team Foundation Server for managing source control, so I had to find a good alternative to trying to get Visual Studio working on the Mac platform. Fortunately, back in 2009 Microsoft acquired a company called Teamprise, which had developed a TFS client that worked on a variety of platforms; Microsoft re-released it as TFS Everywhere. I use TFS Everywhere to edit a lot of the files in TFS on my Mac, via Eclipse.

A few weeks ago I stumbled on an odd issue where Eclipse wasn’t keeping in sync with what TFS Everywhere was doing. For example, we keep our Nagios configs checked into TFS for version control. I was using the Mac and had added several new contact files. When I was back on my Windows machine, I told Eclipse to trigger the “Get Latest” function in TFS Everywhere, which it dutifully did, then threw an error…

The item $/infrastructure/Nagios/etc/objects/contacts/newcontact.cfg already exists

It repeated this error for each of the new contacts I had added. What appeared to be happening is that TFS Everywhere was getting the files and notifying Eclipse the files were there, which triggered the project to add them to the file list, which in turn triggered Eclipse to try to add them to TFS. A nice little circle going there. I thought maybe it was just a quirk and closed the project. When I reopened the project, the same error occurred again; this time I hadn’t even told Eclipse to do any updates, it occurred just on opening the project.

After some skimming around, I discovered this is likely a bug in TFS Everywhere, and the simple solution is to tell Eclipse to update the server information. This is done using the “Team” context menu (right-click the project), then selecting “Refresh Server Information”. Here is the reference post where I found the details.

PowerShell and SQLite

A while back I mentioned I was using SQLite with PowerShell. I was doing this because I had to access gPodder’s database to tweak some of the subscriptions. The need came up again today after upgrading gPodder to the latest release and having issues with it.
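For reference, a minimal sketch of poking at a SQLite database from PowerShell (this assumes the System.Data.SQLite ADO.NET provider is installed, and the DLL and database paths are placeholders, not gPodder’s real locations):

# Load the SQLite ADO.NET provider, open the database, and list its tables
Add-Type -Path "C:\Program Files\System.Data.SQLite\System.Data.SQLite.dll"
$conn = New-Object System.Data.SQLite.SQLiteConnection("Data Source=C:\path\to\gpodder\Database")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT name FROM sqlite_master WHERE type = 'table'"
$reader = $cmd.ExecuteReader()
while ($reader.Read()) { $reader["name"] }
$conn.Close()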

Android Toys

Okay, less toys, and more toy. Not only does our other entity work with Apple devices, but we also have an Android application. Currently, development has been done using personal equipment, which has limited it to what few Android phones we had available. So we decided to pick up a few more Android devices for the team to work with.