One of the projects I’ve been working on recently is a POC in Azure to allow us to move a collection of desktop users to lower end laptops, while using high end servers to perform a lot of data processing. The idea is that we can spin up and destroy machines as we see fit. The plan was fairly solid, and we built out our domain controllers and a template machine with all the software in it, before configuration. We then used PowerShell to spin up new machines as we needed them.
One of the issues I stumbled over when working on this was making sure the servers were put into the right network. This was important as they were being joined to a domain. I had originally started with something like this:
$img = 'imgid_Windows-Server-2008-127GB.vhd'
$svcname = 'mytestservice01'
$svcpass = '!testpass321!'
$svcuser = 'testadmin'
$vm1 = New-AzureVMConfig -ImageName $img -InstanceSize 'ExtraSmall' -Name $svcname | `
    Add-AzureProvisioningConfig -WindowsDomain -AdminUsername $svcuser -Password $svcpass -DomainUserName 'dmnadmin' -Domain 'TestDomain' -DomainPassword 'ImnotTelling!' -JoinDomain 'TestDomain.local' -TimeZone 'Canada Central Standard Time'
New-AzureVM -VMs $vm1 -ServiceName $svcname -VNetName 'Test_Net' -AffinityGroup 'TestGroup-USEast'
This seemed to look right, and worked fine, as long as I wasn’t trying to add it to a VNet or an Affinity Group. When I added those options, I was thrown the following error:
It seemed to me that the New-AzureVM command should have had some way to define which subnet the VM was to be allocated to, but it wasn’t there. What was even more confusing was that this VNet only had a single subnet, so you’d think it would select that one, but no such luck.
The answer lies in the Set-AzureSubnet command, which should have been pretty obvious to me. You can add it as part of your provisioning command like this:
$vm1 = New-AzureVMConfig -ImageName $img -InstanceSize 'ExtraSmall' -Name $svcname | `
    Add-AzureProvisioningConfig -WindowsDomain -AdminUsername $svcuser -Password $svcpass -DomainUserName 'dmnadmin' -Domain 'TestDomain' -DomainPassword 'ImnotTelling!' -JoinDomain 'TestDomain.local' -TimeZone 'Canada Central Standard Time' | `
    Set-AzureSubnet 'Subnet-1'
All I’ve done is add the extra command to the end, and now Azure is happy. This will spin up a new VM and drop it in the right VNet, Affinity Group, and Subnet. Based on the VNet’s network configurations, and DNS settings, the new machine is provisioned, and joined to the domain immediately.
This makes me very happy because this is a quick sample of how we’d proceed with automating and deploying an undefined number of VMs in Azure based off of our golden image. With some minor tweaks we can loop through and spin up 50 machines with little work.
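As a rough sketch of what that loop could look like: the helper names (Get-VmName, New-DomainJoinedVm), the name prefix, and the count are made up for illustration, and the credentials, domain, VNet, and subnet values are just the placeholders used above. It assumes the Azure PowerShell module is loaded with a subscription selected, and that each VM gets its own cloud service as in the earlier example.

```powershell
# Hypothetical helper: builds names like 'mytestservice01' .. 'mytestservice50'.
function Get-VmName {
    param([string]$Prefix, [int]$Count)
    1..$Count | ForEach-Object { '{0}{1:d2}' -f $Prefix, $_ }
}

# Hypothetical wrapper around the provisioning pipeline shown earlier.
function New-DomainJoinedVm {
    param([string]$Name, [string]$ImageName)

    $vm = New-AzureVMConfig -ImageName $ImageName -InstanceSize 'ExtraSmall' -Name $Name |
        Add-AzureProvisioningConfig -WindowsDomain -AdminUsername 'testadmin' -Password '!testpass321!' `
            -DomainUserName 'dmnadmin' -Domain 'TestDomain' -DomainPassword 'ImnotTelling!' `
            -JoinDomain 'TestDomain.local' -TimeZone 'Canada Central Standard Time' |
        Set-AzureSubnet 'Subnet-1'

    # One cloud service per VM, named after the VM (same as the single-VM example).
    New-AzureVM -VMs $vm -ServiceName $Name -VNetName 'Test_Net' -AffinityGroup 'TestGroup-USEast'
}

# Usage (needs a live Azure subscription, so it is left commented out):
# Get-VmName -Prefix 'mytestservice' -Count 50 |
#     ForEach-Object { New-DomainJoinedVm -Name $_ -ImageName 'imgid_Windows-Server-2008-127GB.vhd' }
```

Keeping the name generation separate from the provisioning call makes it easy to check the list of names before anything is actually created.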
So it has been a substantially long time since I’ve posted something, and that’s not because I’m being lazy. Well, okay partially because I’m lazy. Evernote has about 7 notes in it for things I want to post about, mostly issues I’ve resolved, but I’ve just been super busy recently.
One of the things I’ve thoroughly enjoyed about my change in work places has been the learning experiences I’ve been subjected to. Where I used to work was pretty much the same stuff, day in, day out. There was little change, and even the introduction of new companies being acquired really didn’t change that. They were either sucked into the fold and their technologies changed to ours, or they were kept separate and I had little to do with them.
Since changing companies I’ve gone from just using VMware and a small level of administering the infrastructure, to being one of the “go-to” people for it in our environment. Same with the storage infrastructure. Where I used to work there were 2 classes of storage: the big beefy HQ stuff, which I had no control over at all, and the local NAS, which I managed. This has changed to being one of the “go-to” people for the storage stuff too.
None of that is to say I didn’t learn stuff where I used to work. Due to all the issues we had with code and servers, I have a very broad range of troubleshooting skills that have come in very handy. It helps that I also got a good look at a lot of the code there too, because I’ve got a knowledge of reading and understanding code that I probably wouldn’t have otherwise.
The cool thing about the place I work now is that the development team drive a lot of the changes, working with a very agile development structure. They push the boundaries of our infrastructure, and we adapt and solve for their problems or ideas. This has led to some pretty cool stuff, and melding of technologies. For example, I’m currently reading up on IIS ARR [1]. Last month I was tinkering with Windows Azure.
On my list of new things I’ve been learning and playing with at work:
IBM DataPower Appliances
HP 3Par SAN storage
Brocade fiber switches
HP servers (used to work in an all Dell office)
HP Blade chassis
Exchange 2010 (Been away from Exchange for a long time)
More PowerShell than just my “tinkering” scripts
More in-depth IIS work
Lync 2010/2013 (I built out the infrastructure and deployed both)
McAfee Mail gateways
HP Rapid Deployment tools
Lots more stuff I am always forgetting…
One thing that did surprise me was becoming a mentor of sorts too. People come to me for guidance and tips on issues. I don’t give out answers, but I’ll guide them in the right direction. This has interested me because I’ve never considered myself an educator in any way, but I apparently seem to be doing okay at guiding people.
I love my job, constantly learning, even when not working with new stuff. As my boss and I constantly say “never a boring day”.
[1] IIS Application Request Routing. It’s being used as a potential replacement for ISA/TMG, but is much more, including load balancing, content caching (think CDN), reverse proxy, SSL offloading, and so on.
One of the handy things about Lync is the fact that it’ll parse the Global Address List (GAL), and make its entries available via the Lync client (using the abserver). This means that Lync will do all the lookups using its own copy of the GAL, rather than hitting the GAL directly. Additionally, that processed address book is cached on the client side, allowing much speedier lookups.
One of the things we’d noticed is that Lync likes the phone numbers formatted in a particular manner, otherwise you end up with some very strange number/calling issues. This leads to a problem because folks update their own address and phone information, resulting in a myriad of number formats in Active Directory. A couple of examples:
555 555 1234
555.555.1234 ext. 555
Lync isn’t very happy with this, and will fail to parse these numbers. That is, unless you create normalization rules. This isn’t the same as “Voice Routing” normalization rules, which are rules that are applied when people make calls.
So how do you know Lync doesn’t like the phone numbers you have in the GAL? Lync logs the failures in a file (creatively) called ‘Invalid_AD_Phone_Numbers.txt’ under the file store location. Open the Topology Builder, look at the “File stores” section, and go to that path in Windows Explorer. Under that path you’ll find a directory structure that looks like this:
The directory 1-WebServices-1 may have a different number depending on the number of Lync installations you have that are sharing the same file store, or if you’ve performed a transition between 2010 and 2013.
Using one of the above numbers as an example, you may find errors that look like this:
Unmatched number: User: '6493bb75-84e7-4f83-8bca-26f1f551a3d4' AD Attribute: 'telephoneNumber' Number: '555.555.1234 x555'
To fix this error, we need to create a normalization rule. These rules are stored in a text file called Company_Phone_Number_Normalization_Rules.txt, which lives in the 1-WebServices-1\ABFiles directory. This file uses regular expressions to match and reformat the numbers to an E.164 format. In the above example, I want to convert the number to +15555551234;ext=555, so I’d use the following regular expression:
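One way to sanity-check a candidate rule before putting it in the rules file is to try it in PowerShell first. The pattern and replacement below are illustrative assumptions, not copied from a production rules file, and the exact layout of the rules file itself should be checked against the Lync documentation. The sketch pulls the number out of a log line like the one above and applies the candidate expression to it.

```powershell
# A line as it appears in Invalid_AD_Phone_Numbers.txt (taken from the example above).
$logLine = "Unmatched number: User: '6493bb75-84e7-4f83-8bca-26f1f551a3d4' AD Attribute: 'telephoneNumber' Number: '555.555.1234 x555'"

# Pull the offending number out of the log line (case-sensitive match on "Number:").
$number = if ($logLine -cmatch "Number: '([^']*)'") { $Matches[1] } else { $null }

# Illustrative rule (an assumption): 3-3-4 digits separated by dots,
# followed by "x" or "ext." and the extension digits.
$pattern     = '^(\d{3})\.(\d{3})\.(\d{4})\s*(?:x|ext\.?)\s*(\d+)$'
$replacement = '+1$1$2$3;ext=$4'

# -replace uses .NET regular expressions, so $1..$4 refer to the capture groups.
$normalized = $number -replace $pattern, $replacement
$normalized
```

Testing against each of the odd formats found in the log file makes it much quicker to see which variations still need their own rule.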
Note that UseNormalizationRules is set to True. If it isn’t, use Set-CsAddressBookConfiguration to change it. Once set, you can leave it to the automated process to pick up the changes at the next cycle (in my case 01:30 the following day), or you can use Update-CsAddressBook to force an update.
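For reference, those steps might look something like this from the Lync Server Management Shell. This is a sketch that assumes the setting lives in the global address book configuration; adjust the identity if you use site-level configurations.

```powershell
# Check whether address book normalization rules are enabled.
Get-CsAddressBookConfiguration | Select-Object Identity, UseNormalizationRules

# Enable them if needed (global scope is an assumption here).
Set-CsAddressBookConfiguration -Identity global -UseNormalizationRules $true

# Force a regeneration of the address book instead of waiting for the scheduled run.
Update-CsAddressBook
```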
This process usually takes a little fiddling to adjust for all the variations in phone numbers, but once setup it makes life a lot better for the users.
For the last few weeks I’ve been performing all the preparation work for Lync 2013 in our organization. We’ve had a very successful Lync 2010 pilot, and instead of expanding 2010 to production, and later having to do a full environment replacement for 2013, we decided to jump straight to 2013. Part of the steps, whether a fresh install or an upgrade, is some Active Directory forest and domain preparation. This can either be done using the installation wizard, or via PowerShell.
One of these commands is Grant-CsOUPermission. This command is required if you don’t keep your users/servers/computers in the standard containers in AD (i.e., users in the Users container). In our environment, we move the users into a People OU, so we needed to run the Grant-CsOUPermission command to update some container permissions for Lync to work properly, and allow us to delegate user management. To save some time, I was executing all the commands from one domain against one of the other child domains in the forest. This was because I didn’t have access to a 64-bit machine in that environment without spending additional time spinning up a client to test with. The Lync PowerShell cmdlets allow for this, and this is what I was doing, and having issues with.
I’d first start a PowerShell prompt as a domain admin in the other domain using the runas command:
Adding the domain allowed execution. Without it, the cmdlet was trying to bind to mychild.domain.tld and then access the OU through the link to otherchild.domain.tld. My account in otherchild.domain.tld didn’t have domain admin access in mychild.domain.tld, hence the error.
So, learning lesson of the day: either execute all the commands on a server in the domain you are working on, or remember to specify the domain. As a side note, the Microsoft documentation is a little fuzzy around this area because it says you must sign in to a domain member on the domain you wish to execute the commands, but then specifies that you can execute the commands in a different domain. It gets a little confusing, but once you get your head wrapped around the fact that you can do this across domains, and that you must specify the domain, even if the OU hints at a different domain, things are a little easier to work with.
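To make that concrete, here is a sketch of the cross-domain form of the command. The OU path is hypothetical (it just follows the People OU and domain names used above); the important part is the explicit -Domain parameter.

```powershell
# Hypothetical OU path; -Domain tells the cmdlet which domain to bind to,
# rather than defaulting to the domain of the machine you are running on.
Grant-CsOUPermission -ObjectType 'User' `
    -OU 'ou=People,dc=otherchild,dc=domain,dc=tld' `
    -Domain 'otherchild.domain.tld'
```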
Earlier today while updating some documentation, I noticed 2 of the servers being monitored in SolarWinds SAM were reporting applications in an “unknown” state. When I pulled up the display, and looked at the details of the state, it was throwing an error:
Bad input parameter. HResult: The specified object is not found on the system.
I thought this was a little weird, as the monitors used to work, and the server hadn’t been patched, or any changes made recently.
First step was verifying that the counters it was looking for really existed. Logging onto the server, I opened the performance monitor, and tracked down the supposedly missing performance counters. They were there. Maybe it was an issue with accessing remotely, so I jumped on the SolarWinds server, and tested remote counter access from there, repeating the same process, but specifying the remote server name. Again, no issue.
This is when I decided to do some searching, and stumbled across a Thwack post that mentioned the same error, but related to Exchange. They had basically done the same testing as I had, but were urged to open a support ticket for more troubleshooting help.
The last post, before mine, in that thread was the answer I was looking for. They were experiencing an issue with the Remote Registry service, and a simple restart fixed the issue. I took a look at the services on both the servers I was seeing issues with, and the problem jumped out immediately. On both servers the service was using about 600MB of memory and 50k handles, which is very unusual for a service such as the Remote Registry. This set off a light bulb, as I had been working on an issue with a coworker, and he had identified a bug and hotfix for memory leaks in the Remote Registry. KB2699780 details the same behavior, and we were scheduled to deploy this hotfix on a different set of servers for a similar issue.
A quick restart of the Remote Registry service had the applications successfully polling again. Now to just schedule some maintenance to get the hotfix applied to these servers.
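If you want to check for the same symptom without clicking through Task Manager, something along these lines should do it. This is a sketch for Windows PowerShell using WMI; the function name and the server name in the usage lines are made up, and keep in mind that Remote Registry can share its svchost process with other services, so the numbers are for the hosting process as a whole.

```powershell
# Hypothetical helper: reports memory and handle usage of the process hosting Remote Registry.
function Get-RemoteRegistryUsage {
    param([string]$ComputerName = $env:COMPUTERNAME)

    # Find the service, then the process that hosts it.
    $svc  = Get-WmiObject -Class Win32_Service -ComputerName $ComputerName -Filter "Name='RemoteRegistry'"
    $proc = Get-WmiObject -Class Win32_Process -ComputerName $ComputerName -Filter "ProcessId=$($svc.ProcessId)"

    New-Object PSObject -Property @{
        ComputerName = $ComputerName
        State        = $svc.State
        HandleCount  = $proc.HandleCount
        WorkingSetMB = [math]::Round($proc.WorkingSetSize / 1MB)
    }
}

# Usage (server name is a placeholder):
# Get-RemoteRegistryUsage -ComputerName 'server01'
# Get-Service -Name RemoteRegistry -ComputerName 'server01' | Restart-Service
```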
Like any large organization, we have automated processes that go and happily disable user accounts on termination. This process looks in our HR database for certain flags, and reacts accordingly. As part of the termination/disabling process, it’ll also flag their email account to be hidden from the Exchange Global Address List (GAL).
In Exchange 2003, hiding accounts from the GAL used to be handled by an Active Directory (AD) user attribute called msExchHideFromAddressLists. When this was set to TRUE, the user would be hidden from the GAL. Our HR applications toggle this flag for users that are disabled to hide them away from other users.
This process all worked fine for a long time, until Exchange 2007 rolled around. I guess there was plenty of push to allow you to hide a user from all the GALs, but still allow specific GALs to have those users in. So Microsoft introduced a new AD user attribute called showInAddressBook. The problem now is that if you toggle msExchHideFromAddressLists, but have a value set for showInAddressBook, the user accounts are no longer hidden in the GALs listed in the latter attribute.
Can anybody see where this is going? Yup, it appears that all the user accounts were getting the default GALs assigned to the showInAddressBook attribute, so even when the option to hide the user was set, they were still showing up [1]. This was causing problems as people that were disabled/terminated were still showing up, and causing some confusion and concern.
I started to poke around, and bashed together a quick PowerShell script that walks through all disabled users that have a showInAddressBook attribute, and then wipes out that attribute.
If you’ve not seen LDAP queries before, they work by starting with the operator (and, or, etc), and then the objects that they apply to. So in the example above, it reads as such:
(objectClass=user) AND (userAccountControl:1.2.840.113556.1.4.803:=2) AND (showInAddressBook=*)
It can get a little more complicated when you start stringing together multiple options such as AND and OR operators, and various combinations of them. In this example, we’re going for pretty simple.
I then used the .NET libraries System.DirectoryServices.DirectorySearcher. This uses the LDAP query specified, and returns all matching results. Next was a case of walking through the results, and fetching a DirectoryEntry object to edit the properties. In this case we’re setting it to $null which removes it.
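The approach described above can be sketched roughly as follows. This is a reconstruction under assumptions rather than the exact script: it searches the current domain, uses paging so that large result sets are returned, and should be tried against a test OU first since it commits changes to AD.

```powershell
# Disabled users (bitwise AND match on userAccountControl bit 2) that still have showInAddressBook set.
$filter = '(&(objectClass=user)(userAccountControl:1.2.840.113556.1.4.803:=2)(showInAddressBook=*))'

$searcher = New-Object System.DirectoryServices.DirectorySearcher
$searcher.Filter   = $filter
$searcher.PageSize = 1000   # enable paging so more than the default result limit comes back

foreach ($result in $searcher.FindAll()) {
    # Fetch an editable DirectoryEntry for the matching user.
    $entry = $result.GetDirectoryEntry()

    # Setting the value to $null clears the attribute.
    $entry.Properties['showInAddressBook'].Value = $null
    $entry.CommitChanges()
}
```

Note how the filter puts the & operator first, followed by the three conditions it applies to, which is the prefix style described earlier.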
After letting this script run over about 25k disabled users, it cleared up the fluff in the GAL, and made HR happy.
[1] As a weird side-note to this, if you check the box to hide the user in the Exchange management suite, it removes the showInAddressBook flag on its own, same for the PowerShell options too.
One of the things I had completely forgotten about during my migration from WordPress to Octopress was OpenID. I had used one of the few OpenID plugins that tied into WordPress, and allowed you to use WordPress as an OpenID provider, giving me the ability to log in to sites using my WordPress site.
This was great, and I’d completely forgotten about it because I rarely used it. That was until yesterday when somebody on the #Nagios IRC channel had asked a question, and then posted the same question to Stack Overflow. I decided to answer the question over there, and remembered I had signed up for an account using OpenID, so I dutifully typed in my site URL, and was stumped because I wasn’t redirected.
This is where I did a little face-meets-desk action. I’d killed my OpenID account by killing off my WordPress site. I tried to think of a way around this, and did some quick searching, and stumbled upon a post by Darrin Mison, on the exact same topic. Darrin had left his WordPress site active over on Wordpress.com, but had migrated his URL elsewhere. Because of this, Darrin was able to use what is called a delegate, and tell anybody making a request to look elsewhere to authenticate.
This sparked a vague memory, and reminded me that when I first started tinkering with OpenID, I used a different site for the authentication, so a quick check, and I was able to login there. Now I just needed to edit my Octopress site to provide the delegate information.
I used myOpenID.com as my delegate, and they have a help article on how to handle using your own URL. Following what Darrin had done, I edited source/_includes/custom/head.html and added the lines that were mentioned in the help doc. So now my head.html template looks like this:
<link rel="openid.server" href="http://www.myopenid.com/server" />
<link rel="openid.delegate" href="http://jonangliss.myopenid.com/" />
<link rel="openid2.local_id" href="http://jonangliss.myopenid.com" />
<link rel="openid2.provider" href="http://www.myopenid.com/server" />
<meta http-equiv="X-XRDS-Location" content="http://www.myopenid.com/xrds?username=jonangliss.myopenid.com" />
<!--Fonts from Google"s Web font directory at http://google.com/webfonts -->
<link href="http://fonts.googleapis.com/css?family=PT+Serif:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<link href="http://fonts.googleapis.com/css?family=PT+Sans:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
Pretty simple. After a rebuild of the blog, my page now includes the delegate headers required to redirect OpenID requests.
In the enterprise licensed version of RDM you are given the ability to add “remote management” interface details to a host configuration. In our environment, that remote management interface is iLO, and is available from a dedicated IP address, over HTTPS, giving you access to a remote console as well as power management features. RDM handles this with a small tweak to the XML file adding another element under the connection meta information.
I’ve removed most of the information, which you can see in the previous post.
As we’re trying to be careful with the file, we need to first validate that the XML has a MetaInformation element, and then an existing ServerRemoteManagementUrl element. If either is missing, it gets created. Not all hosts have iLO interfaces, such as virtual machines, so we need to verify the presence of a DNS record first, and then only create the entry if it exists.
Again, working with a copy of the original file, I use some crafty XPath queries to only select connections that are RDP. I then loop through the connections/nodes, and extract the name. Lines 14-18 test for the presence of the MetaInformation element, and create it if it doesn’t exist. Line 20 checks for the ServerRemoteManagementUrl element; if it’s not there, it proceeds with DNS validation.
Lines 24-31 perform a DNS lookup. Unfortunately, it throws an exception rather than returning $null or an empty object, so I had to throw in some quick dummy catch code that doesn’t really do anything. If a DNS record is returned, it creates the new element, and adds it to the MetaInformation element. For the final step, I saved it to a second file so I could do a comparison between the files to make sure it did as I expected.
One thing to note about adding elements to an XML document is that the CreateElement function (lines 16 and 34) is not executed against the node you are adding the element to; it is executed against the document itself. This is so that the element gets all the correct namespace information. You then append your element to the existing element.
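Here is a minimal, self-contained illustration of that create-then-append pattern. The sample XML is heavily simplified and made up for the example (a real RDM file has far more in it), and the iLO URL is a placeholder; only the MetaInformation and ServerRemoteManagementUrl element names come from the description above.

```powershell
# Simplified, made-up sample document standing in for the RDM configuration file.
[xml]$doc = @'
<Connections>
  <Connection>
    <Name>server01</Name>
  </Connection>
</Connections>
'@

$conn = $doc.SelectSingleNode('//Connection')

# Create MetaInformation only if it is missing. CreateElement is called on the
# document, and the new element is then appended to the node that should own it.
$meta = $conn.SelectSingleNode('MetaInformation')
if ($null -eq $meta) {
    $meta = $doc.CreateElement('MetaInformation')
    [void]$conn.AppendChild($meta)
}

# Same pattern for the remote management URL (placeholder value).
$url = $doc.CreateElement('ServerRemoteManagementUrl')
$url.InnerText = 'https://server01-ilo.example.com'
[void]$meta.AppendChild($url)

$doc.OuterXml
```

The [void] casts just stop AppendChild from echoing the appended node to the console.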
Every now and again I have to strip out elements from an XML file. In this case, I was doing some cleanup of my Remote Desktop Manager configuration file. When I first started my current job, to save a lot of discovery, my boss shared his configuration file. Unfortunately the configuration file had a lot of hosts that had duplicate configuration information that wasn’t relevant because the “duplicate” option had been used to copy existing hosts. This meant stuff like host description had been copied.
Remote Desktop Manager (RDM) uses an XML file for its configuration, which makes editing it really easy. To clean up the invalid descriptions, I used a little PowerShell and some XML know-how. Here is an example entry I need to clean up…
Pretty simple, but here is how it works. The first line is pretty obvious: it’s getting the content of the file [1]. It then explicitly converts the array object into XML using [xml]. The next bit is where it gets a little harder, and requires a little knowledge of XPath syntax. The code is looking to select a single node that has the name “Description”, with the data in it that says ‘HP Command View EVA’. If it’s found, it’ll return an XmlElement object, otherwise $node ends up being $null. This gives us the ability to wrap the search in a loop, and remove the elements we don’t need. To remove the element, you have to tell the parent node to remove it, so you ask the node to go back to the parent to remove itself. A little weird, but it works. The final step is to go back and save it to a file.
The hardest bit about handling XML is knowing how XPath works. Once that is understood, the rest is usually pretty easy. PowerShell treats XML as an object, so it’s easy to figure out what you can do with the objects using Get-Member.
[1] Which I had copied to C:\Temp to make a backup of, instead of working on the real file.
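The search-and-remove loop described above can be tried out on a small in-memory document. The sample XML here is made up and much simpler than a real RDM file, so treat it as a sketch of the technique rather than the exact cleanup script.

```powershell
# Made-up sample: two entries carry the copied description, one has its own.
[xml]$xml = @'
<Connections>
  <Connection><Name>host-a</Name><Description>HP Command View EVA</Description></Connection>
  <Connection><Name>host-b</Name><Description>HP Command View EVA</Description></Connection>
  <Connection><Name>host-c</Name><Description>Keep me</Description></Connection>
</Connections>
'@

$xpath = "//Description[text()='HP Command View EVA']"

# SelectSingleNode returns $null once nothing matches, which ends the loop.
while ($null -ne ($node = $xml.SelectSingleNode($xpath))) {
    # A node cannot remove itself; its parent has to do it.
    [void]$node.ParentNode.RemoveChild($node)
}

$xml.OuterXml
```

With a real file you would finish with $xml.Save() pointing at a copy, so the result can be compared against the original.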
In the course of updating all of our HP BladeSystem blade servers (BL465c) over the last few weeks, I’ve stumbled across some interesting things. For example, you can update all the iLO cards at once if you have an Onboard Administrator (OA), a TFTP server, and a little XML knowhow…
This gets saved as an XML file on the TFTP server; I named it update_firmware.xml. The USER_LOGIN and PASSWORD fields do not matter as single sign-on is used from the OA. The iLO update binary is put on the TFTP server as well (you should use the version applicable to the hardware you’re updating). Then comes the easy bit. SSH to the Onboard Administrator, and execute the hponcfg command as such:
hponcfg ALL tftp://TFTP_SERVER/update_firmware.xml
If you only need to update a single blade, change ALL to the blade number. Otherwise, this will download the iLO firmware update, push it to each of the iLO cards in the BladeSystem chassis, and then restart them. This will not impact the running server. You should see output like this once it has started: