TheGeekery

1Password and Broken Certificate Chains

2023-10-20T20:17:00-07:00

A while back I switched to a Linux based desktop for my work machine as I’d been doing a lot of work in ansible, and having to keep messing with VMs, SSH, and various other hoops was just getting annoying. I’d wanted to experiement for a while anyway. That’s another set of posts. This one is about the 1Password client, and certificate chains.

I’d like to say most folks have a general grasp of browser certificates, the thing that gives a website the little lock icon, but that’d be a lie. A lot of highly technical folks don’t have a general grasp of it, and that’s not to knock anybody. It can get complicated quickly. That’s what I found out this week when the 1Password desktop client refrused to go online. It was essentially stuck in offline mode. Doing all the basic troubleshooting, verified internet connectivity, verified I could get to the 1Password site, checked status websites, etc etc. All seemed good, just not the desktop client.

A little further snooping, and I discovered it was logging the failures to ~/.config/1Password/logs/, with the following entry:

IoError(IoError(error sending request for url (): error trying to connect: unexpected error: failed to load system root certificates: Could not load PEM file "/usr/lib/ssl/cert.pem"))))

Okay, this was helpful. It was telling us the client couldn’t open the machine’s trusted root authorities list. Checking the file, I found it was actually a symlink to /etc/ssl/certs/ca-certificates.crt (I’m running Ubuntu). Permissions on that file, and the symlink all looked good, I could open the file using vim, head, cat, and any other tool I could use to validate access. Nothing was immediately obvious, even some of the searches across the internet didn’t help much. There was some mentions that on earlier Ubuntu machines, the file didn’t exist, others said they had to create symlinks themselves. But these were all 2017 and earlier posts, and the fact that various tools opened that path suggested those were not related.

So maybe it wasn’t about reading the file, it was about the contents of the file. On looking at the contents of the file, I skimmed over a bunch of certificates, they all looked good, until I got to the very bottom. There was a bunch of random characters, and what looks to be parts of a certificate, but in binary type format. Bits of it looked familiar, they were the root CA for our Windows Certificate Authority. I realized exactly what was going on. I’d copied the DER formatted certificate from our Root CA, not the PEM formatted certificate. When I ran the update-ca-certificates, the script dutifully copied the contents of all the files into /etc/ssl/certs/ca-certificates.crt and created an invalid file, and 1Password couldn’t read it.

The solution in this case was to convert the DER to PEM formatted files using the following:

openssl x509 -inform DER -outform PEM -text -in MyROOT.crt -out MyROOT.cer

Then I removed the crt files, and re-ran the update-ca-certificates command. As soon as this was executed, I clicked the “connect” button in 1Password and it immediately connected.

Changing certbot validation plugin

2023-09-28T09:00:00-07:00

I use letsencrypt for a number of SSL certificates, from websites to mail services. The easiest, and documented, way of requesting certificates is via certbot. This is a utility that makes requesting certificates easy. I won’t go into the details on how to do that, there’s plenty of guides, and even the documentation gives you some straight forward steps.

Part of the request process involves validation, just like the traditional SSL providers, which prompts for some method of validating you own the domain you’re attempting to request a certificate for. The most obvious way is a text file on a web server, another is DNS. In my case, I use DNS validation for my mail servers as they don’t run web servers. There are a number of plugins for DNS validation that will automatically push the required DNS records for you, so you don’t have to do them manually. For example, Route53 (Amazon Web Services’s DNS service) you’d do something like:

certbot certonly --dns-route53 -d example.com

Works great, until you decide you are moving DNS providers and find the automatic updates in the background stop working. So you need to update the information. There’s 2 ways to do this; The first way is to edit the renewal configuration file in /etc/letsencrypt/renewal/example.com.conf. The other is via certbot itself, which validates the actual renewal is going to work. We’ll go with the latter. In my case, I moved from Route53 to Cloudflare, so the change would look like this:

certbot reconfigure --cert-name example.com --dns-cloudflare --dns-cloudflare-credentials /path/to/credentials.ini

This runs the command as if it was the initial configuration using --dry-run and validates a successful update of DNS records. If it’s successful, you’ll get a notice that the command was successful, and the next renewal will use the new validation plugin. Now you’re all done, and using a new plugin.

Stop The Bleed

2019-05-19T09:00:00-07:00

May has been announced as the first “National Stop the Bleed” month. The “Stop the Bleed” campaign is trying to empower the general public to know how to deal with life-threatening emergencies usually involving rapid blood loss. There are many situations that this could be useful for a bystander, from car accidents to gardening disasters to gun shot injuries.

Last week, Brian Green posted about it, and I responded with a quick shot of my side bag.

May is National #StopTheBleed Month, raising awareness of the importance of being ready to treat traumatic injuries. Here are a few items that could save someone's life. Tourniquet, pressure bandage, clotting sponge, and nitrile gloves ➝ https://t.co/YluPFoeux1 #NSTBM19 pic.twitter.com/UVflTajQSW
— Brian Green (@bfgreen) May 13, 2019

May is National #StopTheBleed Month, raising awareness of the importance of being ready to treat traumatic injuries. Here are a few items that could save someone’s life. Tourniquet, pressure bandage, clotting sponge, and nitrile gloves ➝ http://bit.ly/2xaJ1o7 #NSTBM19

I meant to follow up with some additional details about the gear, but it was a busy week, and I didn’t get a chance. I figured it’d probably be a little easier in post format than tweets anyway. I built my kit myself, but mostly because I have a lot of stuff for other reasons. The kit is attached to my laptop bag, contained inside a VANQUEST FATPack 4x6.

It’s held onto my bag using some Molle Sticks, which have quick release pull strings making it easier to separate from my laptop bag if needed.

On one side of the pack is a pair of medical shears, and on the other is a CAT Tourniquet, both for quick access.

Once removed and opened up, I have some more stuff on the inside for trauma type situations.

Inside the pouch in the various pockets and loops I have quick clotting gauze, chest seals, Nitrile gloves, abdominal pad, couple of sizes of gauze pads, gauze bandage, and a Mylar blanket.

Gear List

The total kit cost is about $190, but I already had a number of these items because of other things I’m involved in. There are a number of websites that sell prebuilt kits similar to this if you don’t want to go through the trouble of building them out yourself. This gives you an idea of what is in the kits, and what I carry every day.

All of these links are non-affiliate, but they do point to Amazon Smile. If you use Amazon, sign up to use Smile and your chosen charity gets money, and it costs you nothing when you buy stuff.

I would strongly suggest finding a class in your area, or if you can find an instructor to come to your place of work, school, community or worship, that would be even better as many people can learn at the same time. The classes are often free.

Raspberry Pi and the dreaded undervoltage notifications

2019-02-08T16:00:00-08:00

I’ve been tinkering around a bit with some Raspberry Pi devices for a number of little projects. Most have been related to home automation type stuffs, but I built one with a 7” screen that I was going to be using for radio related things. Originally I had tossed together a small kit with an SDR for use on a camp because I knew we’d be out of range of cell phone service, but knew I could still take advantage of radio frequencies from satellites to get data, specifically weather images.

All seemed to go quite nicely, however I’d sporadically get a lightening bolt in the top right corner of the screen. I later learned that was a sign that the Pi wasn’t getting enough voltage. This baffled me, I was using a decent 5v power source, why would I get a low voltage issue? So I decided to do some research.

For the uninitated, a Raspberry Pi is a single board personal computer. The best place to get a lot more information on the Pi is the official website. For this post, the bits we’re most interested in is the power requirements. All models of the Pi use a 5V (5 volts) input via a USB-B Micro connector, the ones used on most cell phones. The bit that varies from model to model, and attached peripherals, is the amount of power required to drive them (see here for a breakdown). For example, a Pi 3 Model B+ can draw as little as 500mA (500 milliamps) or up to 1.2A (1.2 Amps or Amperes). Most cell phone chargers can supply that kind of power fairly easily now. If you look at your Samsung charger that comes with an Samsung Galaxy S8+ for example, it has an output capacity of 2A. So, this would make one think you could just grab your cell phone charger and use it to power a Raspberry Pi, right?

Absolutely! So why am I writing this blog post? And why does the Pi report “Under-voltage” if a cell phone charger can deliver the required power? First let us take a look at what the issue looks like. When you have a screen attached to your Pi, you’ll see something that looks like this

If you run dmesg or if you look in /var/log/syslog you may see something that looks like this:

[    2.071618] Under-voltage detected! (0x00050005)

I usually see this appear after I start the Pi up, or if I’m doing a lot of processing on it.

So what causes the suitably powered cell phone charger to not be able to deliver the required juice to the Raspberry Pi? It’s actually not the charger itself, but how you connect to the charger. This is where a little high school physics / electronics comes into play.

If you’re tinkering with the Raspberry Pi, I’m going to assume you’re probably familiar with at least some basic contepts of eletronics, but the Law we’re going to talk about is Ohm’s Law. Ohm’s Law states that the current through a conductor between two points is directly proportional to the voltage across those same two points. The ratio of voltage to current is called resistance. This is where we get into material sciences, wire gauges, and specifications.

To save reading through all the specifications, I’m going to tell you that a USB cable is supposed to use 28 to 20 AWG (gauge) for the power lines¹. If I grab one of my USB cables, the total diameter is 0.120” or 3.05mm. In that space, you have to fit 2 28-20 AWG lines for power, and 2 28 AWG for data/signal. They also have to be covered so they don’t short each other, and then all covered completely to stop the 4 wires just hanging out everywhere. Keep these numbers in mind.

Now we’ve talked about the size of the cables, let’s talk about the materials. Copper is expensive, so if a company can reduce costs slightly, you’re likely to find those power lines are on the smaller side such as 28AWG. All materials have a resistivity, even if it’s some crazy high number. In terms of Copper it is about 1.72x10^-8 Ωm. You can see more examples on Wikipedia here. If you don’t recall what x10^-8 or e^-8 means, it is a fancy way to say “move the decimal place 8 spots to the left and saves you writing a lot of zeros. This is a tiny number, but it does have an impact. This is where Voltage Drop comes in. Voltage drop says that the voltage is reduced as current moves through an electrical circuit. Voltage drop is calculated, in DC circuits, using a similar formula to Ohm’s Law, except we’re calculating over the length of the entire wire ².

\[V_{drop(V)} = I_{wire(A)} * R_{wire(\Omega)}\]

So we need to know the resistance of the whole wire, which is calculated using:

\[R_{total} = 2 * R_{wire} * L_m / A_{wire}\]

So resistance of the total wire is 2 times the resistance times the length divided by the area.

\[A_{wire} = \frac{\pi}{4} \times d^2 \\ d_{wire} = (0.127e^{-3})_m \times 92^\frac{36-n_{gauge}}{39} \\ d_{wire} = 0.000127 \times 92^\frac{36-28}{39} \\ d_{wire} = 0.000321m \\ A_{wire} = \frac{\pi}{4} \times 0.000321^2 \\ A_{wire} = 8.097e^{-8} m^2\]

Now to drop it all together.

\[R_{total} = 2 \times 1.72e^{-8} \times 1.8288 / 8.097e^{-8} \\ R_{total} = 0.77\]

So the total resistance for 6’ (or 1.8288m) of copper is 0.77 Ohms. We have the current draw from the Pi, which is 1.2A. So what’s the voltage:

\[V = I R \\ V = 1.2 \times 0.77 \\ V = 0.932V\]

So for the total length of the wire, there is a drop in voltage of 0.77v, or about 15.4%. So a 5v input on 6’ cable becomes 4.23v at the end, and this is in the danger zone for the Pi to be operating.

So how do they (Pi Foundation) get around this? Well their suggested power supply uses 20AWG wire, and it’s about 1.5m long, and doesn’t deal with the signal wires. So lets do using the above formulas again, it only has a voltage drop of 0.16v, or about 3.19%, keeping the power at the Pi above the threshold to generate errors. There are other similar power supplies by other vendors that also use the same setup, like Cana Kit, or the official one. It’s worth noting some kits ship with regular USB cables, which might work for some of the Raspberry Pi line, but you have to be aware of the limitations of the cables.

The TLDR³ of this is that voltage drop across small gauge wires is killing your voltage at the Pi.

Quick thanks to @press5 for a review of the post before I posted it.

This is important to remember. The smaller the number, the larger the diameter of the wire. So a 28 AWG wire is 0.0126” or 0.321mm, while a 20 AWG is 0.0320” or 0.812mm. ↩
If you don’t want to read all the formulas and math, there are voltage drop calculators such as this that work nicely. ↩
Too Long, Didn’t Read ↩

Set-DnsServerResourceRecord and OldInputObject not found

2015-10-23T10:56:54-07:00

While trying to help a co-worker today, I stumbled across a documentation error on Microsoft’s TechNet in relation to Set-DnsServerResourceRecord. The example uses multiple variable initialization, which unfortunately ends up making pointers.

Here is what we get from get-help.

PS C:\> get-help -examples set-dnsserverresourcerecord

NAME
    Set-DnsServerResourceRecord

SYNOPSIS
    Changes a resource record in a DNS zone.

    Example 1: Change the settings of a resource record

    PS C:\> $NewObj = $OldObj = Get-DnsServerResourceRecord -Name "Host01" -ZoneName "contoso.com" -RRType "A"
    PS C:\> $NewObj.TimeToLive = [System.TimeSpan]::FromHours(2)
    PS C:\> Set-DnsServerResourceRecord -NewInputObject $NewObj -OldInputObject $OldObj -ZoneName "contoso.com"
    -PassThru
    HostName                  RecordType Timestamp            TimeToLive      RecordData

    --------                  ---------- ---------            ----------      ----------

    Host01                       A          0                    02:00:00        2.2.2.2


    In this example, the time to live (TTL) value of the resource record named Host01 in the zone named contoso.com is
    changed to 2 hours.

    The first command assigns a resource record named Host01 in the zone named contoso.com to the variables $NewObj
    and $OldObj.

    The second command sets the TTL time span for $NewObj to 2 hours.

    The third command changes the properties of $OldObj to the settings specified for $NewObj in the previous command.

Okay, so the example seems pretty simple. They use the variable pass through to assign the return of Get-DnsServerResourceRecord to 2 variables at the same time. This should save some time, and avoid executing the same command twice. However, this actually causes an issue in this case, and here’s why.

PS C:\> $newobj = $oldobj = Get-DnsServerResourceRecord -ZoneName 'myzone.com' -name 'jatest' -RRType 'A'
PS C:\> $newobj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        2.2.2.2

PS C:\> $oldobj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        2.2.2.2

PS C:\> $newObj.RecordData.IPv4Address = [ipaddress]'8.8.8.8'
PS C:\> $newObj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        8.8.8.8

PS C:\> $oldObj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        8.8.8.8

What I’ve done here is grab the ‘jatest’ host record, outputted the two values, then updated the $newobj to have a new IP address of 8.8.8.8. However, as you can see here, there is an issue, the $oldObj value also updated the IP address. This has happened because $newObj is a pointer to $oldObj, meaning changes to one will apply to the other. Why does this matter? Well, the Set-DnsServerResourceRecord uses the old record information to find the record to update, and then updates it. This is important to understand because you could potentially have multiple IP records for ‘A’ records, or multiple NS records, or multiple MX records, etc. This means the data you are using to find the update must match the record you want to update, otherwise you could update a lot of records incorrectly. If it doesn’t, this happens instead:

PS C:\> Set-DnsServerResourceRecord -ZoneName 'myzone.com' -OldInputObject $oldObj -NewInputObject $newObj
Set-DnsServerResourceRecord : Resource record in OldInputObject not found in myzone.com zone on DNS01 server.
At line:1 char:1
+ Set-DnsServerResourceRecord -ZoneName 'myzone.com' -OldInputObject $oldObj -Ne ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (DNS01:root/Microsoft/...rResourceRecord) [Set-DnsServerResourceRec
   ord], CimException
    + FullyQualifiedErrorId : WIN32 9714,Set-DnsServerResourceRecord

The solution is either to call Get-DnsServerResourceRecord twice, or use the clone() function that is part of the returned operation. Such as this:

PS C:\> $oldobj = Get-DnsServerResourceRecord -ZoneName 'myzone.com' -name 'jatest' -RRType 'A'
PS C:\> $newObj = $oldObj.Clone()
PS C:\> $newObj.RecordData.IPv4Address = [ipaddress]'8.8.8.8'
PS C:\> $newObj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        8.8.8.8


PS C:\> $oldObj

HostName                  RecordType Timestamp            TimeToLive      RecordData
--------                  ---------- ---------            ----------      ----------
jatest                    A          0                    01:00:00        2.2.2.2

With this done, you can now use the Set-DnsServerResourceRecord with the right old and new values, and it will work successfully.

Edit: This is an old blog post, but it still stands true, if using PowerShell 5. However, in PowerShell 7 some things have changed. There is no Clone() operation available on [ciminstance], so you will get a failure with the error:

Method invocation failed because [Microsoft.Management.Infrastructure.CimInstance] does not contain a method named 'Clone'.

This obviously complicates updating the DNS records. There’s a couple of ways that this can be handled. You can use a generic function to serialize and deserialize the object into a new variable, which looks something like this:

function Clone-Object {
    param($InputObject)
    [System.Management.Automation.PSSerializer]::Deserialize(
        [System.Management.Automation.PSSerializer]::Serialize( $InputObject )
    )
}

This would then be used as such:

PS C:\> $oldObj = Get-DnsServerResourceRecord -ZoneName 'myzone.com' -name 'jatest' -RRType 'A'
PS C:\> $newObj = Clone-Object $oldObj

The alternative, and specific for this operation (though applicable for many objects) is to call ::new and pass in the old object, so the code would look something like this instead:

PS C:\> $oldObj = Get-DnsServerResourceRecord -ZoneName 'myzone.com' -name 'jatest' -RRType 'A'
PS C:\> $newObj = [CIMInstance]::new($oldObj)

I updated this post as I saw it pop up as a question on Reddit and figured I’d update in case somebody stumbles on the post again. Thanks to the posters there, and on Stack Overflow for the updated solutions.

Powershell and Single vs Double Quotes

2015-09-08T10:00:00-07:00

There can be a lot of confusion over when to use single and double quotes in PowerShell. Not to worry, this confusion carries over in a lot of programming and scripting languages, such as Perl. I figured, after seeing this tweet, to give a quick run down of when to use which one.

#PowerShell single vs. double quoted strings. Understanding quoting rules is imperative. pic.twitter.com/ZewAVCrb5N
— Trevor Sullivan 🚀 (@pcgeek86) August 24, 2015

This tweet is an example of what to do, and what not to do, but is missing an explanation as to why. So I figured I’d try and explain it, and why there are differences.

Both style quotes delimit a string, however the behavior of them are different¹. Generally speaking you want to use single quotes for all strings as this makes the PowerShell processor treat it as a string and that is it. If you use a double quote however, PowerShell reads every character in the string, and looks for characters that can be substituted for a variable defined outside the string, or are actually evaluated as powershell operations. Lets take a look at an example

PS C:\> write-host 'Single quote example'
Single quote example
PS C:\> write-host "double quote example"
double quote example

In these 2 lines, the execution is essentially the same, nothing special happens. Now lets see what PowerShell does when we throw in a variable:

PS C:\> $var = 'Some Variable'
PS C:\> write-host 'single quote with $var'
single quote with $var
PS C:\> write-host "double quote with $var"
double quote with Some Variable

I also mentioned that with double quotes, using $ will cause PowerShell to evaluate functions as well. A simple example would be something like this:

PS C:\> Write-Host 'This is a sum $(7*6)'
This is a sum $(7*6)
PS C:\> Write-Host "This is a sum $(7*6)"
This is a sum 42

Now, if we go back and look at Trevor’s tweet, what we can see is the top “don’t” example has two paths to copy files. Notice something special about the paths, they have spaces in. If you ever work with file paths that have spaces in, you have to quote the entire path. Quoting the string is not the same as quoting the path, and what we can see in this example is that double quotes are used in passing the two paths to the Start-Process command. What doesn’t happen is the double quotes from the variable definition on line 2 and 3 are not carried over into the string as well. So what should be 2 arguments actually becomes 4. Lets see that at work:

PS C:\> $source = "C:\Departments\Marketing Group\"
PS C:\> $Destination = "D:\Departments\Marketing Group\"
PS C:\> write-host "$source $destination"
C:\Departments\Marketing Group\ D:\Departments\Marketing Group\

In arguments, every space is a delimiter for the next argument, so “C:\Departments\Marketing” is one, then “Group" is another, and so on. So what’s going on in the second example? Well, both single and double quotes are being used, and string.format replacements are being introduced². This allows us to still put variables into single quoted strings, and have double quotes to escape the path.

PS C:\> $source = "C:\Departments\Marketing Group\"
PS C:\> $Destination = "D:\Departments\Marketing Group\"
PS C:\> $ArgumentList = '"{0}" "{1}"' -f $Source, $Destination
PS C:\> write-host $ArgumentList
"C:\Departments\Marketing Group\" "D:\Departments\Marketing Group\"

So now our argument list has quoted paths as required.

There are other ways to use quotes³ that allows you to put double quotes inside a double quoted string. You basically do the quotes twice. So to put a single double quote in string you do them twice. An example would be:

$var = "This is a ""double quote"""

I have an issue doing this because it very quickly and easily becomes confusing trying to keep a track of the number of quotes you have used, and reading the code becomes that much harder.

I made a footnote comment about performance difference as well. In terms of single strings, you don’t really notice a performance difference, but when it comes to itterating over several strings, it starts to build up. Here is a simple test.

$cmd1 = Measure-Command {
	for ($i = 0; $i -lt 100; $i++) {
		write-host 'Test'
	}
}

$cmd2 = Measure-Command {
	for ($i = 0; $i -lt 100; $i++) {
		write-host "Test"
	}
}

$cmd1
$cmd2

Basically just writing out the word test using single and double quotes. Here’s the output:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 18
Ticks             : 180830
TotalDays         : 2.09293981481481E-07
TotalHours        : 5.02305555555556E-06
TotalMinutes      : 0.000301383333333333
TotalSeconds      : 0.018083
TotalMilliseconds : 18.083

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 37
Ticks             : 370316
TotalDays         : 4.28606481481481E-07
TotalHours        : 1.02865555555556E-05
TotalMinutes      : 0.000617193333333333
TotalSeconds      : 0.0370316
TotalMilliseconds : 37.0316

In this simple example over 100 string itterations, and no variable inclusions, you can see that double quotes is just over double the execution time of single quotes. To give some perspective, the average time for a blink is 300-400 milliseconds, so the difference is neglegable, but I’m also running on a fairly powerful machine, and the performance scales with the system being executed on. Give it a shot.

So my general rule of thumb? I try to use single quotes everywhere, and if I have to put variables in the string I tend to use the -f operator. As a side note, I just skimmed a bunch of my older posts and can see where I’ve gone from using double quotes to single quotes as I’ve been going. At some points using -f operator with double quoted strings as well. We all learn new stuffs.

If you want to learn more about quoting strings, you can see get-help about_Quoting_Rules.

And in larger executions impacts run times, but by very small amounts. ↩
string.format will probably be a whole post all on its own, as it’s an incredibly powerful feature. ↩
This rule works for single quotes as well. ↩

Replace SSL on Office Web Apps Farm and certificate not found

2015-09-03T12:13:37-07:00

The steps to replace the SSL certificate in your Microsoft Office Web Apps farm seem to be fairly simply, but we stumbled on an odd issue where it was complaining on some of the farm’s member servers that the certificate couldn’t be found.

The basic steps are as follows¹:

Generate your CSR on one of the farm members
Work with your CA to get a signed certificate
Complete the certificate import
Export the certificate and private key
Import the certificate and private key onto all the farm members
Run the Set-OfficeWebAppsFarm command to set the new certificate

There are some tricks to some of these steps. For example, if you’re using wildcard certificates, you should apply a friendly name to the certificate², and use that in your Set-OfficeWebAppsFarm command.

So with the basic steps covered, you’d think that the changes were pretty obvious. We make sure that all servers have the key imported, and then run the command on the farm, and we might need to restart the services. This is the bit we hit the snag on. After running the command on the node that is reported as the “master”, we thought that the configuration would be pushed to all the nodes. Afterall, it does pop up a nice warning telling you that the cert must be available on all the servers, otherwise the services won’t work.

PS C:> Set-OfficeWebAppsFarm -CertificateName 'star_mydomain_com-2017'
Changing the certificate that is specified via CertificateName while the farm is in operation will lead to failed requests if the certificate is not found on every machine in the farm.
Continue with this operation?
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): y
WARNING: The following settings have been changed: star_crossmarkconnect_com_2017. For this to take effect, every machine in the farm must be restarted.

Seems pretty self explanatory right? Seems like the configurations being changed are being pushed out to all the servers, and that you need to restart the services once you’re done to make the changes kick in.

This is where the problems started. When we attempted to restart the services on the other nodes, the service manager failed to start the service, and tossed out some generic error about the service not starting because the files might be in use:

The Office Web Apps service on Local Computer started and then stopped. Some services stop automatically if they are not in use by other services or programs.

This error won’t help you much for searching, but the real error message in the application log gives us a hint as to what the problem is.

Service cannot be started. System.InvalidOperationException: The certificate has not been specified.

But, I know the certificate is on the server, I verified multiple times. I attempted to revert the certificate back to the old certificate (which had now expired) on the farm master, and was presented a similar message:

PS C:> PS C:\Windows\system32> Set-OfficeWebAppsFarm -CertificateName 'star_domain_2015'
Set-OfficeWebAppsFarm : Office Web Apps was unable to find the specified certificate.
At line:1 char:1
+ Set-OfficeWebAppsFarm -CertificateName 'star_domain_2015'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (:) [Set-OfficeWebAppsFarm], ArgumentException
    + FullyQualifiedErrorId : CertificateNotFound,Microsoft.Office.Web.Apps.Administration.SetFarmCommand

So, now we can tell that the error messages are pretty useless, because this was the value it was before, but it’s complaining the certificate cannot be found. What the error really should be is that the certificate was expired, or no longer valid. So I changed the certificate on the farm master back to the new certificate and got that server working while troubleshooting the rest.

I then had one of those epiphany moments, lets verify the configuration on the other nodes, see if there is some discrepency, which I tried to do using the Get-OfficeWebAppsFarm command.

PS C:> Get-OfficeWebAppsFarm
Get-OfficeWebAppsFarm : It does not appear that this machine is part of an Office Web Apps Server farm.
At line:1 char:1
+ Get-OfficeWebAppsFarm
+ ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-OfficeWebAppsFarm], InvalidOperationException
    + FullyQualifiedErrorId : NotJoinedToFarm.AgentManagerNotRunning,Microsoft.Office.Web.Apps.Administration.GetFarmCommand

Oh, that’s odd. So, with the service not running, the PowerShell commands report that it’s not a member of a farm, and we cannot start the service because it can’t find the certificate. The Get command on the farm master gave us a log path, so I decided to check that out, and see what information I could get.

One line jumped out:

08/19/2015 08:34:03.92      FarmStateReplicator.exe (0x0B64)             0x0E00     Office Web Apps                    Farm State                         agf1k     Medium       ReadStructuredDataFromXml: [C:\ProgramData\Microsoft\OfficeWebApps\Data\FarmState\settings.xml]    

settings.xml seems like promising file name, so I went over to check it out. Opening in NotePad, I did a quick skim of the contents and found the following line:

 Name="CertificateName" DataType="System.String">star_domain_2015

It’d appear that executing the command on the farm master hadn’t replicated out to the other farm members, and by restarting it was not picking up the certificate. A quick change of this value to the new value, and the services restarted correctly.

This goes to show that making the assumption that executing commands on a “farm” doesn’t always apply to all the nodes in the farm. Or this is a fun little bug. While trying to track down the cause of this, I found some fun interesting stuff about Office Web App farms, such as patching is a pain, and there are other quirks with it too, but those are all for future discoveries.

Assuming you’re not using SSL offloading with a load balancer ↩
We usually use friendly names on our certificates anyway, because nothing is more annoying than trying to figure out which www.domain.com certificate to apply in the IIS bindings dialog, so we usually tack the year on the end, such as www_domain_com-2017. ↩

Powershell and Progress Feedback

2015-06-25T09:38:22-07:00

We’re in the process of enabling a new password reset portal, which requires additional licensing features in Office 365. There is no “apply all” button in Office 365, so we have to do this via script to tens of thousands of user accounts. The problem with this is some form of feedback to the user running the script. PowerShell has some really handy built in features, such as a progress bar. It’s amazing how something so simple can make you feel a little better, and actively see how things are moving along. Progress bars in PowerShell are incredibly simple.

$count = 100

for($i = 1; $i -lt $count; $i++) {
       $pctComp = ($i /$count) * 100
       Write-Progress -Activity 'License assignment...' -Status $('{0}% complete' -f $pctComp) -PercentComplete $pctComp
       sleep 5
}

So this is very basic, all I’m doing is setting a counter to 100, and using a variable $i and incrementing the number. Then figuring out the percentage and using it to set the value in the progress bar. The sleep 5 is so we can actually see the progress bar in action. In our real world example, the count is based on the number of user objects, and no sleep was needed because it was actually doing work, unlike our sample code above.

You can even get fancy with your progress bars, and actually have multiple progress bars. A use case example for this would be if you have a lot of operations on a single object, you might want to report the progress of that object. I’m just going to use a loop inside a loop for my example.

$count = 100

for($i = 1; $i -lt $count; $i++) {
       $pctComp = ($i /$count) * 100
       Write-Progress -Activity 'License assignment...' -Status $('{0}% complete' -f $pctComp) -PercentComplete $pctComp -Id 1
      
       $innerCount = 50
       for ($m = 1; $m -lt $innerCount; $m++) {
             $innerPcgComp = ($m /$innerCount) * 100
             Write-Progress -Activity 'Inner loop' -Status $('{0}% complete' -f $innerPcgComp) -PercentComplete $innerPcgComp -ParentId 1
             sleep 1
      }
      
       sleep 2
}

In this example, notice how I use a -id and on the inner loop use -parentid. With these, the second progress bar becomes indented as a child of the first progress bar. You can keep going through multiple layers if you feel you want to, or you can have multiple parent loops and multiple child loops.

Your status messages don’t have to show the percentage either, they can be messages relating to the location in code. Here is another example.

$count = 100

for($i = 1; $i -lt $count; $i++) {
       $pctComp = ($i /$count) * 100
       Write-Progress -Activity 'License assignment...' -Status $('{0}% complete' -f $pctComp) -PercentComplete $pctComp -Id 1
      
       Write-Progress -Activity 'Inner Loop' -Status 'Starting Inner Loop' -PercentComplete 1 -ParentId 1 -Id 2
      
       sleep 2
      
       Write-Progress -Activity 'Inner Loop' -Status 'Doing some other action' -PercentComplete 5 -ParentId 1 -Id 2
      
       sleep 2
      
       Write-Progress -Activity 'Inner Loop' -Status 'Doing something else' -PercentComplete 15 -ParentId 1 -Id 2
      
       sleep 2
      
       Write-Progress -Activity 'Inner Loop' -Status 'jumping waaaay up there' -PercentComplete 95 -ParentId 1 -Id 2
      
       sleep 2
      
       Write-Progress -Activity 'Inner Loop' -Status 'Finishing' -PercentComplete 100 -ParentId 1 -Id 2
      
       sleep 2
}

Sometimes it’s the simple things that we can add to a script that gives a whole lot of feedback to the user running it. If I’d not done a simple progress bar, we’d be clueless as to the progress of the script running against 45,000 users, and as the script took hours to run, some form of feedback is critical.

What other methods of feedback do you guys use? Have you used progress bars in a different fashion? Let me know.

Custom Windows installs, injecting drivers and features

2015-04-24T08:35:29-07:00

One of the things about new platforms is you get to learn new technology. One of the bad things about new technology is that a lot of your old methods might not apply anymore, need to be revamped, or redesigned completely. Over the last few months I’ve been working on a Cisco UCS platform deployment for work. This has been quite exciting as it’s all new gear, and it’s implementing stuff that we should have been able to implement with our HP BladeSystem C7000 gear.

One of the biggest gotchas so far is that the build image we have for our machines no longer works. The Cisco UCS B200 servers have hardware that isn’t detected out of the box with Windows 2012R2. This means we have to inject drivers into the boot image, and the install image, to make the install work when using Boot From SAN (BFS).

This post is a reflective for myself, and others that might find it handy, because I’m constantly forgetting how to update the drivers in boot images. One thing I’m very thankful for, the new ImageX format that Microsoft started using with Windows Vista. This makes image management so very much easier.

Preparing your build environment

First step is to install Microsoft ADK (Assessment and Deployment Kit). When you run the install, you only need to install the 2 deployment and build packages.

The next step is to prepare the build environment. I have a secondary drive in my desktop, so I built the following structure:

F:\
|-Build
  |-ISO
  |-Windows
  |-Drivers
    |-Network
    |-Storage
    |-Chipset

You’ll need a valid copy of Windows 2012R2¹. If you have it on DVD, simply copy the contents of the DVD into your Windows folder as I have documented above. You’ll also need a copy of the driver CD from the vendor. As this is for the Cisco UCS B200, you need a login, and you can find them tucked here.

Find the drivers you need on the DVD, in my case the Network and Storage drivers were easy as there were only one in the named folders. The chipset was a little difficult because a clean install of Windows didn’t detect the chipset, but pointing the Windows at the driver DVD found all the drivers, and then the device said no special drivers were needed so refused to list any. After some fudging around, I managed to identify these as Intel’s Ivytown drivers, so dropped those in the Chipset folder.

Identifying Install Image and Injecting Drivers

This is where all the magic happens. We’re going to inject the drivers into the image, or slipstream them as it’s called in some places. This is done with just a handful of commands. The first thing we need to do is identify which install image we want to work with. A standard Volume License Windows 2012R2 DVD has 4 install images, standard core, standard with GUI, datacenter core, and datacenter with GUI. As we usually build GUI based boxes, we’re only interested in editing those images for now. Launching a PowerShell prompt with elevated access, we need to list the contents of the install iamge:

F:\Build>dism /Get-ImageInfo /ImageFile:.\Windows\Sources\install.wim

Deployment Image Servicing and Management tool
Version: 6.3.9600.16384

Details for image : .\Windows\Sources\install.wim

Index : 1
Name : Windows Server 2012 R2 SERVERSTANDARDCORE
Description : Windows Server 2012 R2 SERVERSTANDARDCORE
Size : 6,674,506,847 bytes

Index : 2
Name : Windows Server 2012 R2 SERVERSTANDARD
Description : Windows Server 2012 R2 SERVERSTANDARD
Size : 11,831,211,505 bytes

Index : 3
Name : Windows Server 2012 R2 SERVERDATACENTERCORE
Description : Windows Server 2012 R2 SERVERDATACENTERCORE
Size : 6,673,026,597 bytes

Index : 4
Name : Windows Server 2012 R2 SERVERDATACENTER
Description : Windows Server 2012 R2 SERVERDATACENTER
Size : 11,820,847,585 bytes

The operation completed successfully.

As you can see, we have 4 images here, 2 are core (1 and 3) so we’ll ignore those and just work on editing the images we need.

F:\Build>dism /Mount-Image /ImageFile:.\Windows\Sources\install.wim /MountDir:.\ISO /Index:2

Deployment Image Service and Management Tool
Version: 6.3.9600.16384

Mounting Image
[==================52.0%               ]

This bit can take a few minutes. Once mounted, if you open Windows explorer to F:\Build\ISO you’ll see a full drive map of an installed Windows machine. This is where we’re going to inject drivers.

F:\Build>dism /Image:.\ISO /Add-Driver:.\Drivers /recurse

After a few minutes, you’ll get a nice report of the drivers being added. If you have any unsigned drivers, you can add /ForceUnsigned to the end to make it skip signature validation².

Now we save the updated image, and close it out.

F:\Build>dism /Unmount-Image /MountDir:.\ISO /commit

The /commit forces it to save the changes. If you don’t want to save, use /discard.

We repeated the same steps above, but changed /Index:2 to /Index:4 to mount the datacenter edition of Windows.

Adding Features to the Install Image

One of the other things that we needed to do so we could save a step later was enable features in the new build. Again, dism can handle this by toggling the flag and enabling the features. We wanted MPIO enabled because the B200 has 2 paths due to the chassis they are connected in.

F:\Build>dism /Mount-Image /ImageFile:.\Windows\Sources\install.wim /MountDir:.\ISO /Index:2
F:\Build>dism /Image:.\ISO /Enable-Feature /FeatureName:MultipathIo
F:\Build>dism /Unmount-Image /MountDir:.\ISO /commit

Technically you can save some time by enabling the features at the same time you are injecting the drivers. I’ve just got them separated here.

Adding drivers to Setup and WinPE Images

Adding the drivers to the install image isn’t the end of it. If you’re working with hardware that’s not supported out the box (like the Cisco VICs), then you need to add the drivers to the Setup and WinPE images to make sure they can both see the drives that may be presented from another source (SAN for example). The steps are identical to above, except the image we’re targetting:

F:\Build>dism /Get-ImageInfo /ImageFile:.\Windows\Sources\boot.wim

Deployment Image Servicing and Management tool
Version: 6.3.9600.16384

Details for image : .\Windows\Sources\boot.wim

Index : 1
Name : Microsoft Windows PE (x64)
Description : Microsoft Windows PE (x64)
Size : 1,321,549,982 bytes

Index : 2
Name : Microsoft Windows Setup (x64)
Description : Microsoft Windows Setup (x64)
Size : 1,417,514,940 bytes

The operation completed successfully.

F:\Build>dism /Mount-Image /ImageFile:.\Windows\Sources\boot.wim /MountDir:.\ISO /Index:1
F:\Build>dism /Image:.\ISO /Add-Driver:.\Drivers /recurse
F:\Build>dism /Unmount-Image /MountDir:.\ISO /commit
F:\Build>dism /Mount-Image /ImageFile:.\Windows\Sources\boot.wim /MountDir:.\ISO /Index:2
F:\Build>dism /Image:.\ISO /Add-Driver:.\Drivers /recurse
F:\Build>dism /Unmount-Image /MountDir:.\ISO /commit

Now our setup image can see the devices that may be required for disk access.

Building the ISO Image

The final step is to turn all this hardwork into a useable ISO/DVD. This is where the ADK comes into play. You’ll need to launch an elevated command prompt using the “Deployment and Imaging Tools Environment” prompt. This sets the %PATH% variables to include some additional tools. We then navigate back to our build directory and start the build process.

C:\>F:
F:\>cd Build
F:\Build>oscdimg -u2 -bf:\build\windows\boot\etfsboot.com f:\build\windows f:\build\win2012r2_b200_20150312.iso

This take a few minutes as it’s making a new ISO. The -u2 argument is used to force UDF file system. This is needed otherwise the install.wim and some other items get trashed by sizing limitations.

Once you have an ISO file, you can either use your favourite ISO burning utility to put it on DVD, or use your servers KVM/ILO/DRAC to remotely mount it to do the install.

All in all, the process takes about 30 minutes depending on the speed of your machine, disks, and drivers/features being enabled. Sadly it took me nearly 2 days to actually build the final image because I had issues identying and including the right chipset drivers.

These instructions work for Vista or higher, so if you’re a 2008 shop, it should work there too. ↩
We found there were several drivers on the Cisco B200 DVD that were unsigned, but they were not needed for our install, so we could skip this. ↩

vSphere Storage vMotion times out at 32% when crossing SANs

2015-02-17T00:00:00-08:00

A year or so ago we’d upgraded our vCenter from 4.1 to 5.1, and with this upgrade, and some features built into our SAN, we got access VAAI. For example, removing a VM guest would tell the SAN that the guest had been removed, and if the data store had been thinly provisioned from the SAN, it’d clean up and shrink down the space used (in theory).

Another feature we discovered was something called “fast copy”. In layman’s understanding of this feature, when a storage vMotion request was created, the SAN was notified of the request, and the SAN would process the copying of the bits in the background. This is handy because it stops the data from being sent from SAN to host to SAN again. This causes a good speed up with regards to moving machines around.

There was a caveat to the “fast copy” feature that we stumbled across last year. Well, what we stumbled upon was an issue when using vMotion to move machines between SANs. What we didn’t clue in on was that this was because of VAAI and “fast copy”. When we first observed this issue, we didn’t realize the issue was between SANs, we just thought the issue was random. Our VM hosts had storage allocated from 2 different SANs at the time, and our naming convention was a little off, so identifying quickly that the data store was on a different SAN wasn’t entirely obvious at first.

Ultimately the issue presents itself as a vMotion timeout. When you start the vMotion, it zips along until it hits 32%. It then sits there for a few minutes, sometimes up to 5 or 10, then the guest becomes unresponsive. At this point VMware decides the migration has timed out, and rolls back. Sometimes it can take several minutes for the failed guest to start responding again. If the guest is shut down, it usually hangs around 36% for a few minutes, but eventually processes. The error usually looks like this:

The error generally presented is “Timed out waiting for migration data.” It always happened at 32%. A bit of searching around, and I didn’t really uncover the cause of it. At the time we originally spotted this issue, we decided to take an outage and shut the guests down and vMotion them. This killed 2 stones at once, freed memory on the hosts, and gave the guests a reboot to clear memory and such.

Fast forward to nine months ago, and we had an issue where we discovered one of our SANs had become over saturated, and needed space and load removed from it. At this point, we now had a third SAN added to the mix, so we presented new data stores, and went through the process of trying to vMotion quite a lot of VM guests off of one set of data stores (actually 10) to another set. We hit the same wall as before, time outs at 32%. We put it down to the load and space issues on the SAN and went with the outage. This was our dev environment anyway, so it was less of an issue. We didn’t really look into it any further.

Jump forward to this past Tuesday. A sudden alert that multiple VMs had gone offline left us puzzled until we realized that one of the data stores had been way overprovisioned, and the backup software kicked off and with guest snapshots had filled the drive. With a quick bit of work, we moved some guests around, and bumped into the same 32% issue again. Shutting down some guests and shuffling them around got us through the pinch, but left me wondering.

After some experimentation, I was able to narrow down the cause of the issue on a single action. Storage vMotion between SANs. Inner SAN vMotion was snappy, 100GB in less than 2 minutes. Intra-SAN migrations would hit 32% and time out. That’s it, I had the cause of my problem. It had to be a fiber or switch issue… Right?

Not so much. While doing some digging on performance, our fiber switches, and SAN ports, I wasn’t spotting any obvious issues. Doing some searching again on our favourite web search engine, I stumbled across an HP document tucked away in the 3Par area (document was named mmr_kc-0107991, nice name). Bingo! Okay, the details don’t exactly match, for example the document mentions that it freezes at 10%, but it had all the hallmarks of what we were seeing. IntraSAN vMotion, timeouts, and VAAI.

So the solution was to disable VAAI on the host, do the vMotion, and then re-enable it if you still want to use it. VMware has a nice document on how to do that here in KB1033665. With a little PowerCLI¹ we quickly disable VAAI and tested a vMotion on a live machine, and it worked. As we were working on a single cluster at the time, this is what we ended up with:

Get-VMHost -Location (Get-Cluster 'CVHPVMH003') | %{
	Set-VMHostAdvancedConfiguration -VMHost $_ -Name DataMover.HardwareAcceleratedMove -Value 0
	Set-VMHostAdvancedConfiguration -VMHost $_ -Name DataMover.HardwareAcceleratedInit -Value 0
	Set-VMHostAdvancedConfiguration -VMHost $_ -Name VMFS3.HardwareAcceleratedLocking -Value 0
}

Once done, flip the 0s to 1s, and re-enable as needed.

This is something they actually give you in the KB article as well. ↩