The steps to replace the SSL certificate in your Microsoft Office Web Apps farm seem fairly simple, but we stumbled on an odd issue where some of the farm’s member servers complained that the certificate couldn’t be found.
The basic steps are as follows [1]:
- Generate your CSR on one of the farm members
- Work with your CA to get a signed certificate
- Complete the certificate import
- Export the certificate and private key
- Import the certificate and private key onto all the farm members
- Run the Set-OfficeWebAppsFarm command to set the new certificate
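The export and import steps can be scripted. This is only a minimal sketch, assuming Windows Server 2012 or later (where the Export-PfxCertificate and Import-PfxCertificate cmdlets are available), that the private key was marked exportable, and that the certificate lives in the local machine's Personal store. The thumbprint and file path are placeholders, not values from our farm:

```powershell
# On the farm member that holds the completed certificate.
# $thumbprint and the PFX path are placeholders; substitute your own values.
$thumbprint = '<thumbprint of the new certificate>'
$password   = Read-Host -Prompt 'PFX password' -AsSecureString
$cert       = Get-Item -Path "Cert:\LocalMachine\My\$thumbprint"

# Export the certificate together with its private key.
Export-PfxCertificate -Cert $cert -FilePath 'C:\Temp\farm-cert.pfx' -Password $password

# On each of the other farm members, after copying the PFX file over:
$password = Read-Host -Prompt 'PFX password' -AsSecureString
Import-PfxCertificate -FilePath 'C:\Temp\farm-cert.pfx' `
    -CertStoreLocation Cert:\LocalMachine\My -Password $password
```

Delete the PFX file from every server once the import is done, since it contains the private key.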
There are some tricks to some of these steps. For example, if you’re using wildcard certificates, you should apply a friendly name to the certificate [2], and use that in your Set-OfficeWebAppsFarm command.
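If the certificate arrived without a friendly name, one way to set it is through the certificate store provider in PowerShell. A rough sketch, assuming the certificate is already in the local machine's Personal store; the thumbprint is a placeholder and the friendly name is just the example used in this post:

```powershell
# Placeholder values; substitute your own thumbprint and preferred name.
$thumbprint   = '<thumbprint of the new certificate>'
$friendlyName = 'star_mydomain_com-2017'

# FriendlyName is a writable property on certificates exposed by the Cert: drive.
$cert = Get-Item -Path "Cert:\LocalMachine\My\$thumbprint"
$cert.FriendlyName = $friendlyName

# Confirm that exactly one certificate now carries that friendly name.
Get-ChildItem -Path Cert:\LocalMachine\My |
    Where-Object { $_.FriendlyName -eq $friendlyName } |
    Format-List Subject, FriendlyName, NotAfter, HasPrivateKey
```

Running the last command on every farm member is also a quick way to confirm that the name you plan to pass to Set-OfficeWebAppsFarm exists everywhere.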
So with the basic steps covered, you’d think the change would be pretty straightforward. We make sure that all servers have the certificate and key imported, then run the command on the farm, and we might need to restart the services. This is the bit we hit the snag on. After running the command on the node that is reported as the “master”, we thought that the configuration would be pushed to all the nodes. After all, it does pop up a nice warning telling you that the cert must be available on all the servers, otherwise the services won’t work.
PS C:\> Set-OfficeWebAppsFarm -CertificateName 'star_mydomain_com-2017'
Changing the certificate that is specified via CertificateName while the farm is in operation will lead to failed requests if the certificate is not found on every machine in the farm.
Continue with this operation?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"): y
WARNING: The following settings have been changed: star_crossmarkconnect_com_2017. For this to take effect, every machine in the farm must be restarted.
Seems pretty self-explanatory, right? It looks like the changed configuration is being pushed out to all the servers, and that you need to restart the services once you’re done to make the changes kick in.
This is where the problems started. When we attempted to restart the services on the other nodes, the service manager failed to start the service, and tossed out a generic error saying that the service had started and then stopped:
The Office Web Apps service on Local Computer started and then stopped. Some services stop automatically if they are not in use by other services or programs.
This error won’t help you much for searching, but the real error message in the application log gives us a hint as to what the problem is.
Service cannot be started. System.InvalidOperationException: The certificate has not been specified.
But I knew the certificate was on the server; I had verified it multiple times. I attempted to revert to the old certificate (which had now expired) on the farm master, and was presented with a similar message:
PS C:\Windows\system32> Set-OfficeWebAppsFarm -CertificateName 'star_domain_2015'
Set-OfficeWebAppsFarm : Office Web Apps was unable to find the specified certificate.
At line:1 char:1
+ Set-OfficeWebAppsFarm -CertificateName 'star_domain_2015'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (:) [Set-OfficeWebAppsFarm], ArgumentException
+ FullyQualifiedErrorId : CertificateNotFound,Microsoft.Office.Web.Apps.Administration.SetFarmCommand
So now we can tell that the error messages are pretty useless: this is the value the setting had before, yet the command complains that the certificate cannot be found. What the error really should say is that the certificate has expired or is no longer valid. So I changed the certificate on the farm master back to the new certificate and got that server working while troubleshooting the rest.
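Since "not found" seems to cover "found but expired" as well, it helps to check validity yourself rather than trust the message. The helper below is a hypothetical sketch (Test-CertificateUsable is my own name, not part of Office Web Apps), and it only checks the private key and validity dates; I don't know what checks Office Web Apps actually performs:

```powershell
# Hypothetical helper: returns $true when the certificate has a private key
# and $Now falls inside its validity window.
function Test-CertificateUsable {
    param(
        [Parameter(Mandatory = $true)] $Certificate,
        [datetime] $Now = (Get-Date)
    )
    return [bool]($Certificate.HasPrivateKey -and
                  $Certificate.NotBefore -le $Now -and
                  $Certificate.NotAfter -ge $Now)
}

# Example usage on a farm member (Windows only), listing certificates that fail:
#   Get-ChildItem Cert:\LocalMachine\My |
#       Where-Object { -not (Test-CertificateUsable -Certificate $_) } |
#       Format-List FriendlyName, NotAfter, HasPrivateKey
```

With a check like this, the expired 2015 certificate would have shown up as unusable straight away, which matches the behaviour we saw.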
I then had one of those epiphany moments: let’s verify the configuration on the other nodes and see if there is some discrepancy. I tried to do this using the Get-OfficeWebAppsFarm command.
PS C:\> Get-OfficeWebAppsFarm
Get-OfficeWebAppsFarm : It does not appear that this machine is part of an Office Web Apps Server farm.
At line:1 char:1
+ Get-OfficeWebAppsFarm
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Get-OfficeWebAppsFarm], InvalidOperationException
+ FullyQualifiedErrorId : NotJoinedToFarm.AgentManagerNotRunning,Microsoft.Office.Web.Apps.Administration.GetFarmCommand
Oh, that’s odd. So, with the service not running, the PowerShell commands report that the machine is not a member of a farm, and we cannot start the service because it can’t find the certificate. The Get-OfficeWebAppsFarm command on the farm master gave us a log path, so I decided to check that out and see what information I could get.
One line jumped out:
08/19/2015 08:34:03.92 FarmStateReplicator.exe (0x0B64) 0x0E00 Office Web Apps Farm State agf1k Medium ReadStructuredDataFromXml: [C:\ProgramData\Microsoft\OfficeWebApps\Data\FarmState\settings.xml]
settings.xml seems like a promising file name, so I went over to check it out. Opening it in Notepad, I did a quick skim of the contents and found the following line:
<Setting Name="CertificateName" DataType="System.String">star_domain_2015</Setting>
It appears that the change made on the farm master hadn’t replicated out to the other farm members, so when the service restarted on those nodes it was still looking for the old certificate. A quick change of this value to the new certificate name, and the services restarted correctly.
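Rather than opening the file in Notepad on every node, the value can be read with a few lines of PowerShell. This is a sketch only: Get-FarmSettingValue is a hypothetical helper name, and all I know about the file's layout is the single Setting line shown above, so the XPath simply looks for any Setting element with a matching Name attribute:

```powershell
# Hypothetical helper: returns the text of the <Setting> element whose
# Name attribute matches $Name in the given settings.xml file.
function Get-FarmSettingValue {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Name
    )
    # local-name() keeps the query working whether or not the file declares an XML namespace.
    $xpath = "//*[local-name()='Setting'][@Name='$Name']"
    Select-Xml -Path $Path -XPath $xpath | ForEach-Object { $_.Node.InnerText }
}

# Example usage on a farm member:
#   Get-FarmSettingValue -Name 'CertificateName' `
#       -Path 'C:\ProgramData\Microsoft\OfficeWebApps\Data\FarmState\settings.xml'
```

Running this on each member before restarting anything would show straight away which nodes still point at the old certificate name.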
This goes to show that you can’t assume a command executed on a “farm” is actually applied to all the nodes in the farm. Or this is a fun little bug. While trying to track down the cause of this, I found some interesting things about Office Web Apps farms, such as patching being a pain, and there are other quirks too, but those are all for future discoveries.
1. Assuming you’re not using SSL offloading with a load balancer.
2. We usually use friendly names on our certificates anyway, because nothing is more annoying than trying to figure out which www.domain.com certificate to apply in the IIS bindings dialog, so we usually tack the year on the end, such as www_domain_com-2017.