Last week I walked into the office and was greeted with an email telling me “the virtual machines are down”. This was most concerning. I wasn’t sure how multiple VMs going offline could have gone unnoticed by our monitoring software, nor had we received any notification from the other department that uses these machines that any of the night jobs had failed. So I set about figuring out what was going on…
As with most user panic emails, there was very little information, though this one did include a screenshot of the vSphere Client showing it couldn’t connect to our vCenter server. That error is slightly different from the stated issue of the virtual machines being down. I double-checked our monitoring software: it showed the server was up and running, with no disk or CPU issues, and memory was in line, albeit a little lower than normal, so everything there checked out. I then tried connecting from the server itself, and was greeted with the same error the end users had been presented with…
vSphere Client could not connect with the vCenter Server 'servername'. Details: A connection failure occurred (Unable to connect to the remote server)
I checked the services and saw that the “VMware VirtualCenter Server” service was not running. A quick attempt to restart it failed, with the following error being logged in the System event log:
The VMware VirtualCenter Server service terminated with service-specific error 2 (0x2).
After a quick Google search, I stumbled across this hint on the VMware Communities site. The page suggested that some people had issues where a service dependency on SQL was not created, so the VMware service was starting before SQL was available. I did a quick check, and SQL was running and accepting connections. But while poking around, I noticed a lot of errors from SQL Server in the Application event log, which pointed right to the issue.
CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 4096 MB per database.
Then it dawned on me. vCenter was unable to use the database because it had exceeded the limits of the license. SQL Express, which is what the vCenter install uses if you don’t have a SQL (or Oracle) server available, has a limit of 4 GB per database. We originally went with SQL Express because we needed to get it deployed as soon as possible, and never got around to going back. This meant that now was the time to go back and fix this issue, as well as adjust some monitoring.
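Since part of the fix was adjusting monitoring, a check along the following lines would have flagged the database well before the service fell over. This is only a sketch, not what we actually deployed. The 4096 MB limit comes straight from the error message above; the function names, the 80% and 95% thresholds, and the sample page count are made up for illustration. It also assumes the data file size is read from SQL Server's sys.database_files view, which reports size in 8 KB pages.

```python
# Sketch: classify a database's data file size against the SQL Express limit.
# Thresholds and names are illustrative assumptions, not part of any product.

LIMIT_MB = 4096   # per-database limit quoted in the SQL Server error message
PAGE_KB = 8       # sys.database_files reports size in 8 KB pages


def pages_to_mb(pages: int) -> float:
    """Convert a count of 8 KB pages into megabytes."""
    return pages * PAGE_KB / 1024


def size_status(data_pages: int, warn_at: float = 0.80, crit_at: float = 0.95) -> str:
    """Return OK, WARNING or CRITICAL based on the fraction of the limit used."""
    used_fraction = pages_to_mb(data_pages) / LIMIT_MB
    if used_fraction >= crit_at:
        return "CRITICAL"
    if used_fraction >= warn_at:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    sample_pages = 450_000  # hypothetical value, as if read from the database
    print(f"{pages_to_mb(sample_pages):.0f} MB used: {size_status(sample_pages)}")
```

Feeding the page count from a scheduled query into something like this, and alerting on anything other than OK, gives some warning before the limit is reached.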
After installing SQL 2008, there are a few steps to follow to migrate the database from the Express edition to the full-blown version. Fortunately, VMware has them documented here. The document has all the references required, including links over to Microsoft on how to back up the database and restore it at another location.
One caveat that caused 90 minutes of pain was User Account Control, or UAC. If you’ve used Windows Vista, 7, or 2008, you’ll have met this feature, which was introduced to help reduce the risk of users installing stuff they weren’t supposed to (aka viruses and malware). The problem is, it also hinders some things without telling you. For example, I spent 90 minutes trying to figure out why, as a server administrator, I could not make a backup of the database in SQL Express. When I simply launched the SQL 2008 Management Studio as myself, any attempt at accessing the database showed I didn’t have permissions. To get around this, I had to right-click on SQL Management Studio and tell it to run as an administrator.
Once I’d migrated the database file, set up the jobs, and granted the permissions based on the documentation, all the services came back up and were happy once again. One additional change I made was to follow the recommendation in the original forum post and set up the SQL dependencies for the VMware services.
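For anyone doing the same, one thing to watch is that the Windows sc config command with depend= overwrites a service's dependency list rather than appending to it, so the existing entries have to be carried over. The sketch below only builds the command and does not run it. The service names vpxd and MSSQLSERVER, and the existing dependency ProtectedStorage, are assumptions for illustration; check the real names on your own server (for example with sc qc) before changing anything.

```python
# Sketch: build an "sc config" command that adds SQL Server as a dependency
# while keeping whatever dependencies the service already has.
# All service names used below are assumptions for illustration.

def merge_dependencies(existing, required):
    """Append required service names to existing ones, skipping duplicates
    (Windows service names are case-insensitive)."""
    merged = list(existing)
    for name in required:
        if name.lower() not in [m.lower() for m in merged]:
            merged.append(name)
    return merged


def build_sc_command(service, dependencies):
    """Return the argument list for sc.exe; "depend=" and the slash-separated
    list are separate arguments because sc requires a space after the equals sign."""
    return ["sc", "config", service, "depend=", "/".join(dependencies)]


if __name__ == "__main__":
    current = ["ProtectedStorage"]   # hypothetical existing dependency
    wanted = ["MSSQLSERVER"]         # assumed name of a default SQL instance
    print(" ".join(build_sc_command("vpxd", merge_dependencies(current, wanted))))
```

The resulting command needs to be run from an elevated prompt, for the same UAC reasons described above.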