TheGeekery

The Usual Tech Ramblings

Bad Day...

Yesterday started off okay, I woke up. From there, it was all downhill.

After getting into work, I noticed one of the DB servers was ‘misbehaving’. When I say misbehaving, I mean, the instance was running, but the enterprise manager couldn’t see anything on the server (no DBs, no logs, nothing). This is generally considered a “bad thing ™” as it usually means your master database has gone south, and left you for the day. Anybody that has worked with MS SQL knows that the master db is the heart and soul of MS SQL. If it’s broke, SQL is broke. So you have to restore the database from backup. This has to be done in a very special way, as nobody can be accessing the SQL server while trying to restore the master database. So you have to first shut down SQL. This was the easiest bit of this whole process. You then have to try restarting SQL, with extra command options. There are two ways to do this. Either edit the service properties, or running it from the command line. I used the command line option as it saves having to tweak stuff that is already setup.

1
C:> sqlservr -m

Oups. This generated an error for me because this server has multiple instances of SQL. So a little call to the good ol’ help file…

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
C:>sqlservr.exe /?
usage: sqlservr
        [-c] (not as a service)
        [-d file] (alternative master data file)
        [-l file] (alternative master log file)
        [-e file] (alternate errorlog file)
        [-f] (minimal configuration mode)
        [-m] (single user admin mode)
        [-g number] (stack MB to reserve)
        [-n] (do not use event logging)
        [-O] (allow startup without DCOM support)
        [-s name] (alternate registry key name)
        [-T <number>] (trace flag turned on at startup)
        [-x] (no statistics tracking)
        [-y number] (stack dump on this error)
        [-B] (breakpoint on error (used with -y))
        [-I number] (set I/O affinity mask)

See documentation for details.

So I have to specify -s and the instance name.

1
C:> sqlservr -m -s TEST

Hooray, along bumbles a bunch of start-up information, shortly followed by a stack trace, and a death that can only be described as “spectacular”. This is where the “bad stuff” started. The errors pointed to a seriously corrupted master db. Something I had already figured. With this, SQL server was dead, so I had to find another route.

Bundled with SQL 2000 is a tool called rebuildm. This tool takes a copy of the original master, model, and msdb from the SQL CD. So, before continuing, make sure you copy all of your database files (at least the control files) and have a safe copy as they will be overwritten. Apparently there is a bit of a bug in the rebuildm tool that copies the files, but doesn’t clear the read-only attributes, and has a few issues when working with the data files from the CD (see this). So I copied the data directory from <cd drive>:\X86\Data, and put it onto the C:\ to make it a little easier. Then I ran the rebuildm command. This takes a few moments, and a dialog pops up asking where the source files are, and what instance to rebuild (might not prompt for instances if you only have the default install).

Again, I attempted to run sqlservr -m -s TEST to get SQL going in ‘single user mode’. Again, oodles of errors. Really not a good sign. So I retried the rebuildm command, and this time it ran some extra stuff it didn’t run the last time, a couple of fix up scripts, similar to the ones executed after the SQL 2000 SP4 installation. Crossing my fingers, I reattempted starting SQL… hooray… it started. So now… we have a running SQL server, but it is empty, none of our DBs are there, and all the jobs are missing. But this is good. We have a running SQL server. We then have to restore SQL’s master database from a backup.

1
2
3
restore database master
from disk='\\fileserver\backups\test-sql\master_db_20060506.bak'
with recovery, replace

This is the point I realized that I need to kick somebody in the dev department. They had setup a web page to use SA to login to the server. In ‘single user mode’, SQL will allow only one SA account to login and “fix” whatever is wrong. The web server won, so I have to stop SQL, and stop IIS on the offending server, and retry. After a few minutes, it then pops back, and says the DB is restored, and it is now shutting down SQL, and all goes quiet.

This is the point I was crossing my fingers. I opened the services manager, and attempted to start SQL up normally again. After a few moments of thinking, SQL sprang into life, and we had a SQL server back up and running. Next step is to then restore the model, and msdb databases from your backups to ensure a restored server.

phew after all that, it ended up being about 10pm, fun day… Especially after finding out that not only am I acting as a stand in DBA until they “might” find somebody else, but I am now part of the promotions team that pushes software to our production web servers… boink

Comments