A month after his departure, he still haunts me. It appears that before he “left”, the DBA had been modifying some of the scripts that do the daily maintenance of the DB. This is fine in itself; however, he neglected to tell anybody he was doing it, or that he hadn’t finished.
Today I decided to investigate why we’d been receiving an increased number of timeouts from our application. Not being a DBA, I wasn’t 100% sure where to look first. I did consider an application we run from Veritas (now Symantec) called InDepth Precise, which produces statistics from SQL Server covering CPU, memory, query run time, record hits, index information, and the like. However, not being entirely up to date with the latest version, which was a “must have” according to the DBA (who barely used it), I decided to go on past experience with our application instead. Here was my rundown…
- Indexes
- Long running jobs
- Dealer imports
- Bad queries
- Network issues
Well, I eliminated the dealer imports immediately: the problem has been increasing slowly over time, and the imports are all done before 10am. I then decided to look at the indexing. A quick sidestep so the basics are understood: we have a large table containing approximately 10 million records, and a second rather large table containing approximately 9 million. These constitute the core data of our system. Indexes are helpful in any system; for us they’re critical.
Anyway, back to the indexes. A series of jobs had been set up to handle backups and reindexing, so I started going through them. The first job checks every table for indexes, checks each index for fragmentation, and stores that information in a table. The reindexing tasks are then based on this second table, and a series of steps are performed:
- Clustered Indexes that are fragmented by 10-20% are reindexed first.
- Clustered Indexes that are fragmented by 20-30% are reindexed second.
- Clustered Indexes that are fragmented by 30% or more are reindexed third.
This is then repeated for standard (non-clustered) indexes… with one exception: the two large tables are skipped. This is because the morning imports have a habit of trashing the indexes a little, so those two tables have their indexes done a little later. This is all fairly logical, as it would seem stupid to reindex a table only to have to reindex it again a few hours later. Together, it works roughly like the sketch below.
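For anyone curious, this kind of tiered reindex on SQL Server 2000 looks something like the following. The table and column names are my own guesses, not whatever the DBA actually called them, and the real scripts are considerably longer:

```sql
-- Work table, populated by running DBCC SHOWCONTIG ... WITH TABLERESULTS
-- against every user table (SHOWCONTIG actually returns more columns
-- than this; abbreviated for readability).
CREATE TABLE IndexFragInfo (
    TableName   sysname,
    IndexName   sysname,
    IndexId     int,            -- 1 = clustered, > 1 = non-clustered
    LogicalFrag decimal(5,2)    -- logical scan fragmentation, percent
)

-- One pass: rebuild clustered indexes fragmented between 10% and 20%.
-- The same loop is repeated for the 20-30% and 30%+ bands, and then
-- again for the non-clustered indexes.
DECLARE @TableName sysname, @IndexName sysname

DECLARE frag_cursor CURSOR FOR
    SELECT TableName, IndexName
    FROM IndexFragInfo
    WHERE IndexId = 1
      AND LogicalFrag >= 10 AND LogicalFrag < 20
    ORDER BY LogicalFrag

OPEN frag_cursor
FETCH NEXT FROM frag_cursor INTO @TableName, @IndexName
WHILE @@FETCH_STATUS = 0
BEGIN
    DBCC DBREINDEX (@TableName, @IndexName)  -- full rebuild of the index
    FETCH NEXT FROM frag_cursor INTO @TableName, @IndexName
END
CLOSE frag_cursor
DEALLOCATE frag_cursor
```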
Anyway, at about 5am the backup job kicks off. It basically does the following (roughly sketched in T-SQL after the list):
- Reindexes the 2 large tables
- Creates a backup
- Truncates the log
- Copies backup to reporting server
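In T-SQL terms, the job boils down to something like this. This is SQL Server 2000 era syntax, and the database, table, and path names are placeholders rather than our real ones:

```sql
-- 1. Reindex the two large tables ('' means all indexes on the table)
DBCC DBREINDEX ('BigTable1', '')
DBCC DBREINDEX ('BigTable2', '')

-- 2. Full backup to disk, overwriting the previous backup set
BACKUP DATABASE OurDb
    TO DISK = 'D:\Backups\OurDb.bak'
    WITH INIT

-- 3. Empty the transaction log without backing it up; the full backup
--    above becomes the new recovery point
BACKUP LOG OurDb WITH TRUNCATE_ONLY

-- 4. The copy to the reporting server is a plain file-copy job step,
--    not T-SQL, so it isn't shown here
```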
However, today I noticed that the job had been set up to skip the reindexing stage. This could be a bad thing…
And it was. I ran DBCC SHOWCONTIG (shown below), which lets you see the fragmentation of the indexes, and sure enough, the largest index in the database was fragmented by 84%. So tonight, and probably at various times over the next week, I will be stumbling through hundreds of lines of SQL to see if I can figure out exactly what he was doing, and exactly how best to resolve this issue…
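For the record, here’s roughly what I ran and the output line that made my heart sink (the table name is illustrative):

```sql
-- Report fragmentation for every index on the table
DBCC SHOWCONTIG ('BigTable1') WITH ALL_INDEXES

-- The line that matters in the output:
-- - Logical Scan Fragmentation ................: 84.00%
```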
*sighs*