Whilst moving out data center from AT&T hosting, to our corporate office data center, we kept stumbling on a bit of a problem. IIS6 had terrible performance compared with IIS5. Basically it’s been keeping myself, and several of the data center techs up all night (2-3am). The behavior of the problem is that the IIS server starts generating SQL errors similar to the following:
SQL Server Does Not Exist Or Access Denied
This was odd, because neither the web server, nor SQL server were exhibiting any kind of performance issues, but we still seemed to be getting connection issues. So I started to look into it.
What I noticed was that the IIS server was opening ports up to 5000, and stopping. When it hit that ceiling, half the ports entered a TIME_WAIT state. During this time, the errors flowed freely. As soon as the TIME_WAIT sockets cleaned up, the errors went away.
I decided to do some research on TIME_WAIT sockets, and found the following item from port80software. For a quick summary, when a tcp socket is opened, data is transfered in 4 parts, a send, an acknowledgment, and finish, and a second acknowledgment. A TIME_WAIT socket occurs after the second ACK, and is basically a “lock” on the socket to keep it open to receive any stray packets that may have been delayed, or miss-routed. This stops any other application from using the port for safety. The last thing an application wants is to receive a packet that it didn’t expect, that has nothing to do with the conversation it’s currently having.
Armed with this information, I took at look on Microsoft’s support site, and found a handful of articles on TIME_WAIT stuff, KB328476 jumped out right away. It detailed TCP/IP settings that adjusted timeouts, and port settings. These looked interesting. It referenced another article on how to apply these settings in KB319502.
Applying the settings is easy, a quick registry change, and a reboot. When the server came back up, we restarted the load test again to see how the performance went. After a few minutes, the changes were obviously affective. Throughput increased dramatically, and the number of errors relating to the SQL server not existing dropped to 0. Error ratio was down from 45% to about 2% of the transactions. Transaction could really didn’t increase a huge amount, but successful counts was certainly better.