TheGeekery

The Usual Tech Ramblings

Power of the Date Format

Earlier today, one of our data guys asked me a SQL question regarding a dealers DMS. He was getting a weird issue reporting invalid labels. When I took a look, this was the offending bit of code:

1
DATE( ( DDATEY * 10000 ) + (DDATEM * 100) + DDATED, YYYYMMDD) > (current date - '180' days)

This looks odd, but it appears that they have split the date into 3 fields, and to do a query on a certain date range, you have to concatenate the date parts together, then do the conversion. The first thing I noticed about this was the '180'. Usually numeric values don’t have enclosures, so I removed those, and ran the code again. This time with a different result, but still an error. The error this time was invalid date.

When I started to think about the problem, I realized that the possible issue was in the concatenation of the date parts making an invalid date. So I started to break down the equation to figure out which part was causing the problem. I changed the query to:

1
DATE( ( DDATEY * 10000 ) + (DDATEM * 100) + '01', YYYYMMDD) > (current date - '180' days)

This would force every date combination to be the first of the month. This would seem logical to me, as I’d have assumed the day was more likely to be messed up rather than the month (30th february for example). This also generated an invalid date message. So this hinted the error was in another part of the date. To save running through all the date parts, I figured it’d be easier to split the date back up.

1
DDATEY >= YEAR(current_date) AND DDATEM >= MONTH(current_date - 2 months)

This probably resulted in giving them more information than they needed, but they can ignore what they don’t want. The only issue comes to the end of the year. When January rolls around, the above query will only return data for January, and not data back into the last month.

It only draws me to conclude that the front end validation has somewhere failed somewhere, and allowing bad data into the back end system. This makes the original query fail every time, as it’s a concatenated query, the entire table data is used, unless other limits are imposed first to drop the bad data.

This all makes me wonder, why on Earth would anybody use this split format? It’s difficult to produce reports on, without creating complicated queries. Take for example, trying to run a query in February of one year, for data back to April of the previous one. This ends up producing queries like:

1
DATE( ( DDATEY * 10000 ) + (DDATEM * 100) + DDATED, YYYYMMDD) > 20060401

Well, we know the above doesn’t work because of invalid dates… So we end up having to do this:

1
(DDATEY = 2006 and DDATEM >= 4) or (DDATEY = 2007)

This only gets more complicated the more expansive you want your date range to be. So why not just use a date format?

1
DATEFIELD > '2006-04-01'

Simple, and yet…. simple. I don’t see any benefits in trying to use the other methodology.

Comments