I read an article by David Poole a few weeks ago where he talks about addressing non-stated requirements. I believe this is an area that is quite often overlooked when data modeling; at least up front.  He sums it up very well

“As someone who takes a great deal of pride in what I produce I was initially shocked but on reflection greatly impressed by what they discovered. The bit that struck home for me was that a large part of what they discovered were sins of omission. I had fulfilled the requirement as supplied to me but not put sufficient consideration into what my colleagues today call the NSRs or Non-Stated Requirements.”

In order to build higher quality applications it is important to understand the domain and model it appropriately. This typically requires asking domain experts a lot of questions that they will assume you just know.

In his article, he uses the examines a UK postal code. An example I usually raise in discussions is should Heart Rate (HR) or Temperature really by nvarchar(4000)? Shouldn’t temperature be a range of values similar to 80.0 thru 110.0 Fahrenheit, or perhaps as Celsius 27 thru 43? In these situations I can model it appropriately as decimal(4,1) or as an integer, as the case may be, with the appropriate constraints. I understand legacy decisions, but when I think of trying to build a data warehouse there are a lot of data elements that need to be more deeply understood.

We should strive to ask these questions earlier. It is easier to build simpler applications on top of clean data. Obviously the challenge is always how to handle dirty data, but I believe it is better to handle it sooner (ETL) rather than later (application layer). As we continue to build higher quality applications, having a dictionary / crib-sheet like the one described in David’s post can be very helpful.

david-pool-sample

 

Have you come across any interesting data that threw you for a loop because it was so far outside what was allowable? Would a crib sheet like this have helped?

Let us know in the comments below.