I dig the blog posted two days ago by the Microsoft SQL Server Development Customer Advisory Team on writing hash functions using HASHBYTES; this is actually pretty cool and fairly promising for replacing the old CHECKSUM and BINARY_CHECKSUM that many of us have had problems with in the past. If you haven’t used these oldies but goodies, they are not precise, and can cause problems if you’re using them to test for changing rows. I remember back with SQL Server 7 when we tried to use them, but Microsoft didn’t put the slightly-important caveat that they didn’t work 100% of the time. Now, I don’t know about you, but I’m needing pretty much 100% and spot-on all of the time at my workplace. Word? :)
In their blog, they are using the hash to potentially use it as a mechanism to represent potential random locations for data – shards maybe? – using a fairly nice algorithm (MD5) and spread it evenly across buckets. You’ll notice that the experiment shows that the CHECKSUM stuff is imprecise in that it repeats the buckets while theirs, using Hashbyte, does a nice job of moving the data to all potential buckets based on their integer min and max. Pretty good dude!
Go try it out – I’m going to do a few things now that I think about their little experiment, and see what I can come up with.
Thanks for reading,
Lee
---------------------
http://blogs.msdn.com/b/sqlcat/
http://blogs.msdn.com/b/sqlcat/archive/2011/11/28/writing-new-hash-functions-for-sql-server.aspx
1a86d8bc-47f1-4e27-8eaf-6723adc6be15|0|.0