Fresh off of reading the great blog by the kind folks over at SQLCat, I thought first in using their discovery and experimentation for a slightly more practical use…trying to implement the consistency and richness of HASHBYTES over BINARY_CHECKSUM() for comparing rows in a table. (Their example is fine, but I’m not doing a lot of sharding and cloud stuff these days). Because BINARY_CHECKSUM() isn’t all that in SQL Server, this might work perfectly for this type of operation, so let’s check it out. Run the following script:

USE tempdb        
IF OBJECT_ID('HashbyteTest') IS NOT NULL
    DROP TABLE dbo.HashbyteTest
GO
IF OBJECT_ID('HashbyteTest2') IS NOT NULL
    DROP TABLE dbo.HashbyteTest2
GO
SELECT fileid, groupid, size, maxsize
INTO dbo.HashbyteTest
FROM sys.sysfiles
GO

Now I’m going to add a computed column using HASHBYTE, and create another table to do our data compare. Remember that SELECT INTO doesn’t take computed columns and push them forward, so I’m going to have to script it out to get HashbyteTest2 table like I want it:

ALTER TABLE HashbyteTest
    ADD ColumnHash AS 
                 HashBytes('MD5', CAST(fileid AS VARCHAR(4000)))
                +HashBytes('MD5', CAST(groupid AS VARCHAR(4000)))
                +HashBytes('MD5', CAST(size AS VARCHAR(4000)))
                +HashBytes('MD5', CAST(maxsize AS VARCHAR(4000)))
GO         

CREATE TABLE [dbo].[HashbyteTest2](
    [fileid] [smallint] NULL,
    [groupid] [smallint] NULL,
    [size] [int] NOT NULL,
    [maxsize] [int] NOT NULL,
    [ColumnHash] AS 
              (((hashbytes('MD5',CONVERT([varchar](4000),[fileid],0))
                +hashbytes('MD5',CONVERT([varchar](4000),[groupid],0)))
                +hashbytes('MD5',CONVERT([varchar](4000),[size],0)))
                +hashbytes('MD5',CONVERT([varchar](4000),[maxsize],0)))
) ON [PRIMARY]

GO

INSERT dbo.HashbyteTest2 (fileid, groupid, size, maxsize)
SELECT  fileid ,groupid ,size ,maxsize 
FROM dbo.HashbyteTest

I’m just adding the hashbyte result together. Also note that the data type for the column converts to VARBINARY, so if you decide to do this some other way using variables, you can use VARBINARY to stuff the values in.

Do our cool compare from a previous blog post and see how this one works using the ColumnHash:

SELECT MIN(fileid) AS fileid, ColumnHash
FROM   
        (SELECT fileid, ColumnHash
        FROM dbo.HashbyteTest 
        UNION ALL
        SELECT fileid,   ColumnHash
        FROM dbo.HashbyteTest2) X
GROUP BY ColumnHash
HAVING COUNT(1)=1

These are the same. So let’s update the data and then run the same script again:

UPDATE dbo.HashbyteTest2
SET groupid=-500
WHERE fileid=2 
GO
 

The actual value changed, which you can see if you run only the Hashbyte function for the groupid column. Check the below comparison where I”ve highlighted the groupid hash:

F89CC14862CB876184FCDFD776B5C3DCB5E21872DAB26657D5FF90
F89CC14862CCFCD208495D565EF66E7DFF9F98764DA26657D5FF90

Update it back and then run the function – looks pretty good again to me.

UPDATE dbo.HashbyteTest2
SET groupid=0
WHERE fileid=2 

Thanks for reading,
Lee

 

----------------------

 

 

http://blogs.msdn.com/b/sqlcat/archive/2011/11/28/writing-new-hash-functions-for-sql-server.aspx


Posted in: TSQL , SQL Server 2008/R2  Tags:
blog comments powered by Disqus

by Lee Everest, M.S.

Info

Poll

Do you use Azure or cloud in your organization?



Show Results

Ads

Search


Month List

Calendar

«  May 2012  »
MoTuWeThFrSaSu
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910
View posts in large calendar

Tags

Disclaimer
The opinions, code, examples, et.al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape form, or fashion.  All code for demonstration purposes - no guarantees, either written or implied, are made.

© Copyright 2012 Lee Everest's SQL Server, etc. weblog