Fresh off of reading the great blog by the kind folks over at SQLCat, I thought first in using their discovery and experimentation for a slightly more practical use…trying to implement the consistency and richness of HASHBYTES over BINARY_CHECKSUM() for comparing rows in a table. (Their example is fine, but I’m not doing a lot of sharding and cloud stuff these days). Because BINARY_CHECKSUM() isn’t all that in SQL Server, this might work perfectly for this type of operation, so let’s check it out. Run the following script:
USE tempdb
IF OBJECT_ID('HashbyteTest') IS NOT NULL
DROP TABLE dbo.HashbyteTest
GO
IF OBJECT_ID('HashbyteTest2') IS NOT NULL
DROP TABLE dbo.HashbyteTest2
GO
SELECT fileid, groupid, size, maxsize
INTO dbo.HashbyteTest
FROM sys.sysfiles
GO
Now I’m going to add a computed column using HASHBYTE, and create another table to do our data compare. Remember that SELECT INTO doesn’t take computed columns and push them forward, so I’m going to have to script it out to get HashbyteTest2 table like I want it:
ALTER TABLE HashbyteTest
ADD ColumnHash AS
HashBytes('MD5', CAST(fileid AS VARCHAR(4000)))
+HashBytes('MD5', CAST(groupid AS VARCHAR(4000)))
+HashBytes('MD5', CAST(size AS VARCHAR(4000)))
+HashBytes('MD5', CAST(maxsize AS VARCHAR(4000)))
GO
CREATE TABLE [dbo].[HashbyteTest2](
[fileid] [smallint] NULL,
[groupid] [smallint] NULL,
[size] [int] NOT NULL,
[maxsize] [int] NOT NULL,
[ColumnHash] AS
(((hashbytes('MD5',CONVERT([varchar](4000),[fileid],0))
+hashbytes('MD5',CONVERT([varchar](4000),[groupid],0)))
+hashbytes('MD5',CONVERT([varchar](4000),[size],0)))
+hashbytes('MD5',CONVERT([varchar](4000),[maxsize],0)))
) ON [PRIMARY]
GO
INSERT dbo.HashbyteTest2 (fileid, groupid, size, maxsize)
SELECT fileid ,groupid ,size ,maxsize
FROM dbo.HashbyteTest
I’m just adding the hashbyte result together. Also note that the data type for the column converts to VARBINARY, so if you decide to do this some other way using variables, you can use VARBINARY to stuff the values in.
Do our cool compare from a previous blog post and see how this one works using the ColumnHash:
SELECT MIN(fileid) AS fileid, ColumnHash
FROM
(SELECT fileid, ColumnHash
FROM dbo.HashbyteTest
UNION ALL
SELECT fileid, ColumnHash
FROM dbo.HashbyteTest2) X
GROUP BY ColumnHash
HAVING COUNT(1)=1
These are the same. So let’s update the data and then run the same script again:
UPDATE dbo.HashbyteTest2
SET groupid=-500
WHERE fileid=2
GO
The actual value changed, which you can see if you run only the Hashbyte function for the groupid column. Check the below comparison where I”ve highlighted the groupid hash:
F89CC14862CB876184FCDFD776B5C3DCB5E21872DAB26657D5FF90
F89CC14862CCFCD208495D565EF66E7DFF9F98764DA26657D5FF90
Update it back and then run the function – looks pretty good again to me.
UPDATE dbo.HashbyteTest2
SET groupid=0
WHERE fileid=2
Thanks for reading,
Lee
----------------------
http://blogs.msdn.com/b/sqlcat/archive/2011/11/28/writing-new-hash-functions-for-sql-server.aspx
a8fffc51-d38a-4de5-81eb-b1bdcc45aa98|0|.0