Verifying our results.
I received an email the other day from Kevin Stephenson, a SQL Server guru from the Microsoft SQL Server Customer Advisory Team, SQLCAT, who prompted my interest in the “Deleting Data Quickly…” series of blogs found on this site. Kevin mentioned the possibility that the method that I chose over theirs for deleting data quickly might have an inherent problem; if a clustered index was heavily fragmented, there is a chance that SQL Server might opt to read pages in a physical order (and not logical) and delete rows outside of my result set. This could also be exacerbated by the use of TOP, which allows a free-flowing of records by the SQL Engine until the threshold has been reached. He adds:
It is theoretically possible for the last row that was added to the table to be deleted, bypassing the ordering requirement.
While he did not divulge the reason for this theory, at this point it really doesn’t matter a great deal. Let’s simply test this to see if we can reproduce this anomaly! Before doing so, however, I want to look back to my original delete. Because it mirrored the table and data setup in their “Fast Ordered Delete” blog, we need to verify that we’re not deleting any good records. Note that in all of my blogs, I am limiting the examples to the constraints outlined in their original testing scenario. When we are reasonably sure of the results, then we will change up the rules to address his point in question. First things first though, let’s look at the original problem that he brought up; run the entire script that follows.
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
IF OBJECT_ID ('t1') IS NOT NULL
DROP TABLE t1;
GO
CREATE TABLE t1 (a int IDENTITY NOT NULL PRIMARY KEY, b char (100))
SET ROWCOUNT 1000000
INSERT INTO t1 (b)
SELECT 'a'
FROM
(SELECT 1 as Col
FROM sys.syscolumns A, sys.syscolumns b, sys.syscolumns c) as Records
go
DECLARE @rows INT
SET @rows=0
There:
DELETE TOP (1000) FROM t1
SELECT @Rows = @Rows + @@ROWCOUNT
IF (@Rows <= 400000)
BEGIN
SELECT @rows +=1
GOTO There
END
GO
Now let’s run the following script to check to see whether or not I have rows that have been deleted outside of the selected range of data, which is driven here by the TOP clause. We will check for gaps in the IDENTITY attribute on my PRIMARY KEY to see if we’re good:
-- Check for gaps in IDENTITY on the table
SELECT *
FROM
(SELECT x.id
FROM
(
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY z.ID) as id
FROM sys.syscolumns x, sys.syscolumns y, sys.syscolumns z
ORDER BY 1
) x
WHERE x.id >= (SELECT MIN(a) FROM t1)
) a
WHERE NOT EXISTS
(SELECT * FROM t1 b
WHERE a.id = b.a)
GO
From the result set returned, I see no gaps in the data, and if you check, the last row inserted is still in the table as well. Again, this example was my code deleting the data set that was outlined in the SQLCAT article; fragmentation is not an issue in their example and likewise neither in mine. The rules set forth in the original write-up did not bring into the equation other items, including fragmentation, so we can say with a degree of confidence that we did not delete any rows outside of our target range of data.
Now, to his question: Can a fragmented clustered index cause unexpected results within the delete? Run the following code and let’s find out.
SET NOCOUNT ON
DROP TABLE dbo.Frag
GO
CREATE TABLE dbo.Frag (Col1 int PRIMARY KEY CLUSTERED, Col2 char (50), Col3 int, Col4 varchar (2))
GO
DECLARE @i int
SELECT @i = 0
WHILE (@i <= 30000)
BEGIN
INSERT INTO dbo.Frag VALUES (@i, 'Howdy!', 1, NULL)
SET @i+=1
END
GO
ALTER TABLE dbo.Frag
ALTER COLUMN Col4 varchar (8000)
GO
UPDATE dbo.Frag SET Col4 = SPACE(8000)
GO
DECLARE @cnt int
SET @cnt=0
LABEL:
DELETE TOP (1000) FROM dbo.Frag
SET @cnt = @cnt + @@ROWCOUNT
IF @cnt <= 20000
GOTO label
GO
We have a heavily fragmented table, aptly named “Frag”, that we can test for the possibility that my delete might remove something that we don’t necessarily want.
-- Check for a gap in the PK. If 30000 shows we deleted the last row inserted...not good.
SELECT *
FROM
(SELECT x.id
FROM
(SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY z.ID) AS id
FROM sys.syscolumns x, sys.syscolumns y, sys.syscolumns z
ORDER BY 1
) x
WHERE x.id >= (SELECT MIN(Col1) FROM dbo.Frag)
AND x.id < 30001
) a
WHERE NOT EXISTS
(SELECT * FROM dbo.Frag b
WHERE a.id = b.Col1
)
GO
If you run the above, you’ll get the same result that I did – no rows outside of my top 20,000 rows were deleted, and checking the max value, it exists in the table as well. I ran this second example numerous times using various combinations of data sets and ‘chunk’ amounts to delete, all with the same result. I also reran this countless times under a wide range of fragmentation (both logical and physical).
In summary, the original delete that I created to remove rows from a data set similar to theirs consistently performed well without a problem. The same can be said of example two – here, they changed the rules up a bit, so I created a new scenario that set the stage for testing their theory. Does this mean that it could never happen? Never say never. If you recall from my first blog, I posted the caveat that this technique should be tested thoroughly, and not to put into production until such tests reveal no underlying problems. (Including rows that get deleted when they weren’t supposed to be deleted).
Now, having gone through all of this, I digress to a point that I made in an earlier blog in the series; speed is great, but speed for deleting data is not important in my opinion. Maintaining user experience is the number one concern in my book, so in truth I have shifted from the notion of deleting quickly to deleting effectively. How good can a delete process be if it escalates locking to the point that users suffer, web pages time out, blocking occurs, and orders and revenue become lost? Bottom line, I continue to stress that a deletion process should run for however long it takes so that normal database processing can proceed without interruption.
Lee
---------------------------------
Maybe we need a series entitled “Deleting Data Slowly...”
