Admin posted on September 18, 2009 15:29

Part three of this series of posts on deleting data will cover monitoring the rows that get deleted, and part four goes over server monitoring (locking, waits, etc.). If you haven’t gone over the first two in the series, you might want to now. Make sure and run these scripts in the order that they appear, and for crying out loud please don’t use any of these scripts on any production table until you have tested and tweaked the process.  I don’t want a mailbox full of nasty complaints that you’ve completely removed all of the data from all of the tables in your production backups and that you have requested my home address.  This is a proven technique that I have used in production, but it is only a replica – I most likely left out something important.

 

 

STEP (1) -   CREATE OUR DATA
Let’s create our test data by running the script that we previously had.  I have this set to 25 million rows.

 

SET NOCOUNT ON

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

 

IF OBJECT_ID ('tempdb..##t1') IS NOT NULL

    DROP TABLE ##t1;

GO

CREATE TABLE ##t1 (a int IDENTITY NOT NULL PRIMARY KEY, b char (10))

go

SET ROWCOUNT  25000000

INSERT INTO ##t1 (b)

SELECT 'a'

FROM

    (SELECT 1 as Col

    FROM sys.syscolumns A, sys.syscolumns b) as Records

GO

SET ROWCOUNT 0

 

 

 

 

STEP (2)  - BUILD OUR CONTROL TABLE

This table, dbo.Chunksize, will hold values that will drive our delete script and help out with our reporting and monitoring.  The trigger has been placed on the table allow only one row in the table;  if more than one row were inserted, the calculations for reporting would be jacked.

 

 

IF OBJECT_ID('Chunksize') IS NOT NULL

       DROP TABLE Chunksize

go

CREATE TABLE  Chunksize

        (id int identity (1,1) NOT NULL

        ,Chunk int

       ,Target int

       ,TotalBeginningRowcount int

       ) 

GO

CREATE TRIGGER  [tr_OneRowOnChunksize]

ON Chunksize

FOR INSERT

AS

BEGIN

       DECLARE @Rows int

 

       SELECT @Rows = COUNT(1)

       FROM Chunksize

 

       IF (@Rows != 1)

       BEGIN

              RAISERROR ('Error!!! One row allowed to avoid invalid results!', 15, 2)

              ROLLBACK

              RETURN

       END

END

GO

   

 

-- Initial insert of Chunksize
INSERT Chunksize (Chunk, Target, TotalBeginningRowcount)

SELECT   1 as InitialChunk -- Can change this later as script runs

        , 7000000 as Target -- Total # of rows we want to delete.

        , (SELECT COUNT (1) FROM ##t1)  -- Beginning rowcount in table.

GO

 

SELECT * From Chunksize

 

GO

 

 

This is the initial seed to the table; a value of 1 for the initial chunk of data to be deleted in each iteration of the loop, a value of our target number of rows that we wish to delete, and a beginning count of rows in the table for later reporting calculations. So, let’s say that you had a table of 150 million rows, and you wanted to delete 30 million rows, you would put the target to 30 million, and set the initial chunk to whatever you like (I usually start with 1, and then gradually move this up.  If you start too big, you could begin blocking users right off that bat…something to avoid).

 

 

STEP (3)   -  Build the reporting table and Seed the Results table

 

Run these two scripts together to seed the reporting table.


IF OBJECT_ID ('Results') IS NOT NULL

       DROP TABLE Results

go

CREATE TABLE Results (id int identity (1,1) NOT NULL

                     ,Chunk int

                     ,TimeRecord datetime

                     ,Rows decimal (10,2)

                     )

GO

 

INSERT INTO Results

SELECT 1 as Chunk, GETDATE(), (SELECT COUNT(1) FROM ##t1)

 

GO

  

 


STEP (4) – Start the delete statement   

 

Do this step in a separate window.

 

DECLARE @Rows int

        ,@Chunksize int

              ,@Target int

 

SELECT @Rows=0

SELECT @Chunksize = Chunk FROM Chunksize

SELECT @Target = Target FROM Chunksize

 

There:

DELETE TOP (@Chunksize) FROM ##t1

SELECT @Rows = @Rows + @@ROWCOUNT

IF (@Rows < @Target)

BEGIN

    SELECT @Chunksize = Chunk FROM Chunksize WHERE id=1

       GOTO There

END

GO 

 

 

 

STEP (5) -  Report Monitoring and adjusting the Chunk size

 

Do this step in a separate window. 

 

DECLARE @Chunk int

SET @Chunk=1

 

IF @Chunk!= (SELECT Chunk FROM Chunksize)

       UPDATE Chunksize SET Chunk=@Chunk

 

-- DELETE Results

IF (DATEDIFF (ss,  (SELECT MAX(TimeRecord) FROM Results), GETDATE())> 3)

       or ((SELECT COUNT(1) FROM Results)  =0)

BEGIN

       INSERT INTO Results

       SELECT @Chunk, GETDATE(), (SELECT COUNT(1) FROM ##t1)

       PRINT 'Sucess! Record added to dbo.Result table'

END

ELSE

       PRINT 'Error!  Attempting to poll Chunksize too often'

 

 

WITH CTE (Number_Of_Seconds, Rows_Deleted, Rows_Deleted_Per_Second

              ,Chunk_Size, Target, TotalBeginningRowcount, CountInTable)

AS (

       SELECT DATEDIFF(ss, b.TimeRecord, a.TimeRecord) as Number_Of_Seconds

              , b.Rows-a.Rows as Rows_Deleted

              ,(b.Rows-a.Rows )/DATEDIFF(ss, b.TimeRecord, a.TimeRecord)

AS Rows_Deleted_Per_Second

              , b.Chunk as Chunk_Size

              ,(SELECT Target from Chunksize) as Target

              ,(SELECT TotalBeginningRowcount FROM Chunksize)

AS TotalBeginningRowcount

              ,(SELECT COUNT(1) FROM ##t1)  as CountInTable

       FROM Results b

       INNER JOIN Results a

              ON b.id = a.id-1

       )

SELECT Chunk_Size

       , AVG(Rows_Deleted_Per_Second) AS Average_Rows_Deleted_Sec

       , AVG(Rows_Deleted_Per_Second)*60 as Average_Rows_Minute

       , 1000000/AVG(NULLIF(Rows_Deleted_Per_Second, 0)) 

AS [Delete:Seconds_Per_Million]

       , 1000000/AVG(NULLIF(Rows_Deleted_Per_Second,0))/60

AS [Delete:Minutes_Per_Million]  

       , Target - (TotalBeginningRowcount - CountInTable)

AS RemainingRowsFromTarget

       , CAST((Target - (TotalBeginningRowcount - CountInTable) )

              / (AVG(Rows_Deleted_Per_Second)*60 ) AS int) AS MinutesTillDone

FROM CTE

GROUP BY Target, Chunk_Size, CountInTable, TotalBeginningRowcount, CountInTable

 

 

As our script runs and deletes rows, you will run this script to insert our status into the Results table and modify our chunk size if desired.  The attributes for the Results table are:


            Chunk – the chunk size that has been set, initially to 1, that will be the parameter for TOP

            TimeRecord – this is  used in calculations to follow

            Rows – the number of rows that are in the table, again for more calculations

 

The CTE reports on how the delete is doing, at what rate the deletes are doing grouped by chunks size, and the estimated number of rows being deleted, as well as the remaining time left in the “MinutesTillDone” column.  After a few seconds, I run this and receive the following:

  

 

 

 

 

We can see (barely) that for a chunk size of 1, the average rows deleted/sec is around 667, with the other calculations based on this valued. MinutesTillDone is based on the delete size, target, and how many rows are in the table as can be seen in the CTE.  Let’s set the chunk in script # 5 to 5, and view the results:

  

 

 

 

Let’s up it to 10, 100, 1000, and finally, 5000.

  

 

 

 

 

Notice that with an increase in chunk size comes a big increase in Average Rows Deleted/ Sec and Minute, and a big decrease in MinutesTillDone, all as we would expect. In part four, we will look to see how we can know what to set this chunk size to in order to minimize the blocking of user processes.

 

Lee

 

------------------------------

 

 Is "America's Next top Model" on tonight???

 

Smileys


Comments


September 18. 2009 16:25
Deleting Data Quickly in SQL Server, Part 3

You've been kicked (a good thing) - Trackback from DotNetKicks.com

http://www.dotnetkicks.com/database/Deleting_Data_Quickly_in_SQL_Server_Part_3http://www.dotnetkicks.com/database/Deleting_Data_Quickly_in_SQL_Server_Part_3


September 26. 2009 08:33
Deleting Data Quickly in SQL Server, Part 3

Thank you for submitting this cool story - Trackback from Servefault.com

http://servefault.com/Deleting-Data-Quickly-in-SQL-Server-Part-3http://servefault.com/Deleting-Data-Quickly-in-SQL-Server-Part-3

Comments are closed

by Lee Everest, M.S.

Search


Ads

Calendar

«  July 2010  »
MoTuWeThFrSaSu
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
View posts in large calendar

Tags

Disclaimer
The opinions, code, examples, et.al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape form, or fashion.  All code for demonstration purposes - no guarantees, either written or implied, are made.

© Copyright 2010 Lee Everest's SQL Server, etc. weblog