Where I am consulting we’re getting ready to set up a full-blown partitioning design for a fact table that will hold data from, roughly, 50 countries. Someone asked the other day about locking and DML so I experimented in another post on the new (LOCK_ESCALATION = AUTO) feature for SQL Server 2008. Worked great doing simultaneous DML operations to each partition scheme, definitely good news especially for the setup that we’re going since each country may be reading, writing, loading data at various points throughout the day. I was able to insert chunks of data into one partition while updating all of the data in another, and deleting data from yet another while witnessing no problems in locking or blocking which is what was anticipated. I also skewed the data greatly as well, since previous versions you might expect escalated locking situations if you acted upon the greater majority of the data in a table, on up to a table lock.
Another question asked - how well would indexing work and are we certain to experience fewer reads and writes, leading to fewer I/Os, for the “current” partitions which hold smaller data sets? (The setup may have an archive partition, a current partition, and various partitions by year in between so that queries would have to traverse the most recent, say, six months of data. Probably a common setup in a lot of shops). Because I haven’t tested this and really haven’t read a lot lately about partitioning – I know there is a ton of good stuff out there, such as the article by Kimberly Tripp – I just dove with an example to verify.
I set up the following and inserted lots of nice data using my new VS 2010 install and the Data Generator. Here’s what I did:
USE Test
BEGIN TRY
DROP TABLE dbo.PartitionReadsTest
END TRY
BEGIN CATCH
END CATCH
GO
CREATE PARTITION FUNCTION funct_Partition_yr(INT)
AS RANGE RIGHT FOR VALUES (2007,2008,2009,2010)
GO
CREATE PARTITION SCHEME partition_yr
AS PARTITION funct_Partition_yr TO ([PRIMARY], secondary, tertiary, quaternary, quinary)
GO
CREATE TABLE PartitionReadsTest
(
[Year] int NOT NULL
, FName varchar (40)
, LName varchar (40)
, BirthDate datetime NOT NULL
, Country nvarchar (4) NOT NULL
) ON partition_yr([Year])
GO
CREATE CLUSTERED INDEX cl_idx_PartitionReadsTest ON PartitionReadsTest (Year)
ON partition_yr(Year)
GO
Here’s the distribution of my data:
| 2007 |
812064 |
| 2008 |
181261 |
| 2009 |
6468 |
| 2010 |
207 |
The following query gives me this portion of the actual execution plan:
SELECT COUNT(1)
FROM PartitionReadsTest
WHERE [Year] = 2007
AND BirthDate between '1/1/1960' and '2/1/2007'
Table 'PartitionReadsTest'. Scan count 3, logical reads 10078, physical reads 186, read-ahead reads 9152, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Creating this index improves the query:
CREATE NONCLUSTERED INDEX idx_PartitionedIndex ON PartitionReadsTest (year, BirthDate)
Table 'PartitionReadsTest'. Scan count 1, logical reads 805, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If I create the index on the partition, it appears that I get fewer rows to traverse and a better subtree cost:
CREATE NONCLUSTERED INDEX idx_PartitionedIndex ON PartitionReadsTest (year, BirthDate)
ON partition_yr (Year)
GO
Definitely worth looking into indexing of these partitions a bit further.
Oh sorry, got off subject here. For certain SQL Server does do less work on a partitioned table when selecting from one with fewer rows, which answers their original question.
Lee
------------------------
Oh, tarter sauce.

1192836d-d623-44f6-a7d6-4602a9660bbb|0|.0