Database Partitioning
Do you have a lot of data in tables that keeps piling up as you do more and more loads? Is there a lot of code in place that uses those tables that you don’t want to mess with? What is the best way to archive the data, or get rid of some of it after it gets too old without any changes to your code? This is where SQL Server database partitioning comes in handy. It allows you to split up the data across multiple storage areas based on the value of a column in your table and then clean up the partition holding the data you no longer need. Other benefits include faster data access if your code takes advantage of the column that the table is partitioned on. In this blog post I introduce you to how to partition an existing large table in 3 steps, how to delete data from portions, and briefly discuss indexes on partitioned tables.
Step 1: Create the Partition Function
Assume I want to partition the table MyTable into 3 different partitions based on the column LoadID. The first thing I would do is create the Partition Function. A Partition Function serves two purposes. First, it defines the number of partitions that a table will have. Second, it defines the boundaries of each partition. This is the syntax for creating the Partition Function:
CREATE PARTITION FUNCTION PF_ LoadID (int)
AS RANGE LEFT FOR VALUES (10, 20);
PF_ LoadID is the name of the Partition Function, and “int” is the data type of the partition column. 10 and 20 are the partition boundaries for the data values store in the table.
The Partition Function defined above will result in the creation of 3 partitions.
| Data value |
{0…. 10} |
{11 …20} |
{21 …MAX} |
| Stored in |
Partition 1 |
Partition 2 |
Partition 3 |
- Partition 1 will store all rows of data where the LoadID value
- Partition 2 will store all rows of data where the LoadID value > 10 and LoadID value
- Partition 3 will store all rows of data where the LoadID value > 20
But what if you wanted the partition boundaries for the data values to be defined as follows instead?
| Data value
| {MIN…. 9} |
{10 …19} |
{20…MAX} |
| Stored in |
Partition 1 |
Partition 2 |
Partition 3 |
- Partition 1 will hold any row of data where the LoadID < 10
- Partition 2 will hold any row of data where the LoadID >= 10 and LoadID < 20
- Partition 3 will hold any row of data where the LoadID >= 20
The answer would be to modify the syntax to use RANGE RIGHT instead of RANGE LEFT when creating the Partitioning Function as shown below.
CREATE PARTITION FUNCTION PF_ LoadID (int)
AS RANGE RIGHT FOR VALUES (10, 20);
- RANGE LEFT means that the boundary data value falls to the left of the boundary.
- RANGE RIGTH means that the boundary data value falls to the right of the boundary.
Step 2: Create the Partition Scheme
Now that you’ve created the Partitioning Function, you must create the Partitioning Scheme. The Partitioning Scheme is used to define which filegroups the partitions will be stored on. You can either map the each partition to a different file group, or you can map all partitions to the PRIMARY filegroup as follows. I can cover filegroups in a different blog post. If you don’t have control over the disk your instance is installed on, use PRIMARY filegroup as shown below.
CREATE PARTITION SCHEME PS_LoadID
AS PARTITION PF_LoadID ALL
TO ([PRIMARY]);
GO
- PS_LoadID is the Partition Schme Name.
- PF_LoadID is the Partition Function which the Partition Scheme depends on.
- ALL is to specify that all of the Partition Function partitions map to one filegroup.
- PRIMARY is to specify that all the partitions should be stored on the primary filegroup.
Step 3: Partition Large Tables
Now that you’ve created the Partition Scheme using the Partition Function, you can partition your large table.
A) Make sure that your table is not already partitioned
First we’ll start by looking at how many partitions the existing table has.
SELECT
partition_id
, [object_id]
, partition_number
, [rows]
FROM sys.partitions
WHERE object_id = OBJECT_ID('MyTable')
if your table isn’t partitioned, you should get 1 partition back.
B) Drop the primary key and the clustered index
From there, assuming the clustered index is on your primary key, you need to drop the primary key constraint. If your primary key is not the clustered index, you still need to drop the PK as well as the other clustered index that is on your table.
ALTER TABLE MyTable
drop CONSTRAINT [PK_CONSTRAINT] -- PRIMARY KEY CLUSTERED CONSTRAINT
C) Recreate the primary key and the clustered index with the partitioned column as a participating column
After that, you need to recreate the clustered primary key constraint on the primary key column AND the partition column. You also need to make sure that the index is created on the partition scheme you created in step 2.
If your primary key is not the clustered index, you still need to recreate the primary key as described above, as well as recreate the clustered index the same way.
ALTER TABLE MyTable
ADD CONSTRAINT [PK_CONSTRAINT]
PRIMARY KEY CLUSTERED
(
MyTableUniqueID -- Existing PK column
, LoadID -- Partition column
)
ON PS_LoadID(LoadID) -- Partition Scheme created in step 2
D) Verity that the table got paritioned
Lastly, let’s check to validate that the table was split up into the 3 partitions defined as part of the Partition Function.
SELECT
partition_id
, [object_id]
, partition_number
, [rows]
FROM sys.partitions
WHERE object_id = OBJECT_ID('MyTable')
Deleting Data from a Partition
You don’t drop the actual partition that has data; you switch it out with an empty one that has the same boundaries defined. SQL Server makes this very easy to do using keyword SWITCH.
A) Create a Table to temporarily switch data into
The first thing you need to do is create a table that matches the source table you want to drop the partitioned data from. The new table should be created on the same partition scheme as the source table so you can switch the data back and forth without violating any boundaries or constraints in the transaction.
CREATE TABLE [dbo].[temp_partition_data](
[LoadID] [int] NOT NULL,
…
) on PS_LoadID (LoadID);
B) Determine the partition number that you want to drop the data from
After the table to switch data into has been created successfully, the next step is to determine which partition has the data you want to drop. Pick a value that you know falls within the boundaries of the partition you want to drop data from and have SQL Server return the partition number that data is stored in. This can be achieved as follows.
-- The value 15 falls in the bounderies
-- of the partition I want to drop
declare @partitionNumber int
SELECT @partitionNumber = $PARTITION.PF_ LoadID('15');
SELECT @partitionNumber as [Partition Number]
Assuming that the Partition Function (RANGE RIGHT) used was the one we defined above, the last select statement should return Partition Number 2
C) Validate that the source partition has data, and the target partition is empty
First, check the number of partitions on the source table.
SELECT
partition_id
, object_id
, partition_number
, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('[schema].[table]')
This should return 3 partitions. Make sure that the partition you want to switch out with a blank partition has a row count > 0 (below, partition 2 has 1,666,338 rows).
| partition_id |
object_id |
Partition_number |
Rows |
| 72057594130399200 |
80055371 |
1 |
1666338 |
| 72057594130464700 |
80055371 |
2 |
1666338 |
| 72057594130530300 |
80055371 |
3 |
1666338 |
Now check the number of partitions the table temp_partition_data has. All partitions should have a row count = 0 (this is required to do the switch).
SELECT
partition_id
, object_id
, partition_number
, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('temp_partition_data')
This should return 3 empty partitions.
| partition_id |
object_id |
Partition_number |
Rows |
| 72057594130268100 |
288056112 |
1 |
0 |
| 72057594130333600 |
288056112 |
2 |
0 |
| 72057594130595800 |
288056112 |
3 |
0 |
D) Switch the data into the blank partition
After we determine the partition number we want to drop, we can switch partition 2 in the source table ([schema].[table]) with the empty partition 2 from the table temp_partition_data that was just created. This can be done as follows.
alter table [schema].[table]
switch partition 2 to temp_partition_data partition 2
If we run the same check queries above for both tables, we should get the following results respectively.
Table temp_partition_data:
| partition_id |
object_id |
Partition_number |
Rows |
| 72057594130268100 |
288056112 |
1 |
0 |
| 72057594130333600 |
288056112 |
2 |
1666338 |
| 72057594130595800 |
288056112 |
3 |
0 |
Table [schema].[table] (Source table):
| partition_id |
object_id |
Partition_number |
Rows |
| 72057594130399200 |
80055371 |
1 |
1666338 |
| 72057594130464700 |
80055371 |
2 |
0 |
| 72057594130530300 |
80055371 |
3 |
1666338 |
Note that if you are switching data into a destination table, then you must recreate any of the non clustered indexes that were on the primary file group, and any foreign keys that need to point back to your destination table. Also note that if you are switching data out of a table, then you must drop all of the non clustered indexes created on the primary file group, and any foreign keys pointing to your table before you switch any data out.
E) Drop the table that the data was switched into
After you are satisfied with the switch, you can drop the table temp_partition_data to fully get rid of the data you wanted to drop.
Creating Indexes, should you create them on the partition scheme or on the PRIMARY filegroup?
Clustered indexes must be created on the partition scheme and can’t be created on the PRIMARY file group (i.e. all clustered index must be partitioned indexes). All clustered indexes in the database must be created to include the partition column as part of the clustered index.
Non-clustered index can either be built on the partition scheme or on the PRIMARY file group. Indexes that are created on the partition scheme CAN’T be rebuilt ONLINE. Indexes created on the PRIMARY file group can be rebuilt ONLINE. Index created on the partition scheme can be switched in and out of partitions along with their corresponding data. Indexes that are built on the PRIMARY file group must be rebuilt if their corresponding data is switched in and out of partitions. If the partition column is not included as part of the non-clustered index definition, SQL Sever will automatically add it. When an index has the partition column as a participating column, you achieve index partitioning alignment. This can be used to guide you on how to create your indexes.