A straightforward look at the PIVOT Data Flow transformation
PIVOT and UNPIVOT are relational operators in SQL Server for, essentially, reshaping the way the data resides in a table. Here, we look at PIVOT, but do so using the SQL Server Integration Services (SSIS) PIVOT Data Flow Transformation. I have another blog somewhere that shows how to do it in TSQL, so you can look at this later for another example.
Think of PIVOT similarly to when you have created a pivot table in Excel or Access. (If you’ve never created one, no worries. Just continue). Remember how you took values in a spreadsheet and then used the pivot table wizard to create a summarized view of the data in a grid format? This is essentially the same operation, except we’re doing it in SQL Server. When we think of PIVOT, we should think of it as an operation to rearrange data so that it reads as “something” BY “something”, such as:
- Sales BY Quarter
- Shoes BY Store
- Work Hours BY Location
- DUIs BY Number of Students
These represented labels in the pivot table that you created and served as the x and y axis for the table; inside of the table were cells that held the actual values that existed. Think of this scenario to compare to what we are doing here – we have multiple entries as rows in the table, but we want to take the values and place them as data in columns. Here’s what this might have looked like in Excel:
Now we create the pivot table:

In SQL Server, we want to do the same operation; we want to use SSIS to handle the pivot for us as opposed to TSQL. First, let’s set up the tables and data that we’ll use. Run the below in SQL Server Management Studio.
CREATE TABLE Sales
(Yr int
,Qtr char(2)
,Sales money
,CONSTRAINT SaledPK PRIMARY KEY CLUSTERED (Yr, Qtr)
);
GO
CREATE TABLE NewSales
(Yr int
,Q1 money
,Q2 money
,Q3 money
,Q4 money
,CONSTRAINT NewSalesPK PRIMARY KEY CLUSTERED (Yr)
);
GO
INSERT Sales VALUES ( 2009, 'Q1', 50), (2009, 'Q2', 20), (2009, 'Q3', 40), (2009, 'Q4', 100)
GO
SELECT * FROM Sales
So far, so good. A very straightforward example, we have our four rows inserted into a table, similar to what they look like in the spreadsheet, above. Notice that our NewSales table that we created more or less matches the pivot table that we created in Excel. Can you see how the records were “role played” as rows? Now we are going to flip them into columns? We will transform them into Sales BY Quarter for a given year rather than records in the Sales table. Note how our primary key has changed; in Sales, a row was uniquely identified by the year and quarter for the sales amount. In our NewSales table, we uniquely identify a row only by the year. Keep this in mind when you are using PIVOT.
After running this code, open up SSIS (I’m using SQL Server 2008 and SSIS 2008 by the way), add a new package, and add the following tasks to the Data Flow tab of the package. Rename the tasks, and for the Sales task, connect to the database where you just ran the code to create the tables, and map to the Sales table. So, for this step, I’ve added an OLE source (wired up to the Sales table), a Pivot task, and an OLE destination task called New Sales. I have also connected to the table in the Sales task, so right now I have no red “x” in the box; the others do for now.
Drag the green line to connect the Sales task and the Pivot task. Double click the task, and go to the second tab, which is called “Input Columns”. Check all of them, and then move to the third table called Input and Output Properties. Expand the + signs for Pivot Default Input and Pivot Default Output. This area is the workspace where you will make the mappings from rows to columns. Think of the Input at this point as the Sales table – rows entered in a table. The Pivot Default Output will be paired to the final destination, which is the pivoted table.
Click on the Yr, Qtr, and Sales input columns, paying special attention to the properties box on the left. The two properties that we are concerned with for each of these input columns for now are 1) LineageID and 2) PivotUsage. Find those now for each.
PivotUsage is an important property – it tells SSIS how to treat the data and what its role is during the transformation process. I’ve copied the table from MSDN for you to review the valid values for this property.
|
Option
|
Description |
| 0 |
The column is not pivoted, and the column values are passed through to the transformation output. |
| 1 |
The column is part of the set key that identifies one or more rows as part of one set. All input rows with the same set key are combined into one output row. |
| 2 |
The column is a pivot column. At least one column is created from each column value. |
| 3 |
The values from this column are placed in columns that are created as a result of the pivot. |
So, for ours, we won’t have a 0 option, because we’re not passing anything through without identifying what it is, so to speak. For Option 1, this is our key for our row, which we already know to be the Yr column. Option 2 is the actual value that will be turned into a column. Notice that it gives us a hint to create one or more columns for our values…so in this instance we will be taking our generic QTR column and moving the values there into Q1, Q2, Q3, and Q4. Finally, Option 3 is the actual value (ours were 50, 20, 40, and 100) that gets placed inside of the “grid” if you will as we did above in our Excel pivot table, or in the row when we do the transformation in SSIS. In summary here, enter for Yr a value of 1, and for Qtr add the value of 2, and for Sales the value of 3 for your PivotUsage values.
LineageID is the other property that we are concerned with. This simply identifies the source column so that we can reference it when we create the destinations (or Pivot Default Output columns).
Now, for the fun part – we’ll create the transform columns to match the destination table (NewSales) and map the source columns to these destination columns. Inside of the Pivot Default Output click on the “Output Columns” folder and hit the Add Output button. (Quick factoid for you – when you add columns, always place your cursor here so that each added column will fall in the order that you added them. This allows you to organize them to match your table in a visual fashion. Adding column outputs with the cursor anywhere else will cause them to be added out of sequence). Rename this one Yr. Move your cursor back up to the Output Columns folder and click the Add Output button again. This one rename to “Q1”. Continue adding the others until yours looks like so:
Go back to the Yr column in the Output Columns list. We need to now map the source value. This is done by setting both the PivotKeyValue and SourceColumn properties. Easy! First, the PivotKeyValue. Remember that LineageID in from the Input Columns, above? That value goes here. So for Yr, assign a PivotKeyValue of Yr and a SourceColumn value of your value from the Input Columns section for Yr. (Mine happens to be 40).
For the columns that you created (Q1, Q2, etc.) this gets a bit trickier but it’s not too terribly difficult. Remember your values that you flip to be columns? Those values are the literal identifiers that SSIS uses to match values to. So for Q1, you place the literal value from the table – Q1 – inside of the PivotKeyValue. The source column is the LineageID from the source, so you get this by going back to the Input Columns and clicking on Qtr, which gives me a LineageID of 43. I will use this very same LineageID for all of my new columns since they all come from the same source column QTR, with the PivotKeyValue telling SSIS that when it comes upon a Qtr value of Q1 it places that into the Q1 column; when it finds a value of Q2 in the Qtr source, it places that into the column Q2, and so on. Here’s what the mappings should look like:

We need to do nothing with the Sales column since we already told SSIS that it is an Option 3 type, so it knows to place the actual value into the created columns that we made for the Output. Finally, connect the Pivot task to the OLE destination that you created. Map the columns to the NewSales table and then run the package. You should get a result set similar to this:
Note that you might have to use both the Sort task and Aggregate task in case you have values in the source table that need to be grouped prior to sending to the Pivot task; Pivot likes distinct values and these tasks help you sum them up and sort them. So, let’s say that you had a BizTalk or MSMQ process that sends event data from hundreds of different stores throughout the day. You might have 100 ProductA records from each of the stores, so this would be a candidate to first group up before Pivoting in order to get the total sales for ProductA from all stores combined.
Lee
--------------------
“There's no way, *no* way that you came from *my* loins. Soon as I get home, first thing I'm gonna do is punch yo mamma in da mouth! “
