Here’s a how and why for using the PIVOT operator


Two of the slickest new features in TSQL syntax for SQL Server 2005\2008 are PIVOT and UNPIVOT. These have been in the ANSI standards now for awhile as far as I know, but are late arrivals here to SQL Server. PIVOT is available to us for taking rows and converting them to columns; UNIPIVOT does the opposite, and it allows us to take columns values and create rows out of them. In a previous blog, I showed one example of using UNPIVOT for the purpose of finding the MIN and MAX value across a set of columns for a given row. I’ve yet to do anything on PIVOT, hence the reason for this article/blog.

“Who am I? Why am I here?”
As I always do, I first verified that Books Online has no worthwhile example, for PIVOT in this case. I then looked at a few blogs and other articles to see what was currently out on Google! And, as per usual, I didn’t find much. I always seem to locate information that simply uses the syntax without giving a good reason as to why or when you may use the feature, or I find a simple regurgitation of Books Online. So, here’s my entry for PIVOT – hope you like this one.


Nuts and bolts
As mentioned, we take PIVOT into consideration for moving rows to columns. Why or where might we use this? For this example, suppose that we have a table where data is entered after an event occurs. We have the notion of plants, and each plant reports to us four metrics each day; hours worked, temperature, weather, and downtime. When a plant sends data, they simply send a row. If they don’t send a row, the plant did not generate a value for that day, or someone at the plant forgot to submit the data. We see this type of table oftentimes in the mainframe or AS/400, by the way. Rather than adding columns when new metrics are needed, they just add rows to the table.

This can be described as a “role playing” the table. (I use the term “role playing” in a data warehousing sense as Ralph Kimball might. Dates and times in those installations are referenced in many places, but the domain of values is the “Time Dim” for each; hence it exists to play several roles). Note: rows entered into a table as I describe here is not a proper design in my opinion, but sometimes as SQL professionals we must “dance with the one that brought us” and learn how to query the data that we’re given. After all, in the real-world, customer databases are not as well planned as the Northwind or AdventureWorks databases. This example happens to be something similar to a JD Edwards implementation that I’ve worked with recently. Other scenarios such as this might be found in any mainframe implementation, or maybe even a BizTalk or message queue implementation, where messages flow as single rows to a table as they occur either in event or batch form.


“Danny isn’t here, Mrs. Torrance”
Let’s set up the table and data, and then examine it a bit closer.

 

DROP TABLE PlantMetrics;
GO
CREATE TABLE PlantMetrics
    (PlantID int
    ,TransDt datetime
    ,Metric char (4)
    ,Value nvarchar (10)
    ,CONSTRAINT pk_plmt PRIMARY KEY CLUSTERED (PlantID, TransDt, Metric) 
    );
GO
INSERT PlantMetrics 
VALUES (1,'7/1/09', 'MHRS', '24')
        ,(1,'7/1/09', 'WTHR', 'Sunny')
        ,(1,'7/1/09', 'TEMP', '88')
        ,(2,'7/1/09', 'TEMP', '94')
        ,(1,'7/2/09', 'MHRS', '36')
        ,(3,'7/2/09', 'DOWN', '7')
GO


As described, we have a table that has entries entered into it as events if you will, as rows. The Metric attribute describes the type of values that is entered, and the Value attribute represents the value for that metric. In this design many metrics can be entered, so a generic nvarchar value will be assigned as the data type for the generic Value attribute, since we might store a numeric value or a description. The data looks like so (click on the images for a larger picture):

 

SELECT *
FROM PlantMetrics
GO


image

 

The reason for our solution here is that you would most likely rather work with the event data as columns for a given row’s primary key; from there, use it in reporting or other procedures much easier than it currently exists.

Jumping ahead to see what the data will look like first before we get into the code, look at the next picture. You can see that we have moved the data to columns from rows. For each primary key of the current table, I have created a new primary key (PlantID and TransDt). All of the metrics found have been added as values, and the others as NULLs since they did not exist for that Plant/Date/Metric combination. As you can see, this data layout is now easier for the user to consume.


image


Back in the Good-Old days
What did we do before PIVOT? I thought up two ways to get the same thing, both require more coding and complexity. (There are others but I didn’t take the time to add them here). The first uses multiple OUTER JOINS and the second uses UNION ALL multiple times and then batches them up via a derived table.


Multiple outer join example

SELECT p.PlantID
    , P.TransDt
    , MHRS.Value AS MHRS
    , TEMP.Value AS TEMP
    , WTHR.Value AS WTHR
    , DOWN.Value AS DOWN
FROM PlantMetrics p
LEFT OUTER JOIN
    (SELECT * FROM PlantMetrics WHERE Metric = 'MHRS') AS MHRS 
        ON p.PlantID = MHRS.PlantID AND p.TransDt = MHRS.TransDt
LEFT OUTER JOIN
    (SELECT * FROM PlantMetrics WHERE Metric = 'TEMP') AS TEMP 
        ON p.PlantID = TEMP.PlantID AND p.TransDt = TEMP.TransDt
LEFT OUTER JOIN
    (SELECT * FROM PlantMetrics WHERE Metric = 'WTHR') AS WTHR 
        ON p.PlantID = TEMP.PlantID AND p.TransDt = TEMP.TransDt
LEFT OUTER JOIN
    (SELECT * FROM PlantMetrics WHERE Metric = 'DOWN') AS DOWN 
        ON p.PlantID = TEMP.PlantID AND p.TransDt = TEMP.TransDt                
GROUP BY P.PlantID, P.TransDt, MHRS.Value, TEMP.Value, WTHR.Value, DOWN.Value    
ORDER BY PlantID, TransDt        
GO


Multiple UNION ALL example

SELECT PlantID
    , TransDt
    , MAX(MHRS) as MHRS
    , MAX(TEMP) AS TEMP
    , MAX(WTHR) AS WTHR
    , MAX(DOWN) AS DOWN
FROM (
    SELECT PlantID, TransDt, Value as MHRS, NULL AS WTHR, NULL AS TEMP, NULL AS DOWN
    FROM PlantMetrics
    WHERE Metric = 'MHRS'
    UNION ALL
    SELECT PlantID, TransDt, NULL AS MHRS, Value AS WTHR, NULL AS TEMP, NULL AS DOWN
    FROM PlantMetrics
    WHERE Metric = 'WTHR'
    UNION ALL
    SELECT PlantID, TransDt, NULL AS MHRS, NULL AS WTHR, Value AS TEMP, NULL AS DOWN
    FROM PlantMetrics
    WHERE Metric = 'TEMP'
    UNION ALL
    SELECT PlantID, TransDt, NULL AS MHRS, NULL AS WTHR, NULL AS TEMP, Value AS DOWN
    FROM PlantMetrics
    WHERE Metric = 'DOWN'
    ) as x
GROUP BY PlantID, TransDt
ORDER BY PlantID, TransDt
GO

 

Again, I can think of at least two other ways to do this flip, but you get the idea. Now let’s try PIVOT to get the same thing. Nice, huh?


SELECT * FROM PlantMetrics
PIVOT (
        MAX(VALUE)
        FOR Metric IN (MHRS, WTHR, TEMP, DOWN))  X
ORDER BY PlantID, TransDt 
GO


Study the code. In our example MAX doesn’t mean a lot, but the syntax requires that we use an aggregation here. So, for each type of Metric in the domain of our values in the list in parenthesis, we aggregate the value; each of the values in the list become column values, and that value gets placed inside of each column in the list. Fairly straightforward in my opinion after you get used to it.


Conclusion
UNPIVOT rocks, and PIVOT rulz! This is an amazing alternative for flipping rows to columns and I hope that you can use this somewhere down the road.


Lee Everest

 

------------------------------

You weren’t listening, were you?

 

huh

 

 



Posted in:   Tags:

Comments


July 24. 2009 13:54
thanks for this nice info, it's so useful for me.

http://sulumits-retsambews.com/http://sulumits-retsambews.com/


United States Lee
September 15. 2009 06:56
Correction!  Although late arrivals, they are NOT ANSI standards. As always, if you're programming towards ANSI with even the slightest chance of moving your database someday,  forget using PIVOT and UNPIVOT.

http://www.texastoo.com/http://www.texastoo.com/


Israel RH
February 7. 2010 03:50
Hi there, Thanks for the clear article.
I would love to see a scenario in which you have to put two fields from the original table on the column (as parent-child). For example, put each amount on both USD and Euro.
Thanks again.
RH

http://www.olap.co.il/http://www.olap.co.il/

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading




by Lee Everest, M.S.

Search


Ads

Calendar

«  July 2010  »
MoTuWeThFrSaSu
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
View posts in large calendar

Tags

Disclaimer
The opinions, code, examples, et.al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape form, or fashion.  All code for demonstration purposes - no guarantees, either written or implied, are made.

© Copyright 2010 Lee Everest's SQL Server, etc. weblog