Lee posted on January 11, 2009 22:01
 XML Archive Exercise - Part 2. Let's rat-hole some data!


I started a blog post several months ago on this but haven't had time to explore it further. If you recall, my old boss at the old job (Mindy Curnutt - have no idea where she is these days or what she's doing as we've both left the place...) had an idea of using XML to archive old data. It's not a bad idea and the storage is pretty compact. The problem is that when you try to query it, XML querying is designed to go across rows and not query the XML as efficiently within the row. Having said this, below is something that I worked up today and it's really not a bad way to move the data off to eventually remove from a production system.

First, let's create the table and a few queries:

 

USE AdventureWorks
GO
IF (OBJECT_ID(N'dbo.PurchaseOrderArchive') IS NOT NULL)
    DROP TABLE dbo.PurchaseOrderArchive
GO

CREATE TABLE PurchaseOrderArchive
    (id int IDENTITY (1,1) NOT NULL PRIMARY KEY
    ,Data xml
    ,LogDate datetime
    ,LogUser varchar (20) DEFAULT session_user
    )

GO

INSERT PurchaseOrderArchive (Data,LogDate)
SELECT (SELECT * FROM Purchasing.PurchaseOrderHeader FOR XML AUTO, ELEMENTS) 
        ,GETDATE()

SELECT  * FROM PurchaseOrderArchive

 

We're using AdventureWorks, 2005 version, and are going to try and archive off some old data. Let's first try to get the thing working. I am going to use for this example the records straight out of the PurchaseOrderHeader table; if you saw the other blog post of mine, you can also join tables and archive data as well in combination. For the table, I use Joe Celko's favorite designation for a primary key (IDENTITY attribute), a column called "Data" which will store every record that we're archiving, a log date, and a log user. You can add whatever you like here. As you can see, I do a simple INSERT of the data, followed by a SELECT statement to make sure that I have a row inserted. So far, so good.

 

SELECT Data.query('/Purchasing.PurchaseOrderHeader/PurchaseOrderID') AS PurchaseOrderIDs 
FROM PurchaseOrderArchive 

SELECT Data.value('(/Purchasing.PurchaseOrderHeader/PurchaseOrderID)[3631]', 'int')
    ,Data.value('(/Purchasing.PurchaseOrderHeader/ShipDate)[3631]', 'datetime')
    ,Data.value('(/Purchasing.PurchaseOrderHeader/OrderDate)[3631]', 'datetime')
    ,Data.value('(/Purchasing.PurchaseOrderHeader/TotalDue)[3631]', 'money')
FROM PurchaseOrderArchive

 

 

Now, querying this thing is a bit different than querying normal XML. The first query I simply use the query method for XML to pluck out the IDs. Unfortunately this doesn't do me a whole lot of good. Next, I use the value method for XML. Notice that in order to get a specific record, I need to use the Data.value method for each attribute. Sort of klugey. No need for any type of WHERE clause here, because the row that I want is specified in the singleton [3631] specification for that particular PurchaseOrderID. This is sort of the strange part. I could do a simple query without the singleton syntax, but I only get the first ID! Remember that all of the IDs are in the column and not across rows. This is where this "technique" differs when using XML. Still not a bad idea though, cramming all of the rows of the table into one XML row!

When I create an XML index, I get some quirky behavior as well. The index seems to help out some, but I am not sure if it's original design was intended for what we're doing here. Nonetheless, check out what the results are using the index for our data storage.

CREATE PRIMARY XML INDEX idx_xxx ON PurchaseOrderArchive (Data)
 

image

 

Interesting, eh? I first clear out the cache, and then run the query three times with no index; then I create the index and run the query three times. The speed is 3x without the index, but the reads and writes are also considerably higher. I'd have to compare with other XML queries to see if they follow suit.

Finally, let's alter the table and take only the shipments for the 2000 calendar year, and store those off. The code might look something like so:

 

SELECT COUNT(1), CAST(CAST(OrderDate AS varchar (11)) AS datetime)
FROM Purchasing.PurchaseOrderHeader
GROUP BY CAST(CAST(OrderDate AS varchar (11)) AS datetime)
ORDER BY 2

DELETE PurchaseOrderArchive
GO

ALTER TABLE PurchaseOrderArchive
    ADD ShipYear int

GO
INSERT PurchaseOrderArchive (Data,LogDate, ShipYear)
SELECT (SELECT * FROM Purchasing.PurchaseOrderHeader 
        WHERE YEAR(CAST(CAST(OrderDate AS varchar (11)) AS datetime))= 2002
        FOR XML AUTO, ELEMENTS )
        ,GETDATE()
        ,2002
GO
SELECT * FROM PurchaseOrderArchive
GO
 

Conclusion (if there is any): Use your imagination for expanding what I have done here for storing or archiving off old data. This method is unproven, untested, and may not even work well in a production setting. However, based on what I have here, I can see where you could actually store off a ton of rows very easily in XML and actually query the data; while it's not a mature or well-thought method to grab the data, and probably shouldn't be used for something that you would go fetch often, you can get to your data and display the archived results.

 

Lee

----------------------------------

What great creativity for so early in the morning.

 

 
 
 

Posted in:   Tags:

Comments


January 28. 2009 05:47
nice info and sharing.. thanks a lot Smile

http://free-inventory-management-software.blogspot.com/http://free-inventory-management-software.blogspot.com/

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading




by Lee Everest, M.S.

Search


Ads

Calendar

«  July 2010  »
MoTuWeThFrSaSu
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
View posts in large calendar

Tags

Disclaimer
The opinions, code, examples, et.al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape form, or fashion.  All code for demonstration purposes - no guarantees, either written or implied, are made.

© Copyright 2010 Lee Everest's SQL Server, etc. weblog