I’m working on an interesting project at work where I am “scraping” data off of Word documents, and I may do a few blog posts on it. It’s probably not the best way to handle this sort of thing, but it works well without the cost of a third-party tool that transforms the data to XML, and the code is fairly straightforward. In my task, the data from a series of templated .doc and .docx files will be moved to a .csv, imported to SQL Server, cleaned, moved to a dimension table, and finally fed to a cube in Analysis Services. Pretty cool, eh? I think so.
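To give a feel for the "document to .csv" step, here is a minimal sketch in Python. It is not the Interop-based approach this post is about: it swaps in a different technique and reads a .docx directly as a zip archive of XML, so it does not handle the older binary .doc format. The "Label: value" paragraph layout, the function names, and the column names are all assumptions made up for illustration; a real template will need its own parsing rules.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the non-empty paragraph texts of a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t")).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


def template_fields(paragraphs):
    """Assumed (hypothetical) template layout: one 'Label: value' per paragraph."""
    fields = {}
    for line in paragraphs:
        label, sep, value = line.partition(":")
        if sep:  # skip paragraphs that contain no colon
            fields[label.strip()] = value.strip()
    return fields


def write_csv(rows, columns, out):
    """Write one CSV row per document, using a fixed column order."""
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

In use, you would call `docx_paragraphs` and `template_fields` once per file, collect the resulting dictionaries in a list, and hand that list to `write_csv` along with a file opened with `newline=""`. The resulting .csv is what would then be imported to SQL Server.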

One thing about doing this: if you plan on scheduling and running it from the server itself, you’ll need either 1) Microsoft Office installed on the server, or 2) the Primary Interop Assemblies Redistributable (PIA) installed on the server, so that you can take advantage of the Office (in my case Word) methods to fetch the data off of the documents. Most servers don’t have Microsoft Office installed, since there’s normally no reason for it, and to avoid burning a copy of Office that costs money, you can add these assemblies instead. The download is free.

It reminds me of back in the day when we used SQL Mail, going back to 1998 and SQL 6.5 and then SQL 7. You had to install the entire Outlook client on the server, and then a smart DBA would go and delete the GUI so nobody could open the client. Just a random thought: didn’t SQL Mail really suck back then, and now? For those of you who started as SQL practitioners with SQL Server 2005 and only know Database Mail, you really missed a lot of fun trying to figure out why that abomination never worked correctly. Oh well.

Anyway, get the PIARedist.exe, and start doing some slammin’ Office Automation.

Happy Primary Interopping!
Lee Everest

 

--------------------------------



http://www.microsoft.com/download/en/details.aspx?id=3508


Posted in: SQL Server 2008/R2, Etc. (Off-topic), .Net

by Lee Everest, M.S.

Disclaimer
The opinions, code, examples, et al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape, form, or fashion. All code is for demonstration purposes; no guarantees, either written or implied, are made.

© Copyright 2012 Lee Everest's SQL Server, etc. weblog