admin posted on March 29, 2010 10:52

A tutorial on the Script Component, continued.

Introduction

The Script Component allows you to 1) connect to source data, 2) transform data in the pipeline on a row-by-row basis, or 3) build a custom data flow destination.  In part one we looked at two examples for using it as a source.  This time let’s look at the second option – transformation – and explore how this functionality works. Remember that when you drag this task to the data flow window in SSIS (again, I am using SQL Server Business Intelligence Development Studio, or BIDS) you will be prompted to choose how you want the component to function. I have two more examples here for your consideration, both of which are straightforward and easy to understand. 

I appreciate the emails and comments on the first part, by the way, on this series.  Unfortunately, this one is a bit more long-winded – apologies in advance – but hopefully you will be able to follow along and glean some useful information here as well.

Who am I? Why am I here?

The goal of using the Script Component (or Script Task for that matter) is to allow the ETL developer a way to accomplish a task when a delivered task won’t work.  This is a very important alternative that we have at our disposal; several of the popular ETL tools within the industry have dozens, even hundreds, of canned or ‘baked-in’ tasks.  Unfortunately our product isn’t that mature yet, but we do have this outlet available to us, and until such time that we get the full bevy of ETL tasks I find this option worthwhile. Personal opinion? Yes, so take it for what it’s worth.

When I think of a transformation using the Script Component (let’s call it SC from here on out) the first thing that pops into my mind is Regex. Verifying formats of names, addresses, phone numbers, zip codes, IP addresses,
et. al. is an excellent way to use the SC, and there are many examples on Google! demonstrating this.  Here, let’s try something different. I am going to use a text file as a source and then do some stuff in c#, the first using a simple method, and the second using the Extension Methods, a feature in .Net 3.5/4.0 that is very cool.  Again, both are  easy for you to follow along and for the most part uneventful.  If you will, get used to wiring-up the SC first, and from there you may begin to go into some more complicated transformations.  Let’s get started. 

Example 1 – A simple Script Component Transformation

First, save a text file and include some data similar to what I have in Figure 1. Open up BIDS, add a new project, add a Data Flow Task to the Control Flow tab, switch to the Data Flow tab, add a Flat File Source and a connection to the text file, connect to it, and then add a Script Component.  For the text file, leave the column names as they are - we’ll change them later, but I went ahead and created the data-types now. It’s personal preference, so do it how you normally edit transformations and/or what order you are comfortable with.

Double-click on the SC and you’ll get the prompt to choose the behavior on the SC. Go ahead and make sure that it’s on “Transformation” – Figure 2 - and click OK. Open up the SC, and navigate to the “Input Columns” choice on the left nav. There you can click all three columns (Column 0, 1, and 2) to add them so that they are included in the source. At this point make sure that you have all of them selected. See Figure 3.

image 

Figure 1. Example 1 data in a text file

 

image

Figure 2. The Transformation option in the SC

 

image

Figure 3. Including the input columns in the Script Component

OK, here’s where the fun part starts.  Navigate to “Inputs and Outputs” on the left nav.  I would begin here by renaming the input and output buffers to simple names as I have (INPUT and OUTPUT), and add columns to the output portion has shown. (Quick point: you can’t name the inputs and outputs the same or you’ll get an error later, so notice that I left my input columns the default but renamed the outputs). For the output, I created four columns: outKey, outValue1, outValue2, and outGreatest. Make sure to change the data types as well. I have outKey as string, or string [DT_STR], and all others unsigned integer (four-byte signed integer [DT_I4]). Note in Figure 4 that I start by passing three columns in, but end up with four in my output buffer.

image

Figure 4. Adding the inputs and outputs to the Input and Output buffers

 

The anticipation is almost unbearable!

In the Script Transformation Editor, choose “Script”, and then click on Edit Script so that you get to the Visual Studio Tools for Applications (VSTA) window. First off and before we do anything else, because we’ll be accessing features in .Net 3.5 in example 2, go ahead and go to Project >> Properties, and change the Target Framework to .Net Framework 3.5 (Figure 5).  Unless there is some pressing reason, always do this so you can get the latest .Net libraries.

image 

Figure 5.  Change to the .Net 3.5 framework

 

Now paste the following code:

public override void INPUT_ProcessInputRow(INPUTBuffer Row)
   {
       Row.outKey = Row.Column0;
       Row.outValue1 = Row.Column1;
       Row.outValue2 = Row.Column2;
       Row.outGreatest = Greatest(Row.Column1, Row.Column2);
   }

   static public int Greatest(int val1, int val2)
   {
       return (val1 > val2 ? val1 : val2);
   }

Let’s study this for a moment.  In a nutshell, a row has already been added for us and the get and set accessors as well in the BufferWrapper.cs and ComponentWrapper.cs files; all that is needed in our code is to carry out the chosen transformation and map input columns to output columns.  Sounds somewhat difficult? No, not really. In my opinion this is more straightforward than the source SC.  My Row.outputcolumn was named as such so that you can see how I get the values from the source text file (Row.Column 0 through Column2) to the output buffer columns.  If you can see this mapping, then you’re in good shape.

Our transformation here is a simple comparison method that returns the greatest of two integers, somewhat cheesy, yeah, but again we’re not going for a difficult transformation here.  Essentially, we are passing through the three columns that we started with, but we add a new one called outGreatest, a value that we get for each row after we take the two numeric values and send them to our ternary function called Greatest().

Add an OLE destination, and then rename the table to SC_Transform_1.  You will have several columns available, but go ahead and highlight and delete the original input rows, as I have in Figure 6.  Don’t worry – SSIS won’t give you that warning that you have columns that need to be removed for better performance!  Also note how I have named mine.  Finally, in Figure 7, I have connected them in the mappings to my out columns.  Finish and close up, and then run the package. Your result set should match what is in Figure 8.

image

Figure 6. Create a destination, but lose the extra columns

 

image

Figure 7.  Mapping the inputs to outputs

 

image

Figure 8.  Final output for Example 1

 

Example 2 – Another Script Component Transformation

The steps for this one are about the same as Example 1.  First up, create a new package, paste the two connections that you have from the first example, and then paste the data flow task from the control flow of the first package to the control flow of the second package. You can paste connections one at a time, but when you copy the task on the control flow, all of the underlying data flow tasks come along as well. Switch to the Data Flow tab of package 2 and notice the red x’s in these. We’ll fix these in a minute.  Rename the text file source as I have in the below Figure 9.

image

Figure 9.  Copying the contents of our first package

 

Create a new source file and add data as it looks in Figure 10. Mine has three rows, and seven columns, including one “key” and six columns of integer data, all comma-separated. Connect to your text file in Flat File connection manager that you just copied over from Example 1, and point it to the file below.

image

Figure 10.  Data for our second package

Connect the new text file to the script component, point to it, and then open up the SC. This time, forget about passing through all of the columns and lets just choose to bring in the key, the Min, Max, and Avg columns. Notice the double type for the average output. (See Figure 11).

image

Figure 11.  Creating the output columns and assigning their data types

 

Open the SC Edit Script button and add Using System.Linq; to your code; if not you won’t be able to use the Extension Methods that I mentioned previously. After you have added this, add the following code.  You’ll notice that I’ve wrapped it so it doesn’t go off my page here - c# like c or c++ doesn’t care if you have CRs in your code:

public override void INPUT_ProcessInputRow(INPUTBuffer Row)
{
    Row.outKey = Row.Column0;
    Row.outMin = (new int[] { Row.Column1, Row.Column2, Row.Column3, Row.Column4,
Row.Column5, Row.Column6 }).Min(); Row.outMax = (new int[] { Row.Column1, Row.Column2, Row.Column3, Row.Column4,
| Row.Column5, Row.Column6 }).Max(); Row.outAvg = (new double[] { Row.Column1, Row.Column2, Row.Column3, Row.Column4,
Row.Column5, Row.Column6 }).Average(); }

Create a new destination table similar to the first example, and remove the extra columns from the output buffer.

image

Figure 12. Destination table in SQL Server for Example 2

Run your package and verify the results.

image

Figure 13. Results for Example 2

 

Conclusion

The script component as a transformation is simple to use, yet very powerful. Follow these examples to get up and running, and then add your own super-stealthy transformations to your components. I chose these particular methods because both demonstrated the ease which you can add to your data flows.  In the first one, we used a greater than ternary function to grab the greatest of two numbers. In the second, one which I particularly liked, we deployed the new Extension Methods in .Net. 3.5.  This operation gave us values (min, max, avg)  across a row of data, not a straightforward operation in TSQL I might add.

Continue experimenting with the Script Component, and look for the concluding Part 3 on this blog series soon. If you have any questions or something that you would like to show me or share, feel free to ping me.

Thanks for reading,

Lee

 

----------------------------

Samir: “No, not again. I... why does it say paper jam when there is no paper jam? I swear to ***, one of these days, I just kick this piece of s*** out the window.”

surfing

 

 

 face

Lee


Posted in: SQL Server 2008 , SSIS , SQLServerPedia  Tags:
blog comments powered by Disqus

by Lee Everest, M.S.

Poll

Do you use Azure or cloud in your organization?



Show Results

Ads

Search


Month List

Calendar

«  February 2012  »
MoTuWeThFrSaSu
303112345
6789101112
13141516171819
20212223242526
2728291234
567891011
View posts in large calendar

Tags

Disclaimer
The opinions, code, examples, et.al. expressed herein are my own personal opinions and do not represent my employer's view in any way, shape form, or fashion.  All code for demonstration purposes - no guarantees, either written or implied, are made.

© Copyright 2012 Lee Everest's SQL Server, etc. weblog