blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

SSIS: Data Profiling Task: Part 10 - Parsing the output

Back in February and March I wrote a series of blog entries explaining how to use the various profile requests in the new Data Profiling Task that is coming in SSIS 2008. All of those blog entries explained how to profile some data and analyse it offline using the Data Profile Viewer which is all well and good but quite often you need to use the results of the data profile later in the package, perhaps in a conditional precedence constraint. That is what this blog entry is intended to address.

Fundamentally you need to to extract some information from the data profiling output and given that that output is an XML Document we are going to require the use of the XML Task. Here is the profile output after running the Column Null Ratio profile request.

 

<?xml version="1.0"?>

<DataProfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">

  <DataSources>

    <DtsDataSource ID="LocalHost.AdventureWorks" Name="LocalHost.AdventureWorks">

      <DtsConnectionManagerID>LocalHost.AdventureWorks</DtsConnectionManagerID>

    </DtsDataSource>

    <DtsDataSource ID="LocalHost.AdventureWorks1" Name="LocalHost.AdventureWorks1">

      <DtsConnectionManagerID>LocalHost.AdventureWorks1</DtsConnectionManagerID>

    </DtsDataSource>

    <DtsDataSource ID="profile_output.xml" Name="profile_output.xml">

      <DtsConnectionManagerID>profile_output.xml</DtsConnectionManagerID>

    </DtsDataSource>

  </DataSources>

  <DataProfileInput>

    <ProfileMode>Exact</ProfileMode>

    <Timeout>0</Timeout>

    <Requests>

      <ColumnNullRatioProfileRequest ID="NullRatioReq">

        <DataSourceID>LocalHost.AdventureWorks1</DataSourceID>

        <Table Schema="Person" Table="Contact" />

        <Column IsWildCard="false" ColumnName="MiddleName" />

      </ColumnNullRatioProfileRequest>

    </Requests>

  </DataProfileInput>

  <DataProfileOutput>

    <Profiles>

      <ColumnNullRatioProfile ProfileRequestID="NullRatioReq" IsExact="true">

        <DataSourceID>LocalHost.AdventureWorks1</DataSourceID>

        <Table DataSource="." Database="AdventureWorks" Schema="Person" Table="Contact" RowCount="19972" />

        <Column Name="MiddleName" SqlDbType="NVarChar" MaxLength="50" Precision="255" Scale="255" LCID="1033" CodePage="0" IsNullable="true" StringCompareOptions="0" />

        <NullCount>8499</NullCount>

      </ColumnNullRatioProfile>

    </Profiles>

  </DataProfileOutput>

</DataProfile>

The important information here is this bit:

<NullCount>8499</NullCount>

That is the value that we need to extract out of the package and store in a variable within our package so that we can use it in an expression. So, how do we do it? As I said earlier, we need to use the XML Task which can use an XPath expression to extract the data that we require. It would be nice if the following XPath expression worked:

/DataProfile/DataProfileOutput/Profiles[1]/ColumnNullRatioProfile/NullCount

But unfortunately it does not. The reason is that the XML above contains a namespace (http://schemas.microsoft.com/sqlserver/2008/DataDebugger/) that cannot be declared in the XML Task - this is a limitation of the XML Task. The workaround is to edit the XPath expression like so:

/*[local-name()='DataProfile']/*[local-name()='DataProfileOutput']/*[local-name()='Profiles' and position()=1]/*[local-name()='ColumnNullRatioProfile']/*[local-name()='NullCount']

Here we use the local-name() XSD extension function to return the non-qualified element that we are interested in. Take a moment to look at our new XPath expression and notice the similarities and differences between it and our original XPath expression.  [Thanks to Matt Masson and David Noor from the SSIS product team for helping me out with this bit]

 

So now we have the following tasks:

image

More importantly, let's try and understand how we have configured the XML Task:

image

 

That's really all there is to it. It is important to point out the pertinent property settings:

  • OperationType = 'XPATH'
  • SaveOperationResult = 'TRUE'
  • SecondOperandtype = 'Direct input'
  • SecondOperand = <XPath expression from above>
  • PutResultInOneNode = 'False'
  • XPathExpression = 'Values'

Also notice that we are storing the result in a variable called @[User::Result]

Here's a very short video of my running this (I have used a script task to output the parsed value):

 

Pretty easy really. if you have any questions let me know in the comments below.

 

This concludes my brief summary of parsing the output of the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

-Jamie

Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

Published 16 April 2008 07:42 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dhileep said:

THanks for the post... i was trying for a day to do this and finally finished with your inputs.. thanks a lot

April 16, 2008 10:15
 

SSIS Junkie : SSIS: Data Profiling Task: Part 1 - Introduction said:

April 16, 2008 16:25
 

sql if said:

April 19, 2008 11:09
 

Frans van Bree said:

Jamie,

Could you, alternatively, parse the output XML string with a script task and replace the xmlns with an empty string and feed the result to the XML task? That way your original XPath query should work.

Or am I missing something?

(Not the most elegant way perhaps. For example, what if there might be multiple namespaces used to the profile task in the future that use the same attributes. Might be a risk.)

April 19, 2008 13:16
 

jamie.thomson said:

Hi Franz,

Yes, you could definitely do that. Or if you're going to use a script task you might as well parse it out in there using the .Net XML Namespace Manager.

There's always options.

-Jamie

April 19, 2008 17:23

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems