blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

SSIS: Data Profiling Task: Part 8 - Candidate Key

In this blog entry I am going to outline the function of the Candidate Key profile request in the new Data Profiling Task in SQL Server 2008. BOL states about this profile request:

Reports whether a column or set of columns is a key, or an approximate key, for the selected table.

This profile also helps you identify problems in your data, such as duplicate values in a potential key column.

 

I have tested it out on the [HumanResources].[Department] table which looks like this:

image_thumb[2]

I have highlighted the [Name] column, you can see that all the values in there are unique. Also highlighted are some values in [GroupName] which, as you can see, are duplicates.

In the example immediately below I have configured the task to check the [Name] column for uniqueness:

image

Here are the results:

image_thumb[4]

As expected key strength is 100% because all values in [Name] are unique.

 

OK, now let's test [GroupName] for uniqueness. Here's how I configured the task:

image

Important
Notice how I have set the ThresholdSetting to 'None', this ensures that I'll get some results back regardless of what the key strength is. By default a threshold is specified at 0.95 which means there's a very good chance that you won't get any data returned because the key strength doesn't meet the threshold. This completely foxed me until I was put on the right track by Catherine Chang from the SSIS team (thanks Catherine).
This is a really important point to make because this held me up for the best part of day until the behaviour was explained to me. Don't make the same mistake that I did.

OK here are the results:

image

Notice the key strength? Its 37.5%. Not surprising really, there are 16 rows in this table and only 6 distinct values. Hence [GroupName] makes for a poor candidate key. I like the fact that you get a list of all the values that are not distinct which, in the case of [GroupName], is all of them.

One more final point to make which is mentioned at the top of this blog entry but I want to re-emphasize. This profile request can be run on a set of columns, not just a single column.

 

This concludes my brief summary of the Candidate Key profile request in the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

-Jamie

 

Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Value Inclusion profile request in the new

March 7, 2008 05:23
 

SSIS Junkie : SSIS: Data Profiling Task: Part 1 - Introduction said:

March 7, 2008 05:25
 

Pedro Perfeito said:

Dear Jamie,

I'm not convinced with Data Profiling... I can do all this work using the package of Kimball in KimballGroup website, and I can customize it, and and a report of all data.

Maybe I'm wrong...

Regards!

PedroCGD

March 11, 2008 12:18

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems