Hi,
I'm looking at using checksum to check for changes in our data warehouse,
and have seen comments on the checksum not always being unique.
Just wanted to know how it is actually calculated to see if it is likely to
get duplicated checksums.
Thanks.Hello,
CHECKSUM computes a hash value, called the checksum, over its list of
arguments. The hash value is intended for use in building hash indices.
Hash algorithm nature determins that this one way algorithm shall have
different checksum if the keys are changed. However, there is a very small
chance that the checksum will not change.
The following is some information about hash algorithm
http://burtleburtle.net/bob/hash/
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
========================================
=============
Business-Critical Phone Support (BCPS) provides you with technical phone
support at no charge during critical LAN outages or "business down"
situations. This benefit is available 24 hours a day, 7 days a week to all
Microsoft technology partners in the United States and Canada.
This and other support options are available here:
BCPS:
https://partner.microsoft.com/US/te...erview/40010469
Others: https://partner.microsoft.com/US/te...upportoverview/
If you are outside the United States, please visit our International
Support page:
http://support.microsoft.com/defaul...rnational.aspx.
========================================
=============
This posting is provided "AS IS" with no warranties, and confers no rights.
| Thread-Topic: how is checksum calculated?
| thread-index: AcV8QG5hu7Ghfj5iRb2R0h9l4uvyuA==
| X-WBNR-Posting-Host: 202.3.198.20
| From: "examnotes" <Wreck@.community.nospam>
| Subject: how is checksum calculated?
| Date: Tue, 28 Jun 2005 17:21:02 -0700
| Lines: 10
| Message-ID: <1E91E7F2-9CA2-4ACA-9185-249AAC2BA96E@.microsoft.com>
| MIME-Version: 1.0
| Content-Type: text/plain;
| charset="Utf-8"
| Content-Transfer-Encoding: 7bit
| X-Newsreader: Microsoft CDO for Windows 2000
| Content-Class: urn:content-classes:message
| Importance: normal
| Priority: normal
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
| Newsgroups: microsoft.public.sqlserver.datawarehouse
| NNTP-Posting-Host: TK2MSFTNGXA03.phx.gbl 10.40.2.250
| Path: TK2MSFTNGXA01.phx.gbl!TK2MSFTNGXA03.phx.gbl
| Xref: TK2MSFTNGXA01.phx.gbl microsoft.public.sqlserver.datawarehouse:1864
| X-Tomcat-NG: microsoft.public.sqlserver.datawarehouse
|
| Hi,
|
| I'm looking at using checksum to check for changes in our data warehouse,
| and have seen comments on the checksum not always being unique.
|
| Just wanted to know how it is actually calculated to see if it is likely
to
| get duplicated checksums.
|
| Thanks.
|
||||Many warehouses go that direction when they can't relay on last mod dates or
primary keys. My general advice is use it but understand that the values in
the DW could be slightly incorrect and do a full refresh of any table built
this way from time to time. I have seen checksums in SQL or otherwise not
find the correct differences. Almost every time I've found that situation
it was due to Nulls.
"Wreck" <Wreck@.community.nospam> wrote in message
news:1E91E7F2-9CA2-4ACA-9185-249AAC2BA96E@.microsoft.com...
> Hi,
> I'm looking at using checksum to check for changes in our data warehouse,
> and have seen comments on the checksum not always being unique.
> Just wanted to know how it is actually calculated to see if it is likely
> to
> get duplicated checksums.
> Thanks.
>|||Hello Wreck
I totally agree with Danny,
Check Sums are a good solution when last modify date is not available.
But the CHECKSUM() function does not take into account case change for
char values or NULLS.
The BINARY_CHECKSUM() is a better form of the Checksum function. It
will handle Character case change and NULLs.
Have a look at Books Online under: BINARY_CHECKSUM()
Also have a look at the link below for a good overview on using the
checksums in the ETL process.
Best Practices for Using DTS for Business Intelligence Solutions
(updated web version)
http://msdn.microsoft.com/library/d...ntbpwithdts.asp
See Section "Data Transformation and Cleansing Approach"
Hope this helps,
Myles Matheson
Data Warehouse Architect|||Hi Wreck,
many people recommend checksum to detect changes when no other
mechanism is available. To use checksums you must be aware there is
ALWAYS the possibility of a false negative. That is, a row changes but
the checksum is the same. If you can tolerate false negatived by all
means go ahead.
Personally, I prefer to know for sure that all changes have made it to
the DW. And in the DWs I deal with the 'solution' to 'false negatives'
of reloading everything is just not an option.....I even wrote my own
delta generation code to deal with these cases for my
clients......and it even understands nulls which are the bane of
detecting changes between tables in relational databases because where
a = b returns false if a or b are null....because null does not equal
null in set theory though clearly if a field was null yesterday and is
null today it has not changed...
Best Regards
Peter Nolan|||Thanks for your help guys.
I think I'll try the checksum option and see how it holds up. The other
option is to check each column for changes which shouldn't be too hard to
subsitute if checksum proves unsuitable.
Thanks again,
Wreck.
"Peter Nolan" wrote:
> Hi Wreck,
> many people recommend checksum to detect changes when no other
> mechanism is available. To use checksums you must be aware there is
> ALWAYS the possibility of a false negative. That is, a row changes but
> the checksum is the same. If you can tolerate false negatived by all
> means go ahead.
> Personally, I prefer to know for sure that all changes have made it to
> the DW. And in the DWs I deal with the 'solution' to 'false negatives'
> of reloading everything is just not an option.....I even wrote my own
> delta generation code to deal with these cases for my
> clients......and it even understands nulls which are the bane of
> detecting changes between tables in relational databases because where
> a = b returns false if a or b are null....because null does not equal
> null in set theory though clearly if a field was null yesterday and is
> null today it has not changed...
> Best Regards
> Peter Nolan
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment