Using Row Hashing for Incremental Integrations: Difference between revisions

From Starfish ETL
Jump to navigation Jump to search
No edit summary
No edit summary
Line 8: Line 8:
2. Initialize an Xref list for ''both '''read''' and '''write''''', in our example "contact-hash". See [[Cross-reference (Xref)]].
2. Initialize an Xref list for ''both '''read''' and '''write''''', in our example "contact-hash". See [[Cross-reference (Xref)]].


3. Perform the comparison. Create a new Exec Before VBScript Procedure, call it "Compare Hash". This code will compare the origin's row hash to that stored in the xref. If it matches, the row is skipped (the data hasn't changed). If it does not match the record has been changed. If there is no match, this is a new record. In this case we store the new hash and allow the job execution to continue - you should set up an Update/Insert stage to process the record. In this example, ContactID is the primary key field from your origin table.
3. Perform the comparison. Create a new Exec Before VBScript Procedure, call it "Compare Hash". Ensure to check the append checkbox on the write side. This code will compare the origin's row hash to that stored in the xref. If it matches, the row is skipped (the data hasn't changed). If it does not match the record has been changed. If there is no match, this is a new record. In this case we store the new hash and allow the job execution to continue - you should set up an Update/Insert stage to process the record. In this example, ContactID is the primary key field from your origin table.
<source lang="vb">
<source lang="vb">
Sub VBScriptProcedure
Sub VBScriptProcedure

Revision as of 13:32, 30 October 2015

Typically when setting up an ongoing integration, we only want to pull data from your origin which has changed since the last time Starfish ran. This is accomplished using LastRunDate and applying it as a filter against your origin query. See Retrieve Records Modified After Last Run DateTime. This reduces the set of data returned and greatly speeds up syncronizations.

However if your source system does not contain a "Last Modified Date" field this becomes tricky. One option we have in Starfish ETL is store and compare a hash of the record. This still involves reading all data from the origin, but only passing records to the destination that have been changed. Use the following steps to set this up.


1. Enable UseDatabaseXref - See Engine Web.Config Settings.

2. Initialize an Xref list for both read and write, in our example "contact-hash". See Cross-reference (Xref).

3. Perform the comparison. Create a new Exec Before VBScript Procedure, call it "Compare Hash". Ensure to check the append checkbox on the write side. This code will compare the origin's row hash to that stored in the xref. If it matches, the row is skipped (the data hasn't changed). If it does not match the record has been changed. If there is no match, this is a new record. In this case we store the new hash and allow the job execution to continue - you should set up an Update/Insert stage to process the record. In this example, ContactID is the primary key field from your origin table.

Sub VBScriptProcedure
	If "@@ORG:#RowHashCode@@" = XrefRead("contact-hash","@@ORG:ContactID@@") Then
		Starfish.GotoNextRow	
	Else
		XrefWrite "contact-hash", "@@ORG:ContactID@@", "@@ORG:#RowHashCode@@"
	End If
End Sub

Please note the first time you run the job, all records will be considered new because no hashes will exist in the Xref. Subsequent runs will then use the stored hashes.