Using Row Hashing for Incremental Integrations: Difference between revisions

From Starfish ETL
Jump to navigation Jump to search
No edit summary
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
=Hashing The Whole Record=
Typically when setting up an ongoing integration, we only want to pull data from your origin which has changed since the last time Starfish ran. This is accomplished using LastRunDate and applying it as a filter against your origin query. See [[Retrieve Records Modified After Last Run DateTime]]. This reduces the set of data returned and greatly speeds up syncronizations.
Typically when setting up an ongoing integration, we only want to pull data from your origin which has changed since the last time Starfish ran. This is accomplished using LastRunDate and applying it as a filter against your origin query. See [[Retrieve Records Modified After Last Run DateTime]]. This reduces the set of data returned and greatly speeds up syncronizations.


However if your source system does not contain a "Last Modified Date" field this becomes tricky. One option we have in Starfish ETL is store and compare a hash of the record. This still involves reading all data from the origin, but only passing records to the destination that have been changed. Use the following steps to set this up.
However if your source system does not contain a "Last Modified Date" field this becomes tricky. One option we have in Starfish ETL is store and compare a hash of the record. This still involves reading all data from the origin, but only passing records to the destination that have been changed. Use the following steps to set this up.


1. Enable UseDatabaseXref - See [[Engine Web.Config Settings]].
1. Enable UseDatabaseXref - See [[Engine Web.Config Settings]].
Line 20: Line 20:


Please note the first time you run the job, all records will be considered new because no hashes will exist in the Xref. Subsequent runs will then use the stored hashes.
Please note the first time you run the job, all records will be considered new because no hashes will exist in the Xref. Subsequent runs will then use the stored hashes.
=Hashing Partial Records=
I had an integration that pulled data from a single table into Accounts, Contacts, and Opportunities.  Each of these target tables used different fields from the source.  I didn't want to process every record as it changed, just the ones I cared about.  So, I created VBScript before and after save procedures to save the last rowhash and compare that to the new rowhash.  If they are different, then I process the record.
==VB Script==
'''Before Save'''<br>
Repeat Each Row
<source lang="vb">
Sub VBScriptProcedure
Dim UTF8
Set UTF8 = CreateObject("System.Text.UTF8Encoding")
stringToUTFBytes = UTF8.GetBytes_4("FIELDS TO BE TRACKED")
Dim MD5
set MD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
MD5.Initialize()
'Note you MUST use computehash_2 to get the correct version of this method, and the bytes MUST be double wrapped in brackets to ensure they get passed in correctly.
md5hashBytes = MD5.ComputeHash_2( (stringToUTFBytes) )
Dim BytesToBase64
With CreateObject("MSXML2.DomDocument").CreateElement("b64")
.dataType = "bin.base64"
.nodeTypedValue = md5hashBytes
BytesToBase64 = .Text
End With
'LogMessage BytesToBase64
Dim oldHash
oldHash = XrefRead("SalesQuoteToAccountHash","@@ORG:quote_id@@")
'LogMessage oldHash
If oldHash = BytesToBase64 Then
GoToNextRow
End If
End Sub
</source>
'''After Operation'''<br>
Run Each Row
<source lang="vb">
Sub VBScriptProcedure
If Not PreviewMode Then
Dim UTF8
Set UTF8 = CreateObject("System.Text.UTF8Encoding")
stringToUTFBytes = UTF8.GetBytes_4("@@ORG:quote_id@@@@ORG:customer@@@@ORG:billing_address@@")
Dim MD5
set MD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
MD5.Initialize()
'Note you MUST use computehash_2 to get the correct version of this method, and the bytes MUST be double wrapped in brackets to ensure they get passed in correctly.
md5hashBytes = MD5.ComputeHash_2( (stringToUTFBytes) )
Dim BytesToBase64
With CreateObject("MSXML2.DomDocument").CreateElement("b64")
.dataType = "bin.base64"
.nodeTypedValue = md5hashBytes
BytesToBase64 = .Text
End With
'LogMessage BytesToBase64
XrefWrite "SalesQuoteToAccountHash","@@ORG:quote_id@@",BytesToBase64
End If
End Sub
</source>

Latest revision as of 19:49, 13 July 2018

Hashing The Whole Record

Typically when setting up an ongoing integration, we only want to pull data from your origin which has changed since the last time Starfish ran. This is accomplished using LastRunDate and applying it as a filter against your origin query. See Retrieve Records Modified After Last Run DateTime. This reduces the set of data returned and greatly speeds up syncronizations.

However if your source system does not contain a "Last Modified Date" field this becomes tricky. One option we have in Starfish ETL is store and compare a hash of the record. This still involves reading all data from the origin, but only passing records to the destination that have been changed. Use the following steps to set this up.

1. Enable UseDatabaseXref - See Engine Web.Config Settings.

2. Initialize an Xref list for both read and write, in our example "contact-hash". Ensure to check the append checkbox on the write side. See Cross-reference (Xref).

3. Perform the comparison. Create a new Exec Before VBScript Procedure, call it "Compare Hash". This code will compare the origin's row hash to that stored in the xref. If it matches, the row is skipped (the data hasn't changed). If it does not match the record has been changed. If there is no match, this is a new record. In this case we store the new hash and allow the job execution to continue - you should set up an Update/Insert stage to process the record. In this example, ContactID is the primary key field from your origin table.

Sub VBScriptProcedure
	If "@@ORG:#RowHashCode@@" = XrefRead("contact-hash","@@ORG:ContactID@@") Then
		Starfish.GotoNextRow	
	Else
		XrefWrite "contact-hash", "@@ORG:ContactID@@", "@@ORG:#RowHashCode@@"
	End If
End Sub

Please note the first time you run the job, all records will be considered new because no hashes will exist in the Xref. Subsequent runs will then use the stored hashes.

Hashing Partial Records

I had an integration that pulled data from a single table into Accounts, Contacts, and Opportunities. Each of these target tables used different fields from the source. I didn't want to process every record as it changed, just the ones I cared about. So, I created VBScript before and after save procedures to save the last rowhash and compare that to the new rowhash. If they are different, then I process the record.

VB Script

Before Save
Repeat Each Row

Sub VBScriptProcedure
	Dim UTF8
	Set UTF8 = CreateObject("System.Text.UTF8Encoding")
	stringToUTFBytes = UTF8.GetBytes_4("FIELDS TO BE TRACKED")

	Dim MD5
	set MD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
	MD5.Initialize()
	'Note you MUST use computehash_2 to get the correct version of this method, and the bytes MUST be double wrapped in brackets to ensure they get passed in correctly.
	md5hashBytes = MD5.ComputeHash_2( (stringToUTFBytes) )

	Dim BytesToBase64
	With CreateObject("MSXML2.DomDocument").CreateElement("b64")
		.dataType = "bin.base64"
		.nodeTypedValue = md5hashBytes
		BytesToBase64 = .Text
	End With
	'LogMessage BytesToBase64
	
	Dim oldHash
	oldHash = XrefRead("SalesQuoteToAccountHash","@@ORG:quote_id@@")
	'LogMessage oldHash
	
	If oldHash = BytesToBase64 Then
		GoToNextRow
	End If
End Sub

After Operation
Run Each Row

Sub VBScriptProcedure
	If Not PreviewMode Then
		Dim UTF8
		Set UTF8 = CreateObject("System.Text.UTF8Encoding")
		stringToUTFBytes = UTF8.GetBytes_4("@@ORG:quote_id@@@@ORG:customer@@@@ORG:billing_address@@")

		Dim MD5
		set MD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
		MD5.Initialize()
		'Note you MUST use computehash_2 to get the correct version of this method, and the bytes MUST be double wrapped in brackets to ensure they get passed in correctly.
		md5hashBytes = MD5.ComputeHash_2( (stringToUTFBytes) )

		Dim BytesToBase64
		With CreateObject("MSXML2.DomDocument").CreateElement("b64")
			.dataType = "bin.base64"
			.nodeTypedValue = md5hashBytes
			BytesToBase64 = .Text
		End With
'LogMessage BytesToBase64
		XrefWrite "SalesQuoteToAccountHash","@@ORG:quote_id@@",BytesToBase64
	End If
End Sub