Retrieve Records Modified After Last Run DateTime: Difference between revisions

From Starfish ETL
Jump to navigation Jump to search
No edit summary
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
I do not reccomend using SetLastRunDate. It never seems to get all records that need to be processed. I suggest sorting the origin ascending by a date modified field and logging each row's date modified field. In this way, you will record each date modified that processed and be able to restart there as necessary. See: https://wiki.starfishetl.com/index.php/Using_Starfish_Scripting_Class_Properties_%26_Methods_in_C#Processing_only_records_modified_after_the_last_record_processed_based_on_date_modified
To only run data modified after the last time Starfish was run, we need to do two things:
To only run data modified after the last time Starfish was run, we need to do two things:
#Set the last time Starfish ran.
#Set the last time Starfish ran.
Line 4: Line 6:


=Set Last Run DateTime=
=Set Last Run DateTime=
Create an After Operation Procedure and run the built in Starfish Function SetLastRunDate.  Set Exec When to "Once After Conn".
Create an After Operation Procedure and run the built in Starfish Function SetLastRunDate.  Set Exec When to "Once After Conn". It is necessary to add this, otherwise the last successful run date will not get set.
<source lang="vb">
<source lang="vb">
Sub VBScriptProcedure
Sub VBScriptProcedure
If PreviewMode <> True Then
SetLastRunDate
SetLastRunDate
End If
End Sub
End Sub
</source>
</source>


=Filter Origin Based On Last Run DateTime=
=Filter Origin Based On Last Run DateTime=
First you must retrieve the DateTime your job was last run.  This can be inserted directly into the Origin Filter by using the @@VAR:LastRunDate@@ variable.  In many applications, DateTimes are converted to GMT in database, so you must convert your last run DateTime to GMT before inserting it into your Origin Filter.  To do this, Run a Before Operation set to Exec When "Once Before Conn".  You could create any number of functions, but this is the one I've used:
First you must retrieve the DateTime your job was last run.  This can be inserted directly into the Origin Filter by using the @@VAR:LastRunDate@@ variable.  An example of that in a SQL origin would look something like this:
<source lang="sql">
SELECT * FROM Contacts WHERE ModifyDate >= '@@VAR:LastRunDate@@'
</source>
 
In many applications, DateTimes are converted to GMT in database, so you must convert your last run DateTime to GMT before inserting it into your Origin Filter.  To do this, Run a Before Operation set to Exec When "Once Before Conn".  You could create any number of functions, but this is the one I've used:
<source lang="vb">
<source lang="vb">
Function ScriptedVariable
Function ScriptedVariable
Line 20: Line 25:
gmtLastRun = CDate("@@VAR:LastRunDate@@")
gmtLastRun = CDate("@@VAR:LastRunDate@@")
gmtLastRun = DateAdd("h", 7, gmtLastRun)
gmtLastRun = DateAdd("h", 7, gmtLastRun)
'LogMessage "@@VAR:LastRunDate@@"
'Sometimes, the Starfish server and the Source Server clocks are not quite in sync, so I subtract 1 minute from my last run datetime to guarantee I don't miss any records.
ScriptedVariable=iso_date(gmtLastRun)
gmtLastRun = DateAdd("n", -1, gmtLastRun)
ScriptedVariable=FormatDate(gmtLastRun, "yyyy-MM-dd HH:mm:ss")
End Function
End Function
</source>
The above function uses a VBScript Global function, iso_date.  Add this to your map with the VBScript Global button on the Mapping Tab.  You will want to format your date based on whatever origin system you are attempting to filter.
<source lang="vb">
function iso_date(byval dt)
dim y: y = year(dt)
dim m: m=month(dt)
dim d: d=day(dt)
dim h: h=hour(dt)
dim n: n=minute(dt)
dim s: s=second(dt)
  if m < 10 then m="0" & m
  if d < 10 then d="0" & d
  if h < 10 then h="0" & h
  if n < 10 then n="0" & n
  if s < 10 then s="0" & s
 
  iso_date = y & "-" & m & "-" & d & " " & h & ":" & n & ":" & s
end function
</source>
</source>


Then insert the resulting date into your Origin Filter:
Then insert the resulting date into your Origin Filter:
<source lang="vb">
<source lang="vb">
cast(integration_modify_date_c as datetime) > '@@VAR:GMTLastRunDate@@'
cast(integration_modify_date_c as datetime) >= '@@VAR:GMTLastRunDate@@'
</source>
</source>
or
or
<source lang="vb">
<source lang="vb">
[{"date_modified":{"$gte":"@@VAR:GMTLastRunDate@@"}}]
[{"date_modified":{"$gte":"@@VAR:GMTLastRunDate@@"}}]
</source>
==Use a compound key to make it more robust==
<source lang="sql">
SELECT * FROM accounts_export WHERE
/*Here I use date_modified and the id to built a compound key. 
I use this compound key to get all records where the date_modified is equal to the last run date_modified and where the ID is greater than the last ID run.
This is necessary because my origin can have 1000s of records modified at the exact same time. 
Starfish would sometimes error out when running these records and would then start over from the beginning.
It got to the point where it would never finish the records and was stuck in a loop.
I wanted to use date_modified = '@@VAR:lastRunDT@@' but I couldn't get the equal to work and >= did work.
This will obviously only work if you can sort by ID, ID is unique and ID never changes.*/
(date_modified >= '@@VAR:lastRunDT@@' AND id > @@VAR:last_id@@)
OR
/*Here I'm adding 1 second to my last run date_modified and getting all records greater than that new datetime value.
Some datetime stamps use milliseconds, if yours does, you would want to add the smallest amount of time possible.
In this way, I'm making sure I get all records AFTER my last run date_modified.
I have to use the OR statement in this way otherwise I would only get records where the ID is > than my last run ID.*/
(date_modified > DATEADD(ss,1,'@@VAR:lastRunDT@@'))
/*This is important so my compound ID runs over the records in the correct order.*/
order by date_modified, id asc
</source>
</source>

Latest revision as of 18:09, 7 June 2023

I do not reccomend using SetLastRunDate. It never seems to get all records that need to be processed. I suggest sorting the origin ascending by a date modified field and logging each row's date modified field. In this way, you will record each date modified that processed and be able to restart there as necessary. See: https://wiki.starfishetl.com/index.php/Using_Starfish_Scripting_Class_Properties_%26_Methods_in_C#Processing_only_records_modified_after_the_last_record_processed_based_on_date_modified

To only run data modified after the last time Starfish was run, we need to do two things:

  1. Set the last time Starfish ran.
  2. Filter the origin based on the last time Starfish ran.

Set Last Run DateTime

Create an After Operation Procedure and run the built in Starfish Function SetLastRunDate. Set Exec When to "Once After Conn". It is necessary to add this, otherwise the last successful run date will not get set.

Sub VBScriptProcedure
	SetLastRunDate
End Sub

Filter Origin Based On Last Run DateTime

First you must retrieve the DateTime your job was last run. This can be inserted directly into the Origin Filter by using the @@VAR:LastRunDate@@ variable. An example of that in a SQL origin would look something like this:

SELECT * FROM Contacts WHERE ModifyDate >= '@@VAR:LastRunDate@@'

In many applications, DateTimes are converted to GMT in database, so you must convert your last run DateTime to GMT before inserting it into your Origin Filter. To do this, Run a Before Operation set to Exec When "Once Before Conn". You could create any number of functions, but this is the one I've used:

Function ScriptedVariable
	Dim gmtLastRun
	gmtLastRun = CDate("@@VAR:LastRunDate@@")
	gmtLastRun = DateAdd("h", 7, gmtLastRun)
	'Sometimes, the Starfish server and the Source Server clocks are not quite in sync, so I subtract 1 minute from my last run datetime to guarantee I don't miss any records.
	gmtLastRun = DateAdd("n", -1, gmtLastRun)
	ScriptedVariable=FormatDate(gmtLastRun, "yyyy-MM-dd HH:mm:ss")
End Function

Then insert the resulting date into your Origin Filter:

cast(integration_modify_date_c as datetime) >= '@@VAR:GMTLastRunDate@@'

or

[{"date_modified":{"$gte":"@@VAR:GMTLastRunDate@@"}}]

Use a compound key to make it more robust

SELECT * FROM accounts_export WHERE 
/*Here I use date_modified and the id to built a compound key.  
I use this compound key to get all records where the date_modified is equal to the last run date_modified and where the ID is greater than the last ID run.
This is necessary because my origin can have 1000s of records modified at the exact same time.  
Starfish would sometimes error out when running these records and would then start over from the beginning.
It got to the point where it would never finish the records and was stuck in a loop.
I wanted to use date_modified = '@@VAR:lastRunDT@@' but I couldn't get the equal to work and >= did work.
This will obviously only work if you can sort by ID, ID is unique and ID never changes.*/
(date_modified >= '@@VAR:lastRunDT@@' AND id > @@VAR:last_id@@)
OR
/*Here I'm adding 1 second to my last run date_modified and getting all records greater than that new datetime value.
Some datetime stamps use milliseconds, if yours does, you would want to add the smallest amount of time possible.
In this way, I'm making sure I get all records AFTER my last run date_modified.
I have to use the OR statement in this way otherwise I would only get records where the ID is > than my last run ID.*/
(date_modified > DATEADD(ss,1,'@@VAR:lastRunDT@@'))
/*This is important so my compound ID runs over the records in the correct order.*/
order by date_modified, id asc