Using Multi-threading

From Starfish ETL
Revision as of 19:29, 5 November 2014 by Admin (talk | contribs)

A new feature in Starfish 2.1 is the ability to use multi-threading. This ability is built natively into the Starfish Engine, so there is no need to set up any additional message queue systems. The default number of threads is set to 1 in the Engine Web.config file. To override this, set a value in the Thread Count box in the Starfish Admin Run tab, or modify the value in the Config file.

How many threads to use depends on a number of factors, and finding the “sweet spot” may involve some trial and error. Multi-threading is most beneficial where there is network latency, such as integrations between backend systems and cloud-based or hosted solutions. When data is being moved between two systems on the same database server, or even within the same network, using more than one thread may not produce better results (and may degrade performance). In ground-to-cloud migrations, however, significant performance gains are likely.
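
The effect of latency can be illustrated with a small simulation. The sketch below is generic Python, not Starfish code: `process_row`, `run_job`, and the 20 ms delay are hypothetical stand-ins for a job whose rows each wait on a remote call. It assumes the time per row is dominated by waiting rather than by local computation, which is the situation described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed round-trip time per row, in seconds (stand-in for a hosted system).
SIMULATED_LATENCY = 0.02


def process_row(row):
    """Hypothetical row handler: the sleep models waiting on a remote call."""
    time.sleep(SIMULATED_LATENCY)
    return row


def run_job(rows, thread_count):
    """Process every row with the given number of threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        list(pool.map(process_row, rows))
    return time.perf_counter() - start


rows = list(range(16))
single = run_job(rows, thread_count=1)
multi = run_job(rows, thread_count=8)
print(f"1 thread:  {single:.2f}s")
print(f"8 threads: {multi:.2f}s")
```

With one thread the waits add up row by row; with eight threads the waits overlap, so the same rows finish in a fraction of the time. If the per-row work were local and fast, the extra threads would add overhead without this benefit.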

For these cases, we recommend starting with a value of 8 threads. This number can be increased if performance allows it. Starfish does not impose an upper limit on the number of threads that can be used. You’re limited only by the memory and speed of your server and, realistically, by your network bandwidth and the capacity of the server you’re communicating with. For this reason, we recommend not exceeding 24 threads.

NOTE: Multi-threading hands off the next available data row to the next available thread as it becomes available. This means that rows being processed can be in any state at any given time, and rows will not complete in the sequential order in which they are fed into the queue. For example, if you’re using 8 threads, Row #6 may finish before Row #1, then Row #3 may be next, and so on. The completion order is effectively random. For this reason, you cannot use multiple threads if you require data to be handled in sequential order, for instance if your job performs an Insert/Update operation whose lookup depends on data written by previous rows.
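
The out-of-order behavior is easy to reproduce outside Starfish. The following is a minimal Python sketch, not the Starfish Engine's implementation: the row numbers and the delays in `ROW_DELAYS` are invented for illustration, and a thread pool stands in for the Engine's worker threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical processing time in seconds for each row number.
ROW_DELAYS = {1: 0.20, 2: 0.05, 3: 0.15, 4: 0.10}

completed = []           # row numbers, in the order they finish
lock = threading.Lock()  # protects the shared list


def process_row(row_number):
    """Simulate work of varying duration, then record completion."""
    time.sleep(ROW_DELAYS[row_number])
    with lock:
        completed.append(row_number)


# Rows are submitted in order 1, 2, 3, 4 to four worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(process_row, ROW_DELAYS))

print("submitted:", list(ROW_DELAYS))
print("completed:", completed)
```

Because Row 1 takes the longest here, it is submitted first but finishes last. A job in which Row 2 looks up a record that Row 1 was supposed to insert would not find it, which is why such jobs should run with a single thread.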