File System
File System Origin
The File System Connector reads all files from the folder specified in the Directory field. The Username and Password fields are not used. The Filter field can be used to filter for certain file names or extensions. To search for all files with a "txt" extension, use the filter "*.txt". To search for all files with "Work" anywhere in the filename, use the filter "*Work*".
The Connector returns standard file metadata in the following fields:
- CreationTime
- FullDirectoryName
- DirectoryRelative
- DirectoryName
- Extension
- FullName
- IsReadOnly
- LastAccessTime
- LastWriteTime
- Length
- Name
The Connector returns all remaining information available for the file, in JSON format, in the ExtendedProperties field. Rather than adding a column for each potential property, we went with a more dynamic method: a single column called "ExtendedProperties" with the data formatted as JSON.
It's very easy to parse this and pull out the data you need in Starfish.
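For example, the ExtendedProperties value for an Office document might look something like this (the available properties vary by file type, and this snippet is only an illustration):

{
    "Authors": "Jane Doe",
    "Title": "Quarterly Report"
}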
First, create a Before Each Row VBScript Operation which loads the chunk of JSON into the parser:
Sub VBScriptProcedure
    ParseJSON("@@ORG:ExtendedProperties@@")
End Sub
Then, on your field, pull out the value you want by name:
Function ScriptedField
    ScriptedField = GetJSON("Authors")
End Function

Function ScriptedField
    ScriptedField = GetJSON("Title")
End Function
and so on for any other property you need.
If a value doesn't exist for a particular file, GetJSON will just return an empty string ("").
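If you'd rather substitute a default than pass the empty string through, test the result first. A minimal sketch (the "Unknown" default is only an illustration):

Function ScriptedField
    Dim authors
    authors = GetJSON("Authors")
    ' GetJSON returns "" when the property doesn't exist for this file
    If authors = "" Then authors = "Unknown"
    ScriptedField = authors
End Function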
Reading and Manipulating a File
The connector itself cannot read the contents of a file, but VBScript or C# code can be used to read and manipulate files.
Reading a file:
Function ScriptedField
    Dim text
    text = ExtractText("C:\test.txt")
    ScriptedField = text
End Function
Moving a file:
Function ScriptedField
    Call MoveFile("C:\TestDir1\test.txt", "C:\TestDir2\test.txt")
End Function
You can also rename a file with this method by giving the destination path a different file name.
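For example, a rename is just a move within the same folder (the new file name here is only an illustration):

Function ScriptedField
    ' Renaming = moving the file to the same directory under a new name
    Call MoveFile("C:\TestDir1\test.txt", "C:\TestDir1\test_renamed.txt")
End Function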
Processing a Huge Number of Files
Processing a huge number of files can cause performance issues. We started with a folder containing 40,000 tiny .txt files; the first 10,000 or so ran without a problem, and then the job slowed to a crawl. The solution was to move each file to a new folder after processing it. That way, the files could be processed in increments of 5,000.
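A sketch of that pattern, assuming a per-row VBScript Operation that runs after each file's row has been processed, and assuming the FullName and Name fields can be referenced with the same @@ORG:...@@ token syntax used above (the C:\Processed folder is only an illustration):

Sub VBScriptProcedure
    ' Move the file that was just processed out of the source folder,
    ' so subsequent runs only see the files that still need processing.
    Call MoveFile("@@ORG:FullName@@", "C:\Processed\@@ORG:Name@@")
End Sub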