I need to extract a string from the input file and add it as a field in the record.
For example, if my file has a date in the filename, only the date needs to be extracted and added as an additional column in the record. If the file name is like xyzYYYYMMDD.txt, only the YYYYMMDD part should be extracted.
I was able to accomplish this, assuming you are talking about StreamSets Data Collector. Once the file name is available in the record, the rest is just string parsing in the Jython Evaluator to grab the specific parts of the file name you need.
Set up a Pipeline: (Directory Origin) -> (Expression Evaluator) -> (Jython Evaluator) -> (Trash)
==== Configuration:
Directory Origin:
Expression Evaluator:
Jython Evaluator : Script
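For the parsing step, here is a minimal sketch of what the Jython script could look like. It is not the exact script from my pipeline. It assumes the Expression Evaluator has already copied the file name into a field, for example with an expression like `${record:attribute('filename')}` written to `/filename`. The field names `filename` and `file_date` and the helper `extract_date` are made up for illustration, so adjust them to your setup.

```python
import re

# Hypothetical helper (not part of StreamSets): match an 8-digit YYYYMMDD
# run that sits immediately before the file extension.
DATE_BEFORE_EXT = re.compile(r'(\d{8})(?=\.[^.]+$)')

def extract_date(filename):
    """Return the YYYYMMDD string from names like xyz20180115.txt, else None."""
    match = DATE_BEFORE_EXT.search(filename)
    return match.group(1) if match else None

# Inside the Jython Evaluator, the per-record loop would look roughly like
# this. It assumes an upstream stage stored the file name in a field called
# 'filename'; both field names are illustrative.
#
# for record in records:
#     try:
#         record.value['file_date'] = extract_date(record.value['filename'])
#         output.write(record)
#     except Exception as e:
#         error.write(record, str(e))
```

The regex only looks at the 8 digits right before the extension, so a prefix like `xyz` can be any length. If your file names carry the date somewhere else, change the pattern accordingly.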
Then click Preview and select the Jython Evaluator to check the new field on the output records: