In GCP - how to identify the number of lines in a file that have more than a specific delimiter count, ignoring the header and trailer - Python/Bash operator
Example data:
HDR|Filename
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL
TRL|Filename
Expected result:
The HDR and TRL lines should be ignored.
Count: 1 (the line 10|1000|CHN|TVL has only 3 delimiters, while the other data lines have 4)
I need to know an efficient way to achieve this with Airflow operators.
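To make the expected result concrete, here is a minimal Python sketch of the check. It rests on a few assumptions that are not stated in the question: header and trailer records are recognised by the HDR and TRL prefixes, a valid data line has exactly 4 delimiters, and a line counts as bad when its delimiter count differs from that value (which matches the example, where the short line is the one counted). The function and parameter names are hypothetical.

```python
from typing import Iterable


def count_mismatched_lines(
    lines: Iterable[str],
    delimiter: str = "|",
    expected_delimiters: int = 4,          # assumption: valid rows have 4 delimiters
    skip_prefixes: tuple = ("HDR", "TRL"),  # assumption: header/trailer prefixes
) -> int:
    """Count data lines whose delimiter count differs from the expected value."""
    mismatched = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and the header/trailer records.
        if not line or line.startswith(skip_prefixes):
            continue
        if line.count(delimiter) != expected_delimiters:
            mismatched += 1
    return mismatched


sample = """HDR|Filename
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL|TWD
10|1000|CHN|TVL
TRL|Filename
"""

print(count_mismatched_lines(sample.splitlines()))  # prints 1
```

If only lines with more (or only fewer) delimiters should be counted, the `!=` comparison can be swapped for `>` or `<`.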
@Mani Shankar.S, based on the stack link you mentioned in the comment: using the gsutil cat Bash command, we can identify the number of lines in a file that have more than a specific delimiter count, ignoring the header and trailer.
Posting the answer as a community wiki for the benefit of community members who might encounter this use case in the future.
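The exact command from the linked post is not reproduced here, so the following is only a rough sketch of the idea, written in Python: stream the output of gsutil cat and count the offending lines as they arrive. It assumes gsutil is installed and authenticated on the machine (or Airflow worker) that runs it, that header and trailer lines start with HDR and TRL, and that a valid data line has exactly 4 delimiters. The function name, parameters, and the bucket path in the comment are hypothetical.

```python
import subprocess
from typing import Sequence


def count_mismatched_from_command(
    command: Sequence[str],
    delimiter: str = "|",
    expected_delimiters: int = 4,          # assumption: valid rows have 4 delimiters
    skip_prefixes: tuple = ("HDR", "TRL"),  # assumption: header/trailer prefixes
) -> int:
    """Run a command and count the data lines on its stdout whose
    delimiter count differs from the expected value."""
    mismatched = 0
    # Stream stdout line by line so the whole file never has to fit in memory.
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for raw in proc.stdout:
            line = raw.rstrip("\r\n")
            # Skip blank lines and the header/trailer records.
            if not line or line.startswith(skip_prefixes):
                continue
            if line.count(delimiter) != expected_delimiters:
                mismatched += 1
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return mismatched


# Example call (hypothetical bucket and object name):
# count_mismatched_from_command(["gsutil", "cat", "gs://my-bucket/data.txt"])
```

In Airflow, a function like this could be passed as the python_callable of a PythonOperator, while an equivalent shell pipeline starting with gsutil cat could be given to a BashOperator. Which of the two is more efficient likely depends on file size and on what the worker has installed.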
Feel free to edit this answer for additional information.