In another question here I was having trouble reading all the files, and someone helped me get that working. I can now read all of them, but I need some more help.
I'm reading 12 different files, and a new file is added only once a year.
Each of the 12 files covers one year, and I want to add a column containing the year that file refers to.
The first line of each file contains only something like "Fiscal year: 2013". I want every row to carry the year of the file it came from, but my code assigns the same year to all of the files.
I'm doing it this way:
import re

# Extract the year from the file
first_line = spark.read.text(path_files).first()[0]
file_year = re.search(r"\d{4}", first_line).group()
You can find the logic for your requirement.
Note: Based on the sample file you shared, I could not determine the file's delimiter, so I have assumed that the delimiter of your input file is a comma (',').
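One likely reason every row ends up with the same year is that `spark.read.text(path_files).first()` returns a single row for the whole multi-file read, so only one "Fiscal year" line is ever inspected. A minimal sketch of one way around that is to read each file on its own and tag its rows before combining them. The names `extract_year`, `read_with_year`, and `file_paths` are hypothetical, and the sketch assumes that `file_paths` is a non-empty list of individual file paths, that `first()` on a single small file returns that file's first line, and that the delimiter is a comma.

```python
import re
from functools import reduce

YEAR_PATTERN = re.compile(r"\d{4}")


def extract_year(first_line):
    """Return the first 4-digit number found in the line as an int."""
    match = YEAR_PATTERN.search(first_line)
    if match is None:
        raise ValueError(f"no 4-digit year found in: {first_line!r}")
    return int(match.group())


def read_with_year(spark, file_paths):
    """Read each file separately and tag its rows with that file's year.

    Assumes file_paths is a non-empty list of individual file paths
    (hypothetical parameter), not a glob or a directory.
    """
    # Imported here so extract_year can be used without a Spark install.
    from pyspark.sql import functions as F

    frames = []
    for path in file_paths:
        lines = spark.read.text(path)
        # Assumption: for a single small file, first() is the file's first line.
        first_line = lines.first()[0]
        year = extract_year(first_line)
        frames.append(
            lines
            # Drop the "Fiscal year: ..." line itself.
            .filter(F.col("value") != first_line)
            # Assumption: comma-delimited data, as noted above.
            .withColumn("fields", F.split(F.col("value"), ","))
            .withColumn("year", F.lit(year))
        )
    # Combine the per-file DataFrames into one.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

With only 12 files, looping on the driver like this should be cheap. For many files, a single read combined with `input_file_name()` and a join on the file name would probably scale better.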