How to strip padding (whitespace) from text file when writing dataframe column to file


I have a dataframe that I am outputting to a text file (it generates database insert statements). I want to align the column to the left and strip all whitespace from the left and right. I found a function written by someone else that is supposed to do this; however, the resulting output file contains a lot of padding, with every value padded out to the maximum column length:

import os

directory = "/dbfs/mnt/testdata/troubleshooting"
filename = "test.txt"

# function to left-align the columns in the DataFrame
# (strips whitespace, then ljust pads every value to the length of the longest value)
def align_left(s):
    s = s.str.strip()
    return s.str.ljust(s.str.len().max())

data = [[",1,'This is a bit of data in this row'"]
,[",2,'This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row.'"]
,[",3,NULL"]
,[",4,'Little bit of data'"]
,[",5,'Tiny bit of data'"]
]
schema = ["TextData"]
df = spark.createDataFrame(data, schema)
df_table = df.toPandas()

# use the function to left-align the column and strip whitespace
sql_values = df_table.apply(align_left).to_string(header=False, index=False, justify='left')

# remove the characters up to and including the first comma ',' of the string
index = sql_values.find(',')
sql_values = sql_values[index+1:]

sql_values = f"INSERT INTO dbo.table (Id, TextData) \n VALUES({sql_values})"

save_path = os.path.join(directory, filename)

# write contents to file
with open(save_path, "w") as f:
    f.write(sql_values)

Resulting file:

[screenshot of the output file, with every row padded to the length of the longest value]

The additional spaces more than double the file size. How can I get rid of them? This is what I want the file to look like (all extra spaces removed):

[screenshot of the desired output file, with no trailing padding on any row]
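For reference, here is a minimal pandas-only sketch of the kind of output I am after: the values are stripped and joined directly instead of going through to_string, so nothing gets padded. The sample data and names (df_table, TextData) just mirror the code above and are only illustrative, not necessarily the approach I should use:

import pandas as pd

# illustrative sample matching the dataframe built above
df_table = pd.DataFrame(
    {"TextData": [",1,'This is a bit of data in this row'",
                  ",3,NULL",
                  ",4,'Little bit of data'"]}
)

# strip each value and join the rows directly, so no column padding is added
rows = df_table["TextData"].str.strip().tolist()
sql_values = "\n".join(rows)

# drop everything up to and including the first comma, as in the code above
sql_values = sql_values[sql_values.find(",") + 1:]

sql_output = f"INSERT INTO dbo.table (Id, TextData) \n VALUES({sql_values})"
print(sql_output)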
