I am trying to write a DataFrame to the AWS Glue Spark local /tmp but am unable to find the file


We have an AWS Glue script that connects to RDS and uploads the data to S3. The files are very large, so we are trying to first write to the Glue Spark local disk by creating a new folder, and then upload that file directly to S3 using upload_file with a TransferConfig:

    df.write.options().csv("/tmp/testdir")

The problem is that after the job completes we can see only the folder testdir, with no files inside it. Does Spark delete the files as soon as the write completes? I am not sure whether this is the right approach, but a few documents suggest it depending on the requirement.
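For reference, this is roughly the full flow we have in mind, as a minimal sketch. It assumes df is the DataFrame already loaded from RDS, and the bucket name and key prefix are placeholders:

    import glob
    import os
    import boto3
    from boto3.s3.transfer import TransferConfig

    # Spark writes a directory of part files, not a single CSV file.
    df.write.options(header="true").csv("/tmp/testdir")

    # Upload each part file with a multipart-friendly transfer config.
    config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                            max_concurrency=4)
    s3 = boto3.client("s3")
    for path in glob.glob("/tmp/testdir/part-*"):
        s3.upload_file(path, "my-bucket",
                       "exports/" + os.path.basename(path),
                       Config=config)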

When I tried using a pandas DataFrame to save to the same path, I was able to read the file back and upload it to S3:

    df.toPandas().to_csv("/tmp/testdir")
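In other words, the pandas variant below does work for us; the output file name and bucket are placeholders:

    import boto3

    # Collects the data onto the driver and writes one local CSV file.
    df.toPandas().to_csv("/tmp/testdir/output.csv", index=False)
    boto3.client("s3").upload_file("/tmp/testdir/output.csv",
                                   "my-bucket", "exports/output.csv")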
