Apache Spark on Docker: how to delete .parquet folders in a volume


I have an Airflow / Spark architecture for ETL purposes.

Airflow orchestrates PySpark jobs on my Spark Connect cluster.

To pass data between Airflow tasks, I'm using a Docker volume mounted on the tmp folder.
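Roughly, the producing and consuming tasks look like this (a simplified sketch; the Spark Connect URL and the /tmp/shared path are placeholders for my actual setup):

```python
from pyspark.sql import SparkSession

# Connect through Spark Connect; the URL is a placeholder for my cluster
spark = SparkSession.builder.remote("sc://spark-connect:15002").getOrCreate()

# Producing task: write a dataset to the volume-backed tmp folder
# (spark.range is just a stand-in for the real ETL output)
df = spark.range(100)
df.write.mode("overwrite").parquet("/tmp/shared/dataset.parquet")

# Consuming task (a separate Airflow task): read it back from the same path
df2 = spark.read.parquet("/tmp/shared/dataset.parquet")
```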

The problem is that when I try to delete the .parquet folder to reclaim space, using shutil.rmtree in a PySpark job, I don't have permission to do so. The same happens even when I try directly through Docker.
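The cleanup step is essentially the following (the path is again a placeholder; my guess is the parquet files belong to a different UID, hence the permission error):

```python
import shutil

# Cleanup task: raises PermissionError, presumably because the parquet files
# were created by the Spark containers' user, not the user running this code
shutil.rmtree("/tmp/shared/dataset.parquet")
```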

Is there a better way to share my datasets between tasks? Or is there a way to delete the .parquet folders inside my tmp folder?

Thank you!
