I have Parquet files partitioned by iso_week, and I need to read all of the data into a single PCollection with Apache Beam and the Python SDK.
Structure of the partitioned Parquet files:
data_to_read/
├─ iso_week=2023-W40/
│ ├─ 12343435.parquet
├─ iso_week=2023-W41/
│ ├─ 1231243254.parquet
I tried to use the glob pattern * as suggested in the documentation:
pipeline | "ReadData" >> beam.io.ReadFromParquet("data_to_read/*")
But I get an error saying that the path doesn't contain any Parquet files.
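To narrow the problem down, here is a small sketch that uses Python's standard-library glob module (not Beam's own file matcher) to recreate the layout above with empty placeholder files and list what different patterns match. It assumes that Beam's matching treats * the same way glob does, i.e. as a single path segment that does not descend into subdirectories; I have not confirmed that against Beam's FileSystems implementation. The helper name matches() is just for illustration.

```python
import glob
import os
import shutil
import tempfile

# Rebuild the layout from the tree above in a throwaway directory.
# File names are copied from the question; the files are empty placeholders,
# which is enough because glob only looks at paths, not file contents.
tmp_root = tempfile.mkdtemp()
base = os.path.join(tmp_root, "data_to_read")
layout = {
    "iso_week=2023-W40": "12343435.parquet",
    "iso_week=2023-W41": "1231243254.parquet",
}
for partition, filename in layout.items():
    os.makedirs(os.path.join(base, partition))
    open(os.path.join(base, partition, filename), "wb").close()


def matches(*parts, recursive=False):
    """Hypothetical helper: matched paths relative to data_to_read/, sorted, '/'-separated."""
    pattern = os.path.join(base, *parts)
    found = glob.glob(pattern, recursive=recursive)
    return sorted(os.path.relpath(p, base).replace(os.sep, "/") for p in found)


# "*" matches a single path segment, so it stops at the partition directories.
top_level = matches("*")
# One "*" per directory level reaches the files inside each partition.
one_level_down = matches("*", "*.parquet")
# With recursive=True, "**" matches any number of directory levels.
any_depth = matches("**", "*.parquet", recursive=True)

print(top_level)       # ['iso_week=2023-W40', 'iso_week=2023-W41']
print(one_level_down)  # ['iso_week=2023-W40/12343435.parquet', 'iso_week=2023-W41/1231243254.parquet']

shutil.rmtree(tmp_root)
```

If that assumption about Beam holds, data_to_read/* only matches the two iso_week=... directories rather than any .parquet file, which would be consistent with the error, and a pattern with one * per directory level (data_to_read/*/*.parquet) would be the kind of pattern ReadFromParquet needs.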
Is there a way to read partitioned parquet files in Apache Beam?