I have a table on MinIO, partitioned by run_date, that holds one year of data (a small sample is shown below). I want to delete the last 3 days of data and then reload those days from another table (a sketch of the reload I have in mind is at the end of this post). I am running the query below, but it is not dropping the partitions. Any help will be highly appreciated.
>>> spark.sql('''select * from dx_dl_abc_xyz.test''').show(10,False)
+------------------+------------------+-----------+----------+
|asset_name |module_name |application|run_date |
+------------------+------------------+-----------+----------+
|Campaign Reporting|Campaign Reporting|Marketing |2024-03-21|
|Orders |Orders |C&R |2024-03-18|
|CX |CX |CX&Digital |2024-03-20|
|APM |APM |C&R |2024-03-19|
+------------------+------------------+-----------+----------+
CREATE TABLE `dx_dl_abc_xyz`.`test` (
`asset_name` STRING,
`module_name` STRING,
`application` STRING,
`run_date` STRING)
USING parquet
PARTITIONED BY (run_date)
LOCATION 's3a://dx.dl.abc.xyz/abc_dashboard/test'
# Enable dynamic partitioning
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

# Drop one day's partition, then refresh the partition metadata
spark.sql('''alter table dx_dl_abc_xyz.test drop if exists
partition(run_date='2024-03-21')''').show()
spark.sql('''msck repair table dx_dl_abc_xyz.test''').show()
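To double-check what the metastore actually registers, I also want to compare the partition list before and after the drop. This is only a sketch of that check (the variable names are mine, and I have not pasted the output here):

# Sketch: list the run_date partitions the catalog knows about,
# then diff the list before and after the DROP PARTITION
before = spark.sql('''show partitions dx_dl_abc_xyz.test''').collect()
spark.sql('''alter table dx_dl_abc_xyz.test drop if exists
partition(run_date='2024-03-21')''')
after = spark.sql('''show partitions dx_dl_abc_xyz.test''').collect()
print(sorted(set(r.partition for r in before) - set(r.partition for r in after)))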
Even after running the above commands, I still see all 4 records when I query the table.
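For completeness, once the last 3 days are actually gone, the reload from the other table that I have in mind looks roughly like this. It is only a sketch: dx_dl_abc_xyz.test_source is a placeholder for the real source table, and I am assuming dynamic partition overwrite is acceptable here.

# Sketch of the intended reload; dx_dl_abc_xyz.test_source is a placeholder
# for the other table I want to copy the last 3 days from
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sql('''insert overwrite table dx_dl_abc_xyz.test
partition(run_date)
select asset_name, module_name, application, run_date
from dx_dl_abc_xyz.test_source
where run_date >= date_format(date_sub(current_date(), 3), 'yyyy-MM-dd')''')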
Thanks, Debasis