my script failing due to a heap space issue to process too many partitions. To avoid the issue I am trying to insert all the partitions into a single partition but I am facing the below error
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''2021-01-16'': Table insclause-0 has 78 columns, but query has 79 columns.
set hive.exec.dynamic.partition=true;
set mapreduce.reduce.memory.mb=6144;
set mapreduce.reduce.java.opts=-Xmx5g;
set hive.exec.dynamic.partition=true;
insert overwrite table db_temp.travel_history_denorm partition (start_date='2021-01-16')
select * from db_temp.travel_history_denorm_temp_bq
distribute by start_date;```
Can someone please suggest what is the issue, I checked the schema for the tables it is the same. ?
You are inserting into static partition (partition value specified in the target table partition clause), in this case you should not have partition column in the select. And select * returns partition column (the last one), this is why query fails, it should be no partition column:
Static partition insert:
Dynamic partition:
Adding
distribute bytriggers additional reduce step, all records are being grouped according todistribute byand each reducer receives single partition. This can help to solve the OOM problem when you are loading many dynamic partitions in each reducer. Without distribute by each reducer will create files in each partitions, keeping too many buffers simultaneously.In addition to
distribute byyou can set the maximum bytes per reducer. This setting will limit the amount of data processed by single reducer and also may help with OOM:If this figure is too small, it will trigger too many reducers, if too big - then each reducer will process too much data. Adjust accordingly.
Also try this setting for dynamic partition load:
You can combine all these methods: distribute by partition key, bytes.per.reducer and sort.dynamic.partition for dynamic partition loading.
Also exception message can help to understand where exactly the OOM happens and fix accordingly.