Why should the number of buckets in Hive be equal to the number of reducers?

1.6k Views Asked by Ramprakash At 03 August 2017 at 06:42

In Hive, why should the number of buckets be equal to the number of reducers?

There are 2 best solutions below

Cloudkollektiv On 18 September 2017 at 09:56 BEST ANSWER

Because, all else being equal, this is the most efficient way to run the MapReduce job. The insert work is divided among the reducers, and with one reducer per bucket each reducer writes exactly one bucket file.

In Hive 0.x and 1.x you have to set hive.enforce.bucketing = true. With this setting, Hive automatically determines the number of reducers from the number of buckets in your table. In later versions of Hive (2.x) bucketing is enforced by default.

Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
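To illustrate why matching reducers to buckets divides the work cleanly, here is a minimal Python sketch. It assumes an integer bucketing column whose hash is the value itself, with the bucket chosen as (hash & Integer.MAX_VALUE) % num_buckets. The routing of buckets to reducers (bucket % num_reducers) and both function names are hypothetical, chosen only for illustration, and are not taken from Hive's source.

```python
from collections import defaultdict


def bucket_for(key: int, num_buckets: int) -> int:
    # Assumption: an int column hashes to its own value, and the bucket is
    # (hash & Integer.MAX_VALUE) % num_buckets.
    return (key & 0x7FFFFFFF) % num_buckets


def buckets_per_reducer(keys, num_buckets: int, num_reducers: int) -> dict:
    # Hypothetical routing for illustration: bucket b goes to reducer b % num_reducers.
    assignment = defaultdict(set)
    for key in keys:
        bucket = bucket_for(key, num_buckets)
        assignment[bucket % num_reducers].add(bucket)
    # Return a plain dict: reducer id -> sorted list of buckets it must write.
    return {r: sorted(b) for r, b in sorted(assignment.items())}


if __name__ == "__main__":
    keys = range(100)
    # One reducer per bucket: each reducer writes a single bucket file.
    print(buckets_per_reducer(keys, num_buckets=4, num_reducers=4))
    # Fewer reducers than buckets: each reducer has to write several buckets.
    print(buckets_per_reducer(keys, num_buckets=4, num_reducers=2))
```

With four buckets and four reducers every reducer handles one bucket. With two reducers each one has to produce two bucket files, so the same work is spread over fewer tasks.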

Archit Agarwal On 29 January 2019 at 16:14

The number of reducers launched while inserting into a bucketed table is a divisor of the number of buckets in that table. The largest divisor that does not exceed the configured maximum number of reducers (hive.exec.reducers.max) is selected, and that many reducers are launched.

Example:

Number of buckets in the table: 5956
hive.exec.reducers.max=1009
5956 = 4 * 1489 (1489 is prime), so its divisors are 1, 2, 4, 1489, 2978 and 5956
Number of launched reducers: 4

The divisors of 5956 include 4 and 1489, but since at most 1009 reducers may be launched, only 4 reducers will run, which can take a very long time for a large table.

Setting hive.exec.reducers.max=2000 will launch 1489 reducers.
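The arithmetic in this example can be checked with a short Python sketch. It assumes the rule described in this answer (pick the largest divisor of the bucket count that does not exceed hive.exec.reducers.max). The function name reducers_for_buckets is hypothetical and this is not Hive's actual code.

```python
def reducers_for_buckets(num_buckets: int, max_reducers: int) -> int:
    """Return the largest divisor of num_buckets that is <= max_reducers.

    Sketch of the selection rule described in the answer, not Hive's
    implementation.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("num_buckets and max_reducers must be positive")
    # Walk down from the cap until a divisor is found; 1 always divides.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate
    return 1  # not reached, since candidate == 1 always divides


if __name__ == "__main__":
    print(reducers_for_buckets(5956, 1009))  # 4
    print(reducers_for_buckets(5956, 2000))  # 1489
```

Under this rule, a bucket count with few small divisors (such as 4 times a large prime) makes the reducer count very sensitive to the configured maximum.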
