hive analyze query taking lot of time

1.4k Views Asked by Kumar At 07 March 2019 at 12:21

In order to speed up ETL queries on large tables, we run many analyze queries on these tables and date columns in the evening. but these analyze queries on columns take lot of memory and time. we are using tez. is there any way to optimize analyze query also like some set commands.

Original Q&A

There are 1 best solutions below

leftjoin On 09 March 2019 at 08:01 BEST ANSWER

If you are loading tables using insert overwrite then statistics can be gathered automatically by setting hive.stats.autogather=true during insert overwrite queries.

If the table is partitioned and partitions are being loaded incrementally, then you can analyze only last partitions.

ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)]

See examples here: https://cwiki.apache.org/confluence/display/Hive/StatsDev

For ORC files it's possible to specify hive.stats.gather.num.threads to incraase parallelism.

See full list of statistic settings here: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics

hive analyze query taking lot of time

There are 1 best solutions below

Related Questions in PERFORMANCE

Related Questions in HADOOP

Related Questions in HIVE

Related Questions in QUERY-TUNING

Related Questions in APACHE-TEZ

Trending Questions

Popular # Hahtags

Popular Questions