To speed up ETL queries on large tables, we run many ANALYZE queries on these tables and their date columns every evening.
However, the column-level ANALYZE queries take a lot of memory and time.
We are using Tez as the execution engine.
Is there any way to optimize the ANALYZE queries themselves, for example with some SET commands?
Hive ANALYZE query taking a lot of time
Asked by Kumar
If you are loading tables using INSERT OVERWRITE, statistics can be gathered automatically by setting hive.stats.autogather=true during the INSERT OVERWRITE queries, so a separate ANALYZE pass over the same data is not needed.
If the table is partitioned and partitions are loaded incrementally, you can analyze only the most recently loaded partitions instead of the whole table.
See examples here: https://cwiki.apache.org/confluence/display/Hive/StatsDev
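As a rough sketch of both suggestions, assuming a hypothetical table sales partitioned by load_date and a hypothetical staging table staging_sales (neither name comes from the question), the nightly load could look something like this:

```sql
-- Hypothetical table, column, and partition names, for illustration only.

-- Gather basic statistics (row count, size, etc.) as part of the load itself.
SET hive.stats.autogather=true;

INSERT OVERWRITE TABLE sales PARTITION (load_date='2024-01-15')
SELECT order_id, amount, order_date
FROM staging_sales
WHERE load_date='2024-01-15';

-- If column statistics are still needed, restrict ANALYZE to the partition
-- that was just loaded and to the columns the ETL queries actually filter on.
ANALYZE TABLE sales PARTITION (load_date='2024-01-15')
COMPUTE STATISTICS FOR COLUMNS order_date;
```

Limiting the partition spec and the column list is what keeps the ANALYZE cheap here: older partitions keep the statistics they already have, and only the new data is scanned.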
For ORC files it's possible to set hive.stats.gather.num.threads to increase parallelism when gathering statistics.
See the full list of statistics settings here: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics
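A minimal sketch of the thread setting, again using the hypothetical sales table. As far as I understand from the configuration page linked above, this property applies to NOSCAN-style ANALYZE commands on partitioned tables in formats such as ORC, where statistics can be read from file metadata; the value 20 is an arbitrary example, not a recommendation:

```sql
-- Hypothetical table name; the thread count is an arbitrary example value.
SET hive.stats.gather.num.threads=20;

-- NOSCAN collects statistics without reading the row data,
-- so it is much cheaper than a full column-statistics scan.
ANALYZE TABLE sales PARTITION (load_date) COMPUTE STATISTICS NOSCAN;
```

Column statistics (COMPUTE STATISTICS FOR COLUMNS) still need to read the data, so this setting is unlikely to help with those; for column statistics, narrowing the partitions and columns being analyzed is probably the more effective lever.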