I created a table in Hive 3.1.3 as follows:
CREATE EXTERNAL TABLE test_tez_orc_zstd
(
  id BIGINT
)
STORED AS ORC
LOCATION '...'
TBLPROPERTIES ('orc.compress'='zstd');
The table was created, and I then tried to insert one row:
Insert into test_tez_orc_zstd
Select 1
It then threw the following error:
No enum constant org.apache.orc.CompressionKind.ZSTD
Hive is configured to use Tez.
If I do the same thing with Parquet and zstd compression, it works.
How can I handle this?
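The error text has the shape of the message that Java's Enum.valueOf produces when it is asked for a constant the enum does not define. The sketch below reproduces that shape with two hypothetical stand-in enums (CompressionKindV15 and CompressionKindV16 are illustrative names, not the real org.apache.orc.CompressionKind). It assumes the configured value is upper-cased before the lookup, which the ZSTD in the error message suggests.

```java
import java.util.Locale;

// Hypothetical stand-in for an ORC CompressionKind without a ZSTD constant.
enum CompressionKindV15 { NONE, ZLIB, SNAPPY, LZO, LZ4 }

// Hypothetical stand-in for an ORC CompressionKind that includes ZSTD.
enum CompressionKindV16 { NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD }

class EnumLookupDemo {
    // Looks up the configured codec name in the given enum.
    // Returns the constant's name on success, or the exception message on failure.
    static <E extends Enum<E>> String resolve(Class<E> kind, String configured) {
        try {
            // Assumption: the property value is upper-cased before the lookup.
            return Enum.valueOf(kind, configured.toUpperCase(Locale.ROOT)).name();
        } catch (IllegalArgumentException e) {
            // Enum.valueOf reports "No enum constant <enum type>.<NAME>".
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(CompressionKindV15.class, "zstd"));
        System.out.println(resolve(CompressionKindV16.class, "zstd"));
    }
}
```

The first lookup fails because the constant is absent from the enum, so the failure depends on which ORC version is on the classpath and not on how the table property is written.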
ROOT CAUSE:
Apache Hive 3.1.3 uses ORC 1.5.8. ORC has supported zstd decompression only since 1.6.0; see https://issues.apache.org/jira/browse/ORC-363. The CompressionKind enum in ORC 1.5.8 has no ZSTD constant, whereas the one in 1.6.0 does. So Hive 3.1.3 does not support the orc.compress=zstd table property.
POSSIBLE SOLUTION:
In Hive, the ORC version was upgraded to above 1.6.0 in release 4.0.0-alpha-1; see https://issues.apache.org/jira/browse/HIVE-23553. It might be challenging, but you can backport the related commits on top of release tag 3.1.3, then build the project and replace the related jars in Hive's library.
Please note that the ORC dependencies are not only present directly in Hive's library; they are also bundled into some of the fat jars, such as hive-exec.
So the steps should be as follows:
1. Clone hive and check out release tag 3.1.3.
2. Upgrade orc to the desired version by backporting the related commits.
3. Build with mvn clean package -DskipTests.
4. grep for orc in the library directory of your Hive installation to see which orc dependencies are directly on the classpath and which fat jars contain orc classes.
The challenging part is that the orc upgrade commits can be pretty big, and there might be conflicts.
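The last step, finding which jars carry ORC classes, can also be done programmatically. The following is a minimal sketch, not part of Hive or ORC: OrcJarScanner and its scan method are hypothetical names, and it assumes ORC classes sit under the unrelocated org/apache/orc/ path inside each jar. Classes that a fat jar relocates to another package would not be found this way.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;

class OrcJarScanner {
    // Path form of the ORC package prefix (assumption: classes are not relocated).
    static final String ORC_PREFIX = "org/apache/orc/";

    // Maps each jar in libDir that contains ORC entries to the number of such entries.
    static Map<String, Long> scan(Path libDir) throws IOException {
        Map<String, Long> hits = new TreeMap<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    long count = jarFile.stream()
                            .filter(e -> !e.isDirectory() && e.getName().startsWith(ORC_PREFIX))
                            .count();
                    if (count > 0) {
                        hits.put(jar.getFileName().toString(), count);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the library directory of the Hive installation.
        if (args.length != 1) {
            System.err.println("usage: java OrcJarScanner <hive-lib-dir>");
            return;
        }
        scan(Paths.get(args[0]))
                .forEach((jar, n) -> System.out.println(jar + ": " + n + " ORC entries"));
    }
}
```

Jars whose names start with orc- are the direct dependencies; any other jar in the result, such as a fat jar, bundles ORC classes and would also need to be rebuilt and replaced.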