Invalid method name 'get_database' issue with Spark 3.x and Hive 3.x


I am practicing HDFS, Hive and Spark. I installed Hadoop 3.3.6, Hive 3.1.3 and Spark 3.4.2, but I am unable to run any SQL in the pyspark shell. The error I am getting is:

org.apache.thrift.TApplicationException: Invalid method name: 'get_database'

According to the Spark 3.4.2 documentation, Spark uses Hive 2.3.9 as the default metastore version, but it can be configured for 3.x.x. So I added "--conf 'spark.sql.hive.metastore.version=3.1.3' --conf 'spark.sql.hive.metastore.jars=maven'" to my pyspark startup script, but I still get the get_database error.
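
For reference, this is roughly the equivalent of what my startup script does, expressed through the SparkSession builder (the host name master1, the port, and the versions come from my own setup; nothing here beyond the flags listed above):

from pyspark.sql import SparkSession

# Build a session that should talk to the remote Hive 3.1.3 metastore
spark = (
    SparkSession.builder
    .appName("hive-metastore-test")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.hive.metastore.version", "3.1.3")
    .config("spark.sql.hive.metastore.jars", "maven")
    .config("hive.metastore.uris", "thrift://master1:10000")
    .enableHiveSupport()
    .getOrCreate()
)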

If I don't specify the metastore configuration parameters (i.e. I just run "pyspark --conf 'spark.sql.catalogImplementation=hive' --conf 'hive.metastore.uris=thrift://master1:10000'"), Spark creates a metastore in my current local directory and doesn't even try to connect to my Hive server. I wonder why. I also have hive-site.xml in my $SPARK_HOME/conf directory, with the following property in it:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://master1:10000</value>
</property>
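
To be concrete, even the simplest catalog operation triggers the error. For example, from the pyspark shell (this is just an illustrative statement; anything that touches the metastore behaves the same way for me):

# Listing databases goes through the Hive metastore client and fails immediately
spark.sql("SHOW DATABASES").show()
# org.apache.thrift.TApplicationException: Invalid method name: 'get_database'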

From a GitHub source code search, it seems to me that get_database isn't present in Hive 3.x. I also downloaded and tried the Hive 4.0 beta, which does seem to have get_database, but the problem persists. I would rather not downgrade Hive to 2.x, since that could cause compatibility issues with my Hadoop 3.x.

Thanks, James

I have tried different Hive versions and different metastore version configurations in Spark, but none of these solved the problem.
