How to run MSCK on Hive Standalone Metastore server via thrift client


I'm using Hive as my metastore database, with the Hive Standalone Metastore handling the DDL through a Thrift client that implements the server's Thrift mapping.

I want to run MSCK (or something similar) to bulk-add partitions to new Hive tables.

As far as I know, though, the Thrift mapping file doesn't expose an msck method. I can see that something related to Msck is implemented inside the standalone server (I think it was added in JIRA HIVE-17824), but there is nothing for it in the HiveMetastore class, which I understand to be the mapping of the Thrift server methods.

Does anyone know whether I can run MSCK against the standalone Hive metastore server via a Thrift client?

Gooseman On

With Python, I am currently using this client successfully: PyHive.
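A minimal sketch of what that could look like. PyHive connects to HiveServer2 rather than to the metastore directly, so this assumes a HiveServer2 instance is reachable. The helper names `msck_statement` and `repair_table`, the host, the port, and the table name are all hypothetical; only `hive.Connection`, `cursor()` and `execute()` come from PyHive.

```python
import re

# Accept only plain `table` or `db.table` identifiers, because the name is
# interpolated into the statement text.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?$")


def msck_statement(table: str) -> str:
    """Build the MSCK statement for a table (hypothetical helper)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"MSCK REPAIR TABLE {table}"


def repair_table(cursor, table: str) -> None:
    """Run MSCK through any DB-API style cursor, such as a PyHive cursor."""
    cursor.execute(msck_statement(table))


# Usage with PyHive (left commented out because it needs a live HiveServer2;
# host, port and table name are placeholders):
#
#   from pyhive import hive
#   conn = hive.Connection(host="localhost", port=10000)
#   repair_table(conn.cursor(), "my_db.my_table")
```

Keeping the statement builder separate from the connection makes it possible to check the generated SQL without a running server.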

You can also do it from DBeaver, if the command has to be run manually by a person: dbeaver.

EDIT (I did not realize that the question was about sending the command directly to the Hive metastore):

The IMetaStoreClient interface (the protocol between the Hive client and the Hive metastore server) does not expose an MSCK command because it does not need one. Here is the logic behind the MSCK command:

  1. Check whether the table exists in the Hive metastore.

  2. Scan the physical file system where the table stores its data for new partitions. See the code in checkMetastore.

  3. Create/add those new partitions. See the code in createPartitionsInBatches, which ends up calling the add_partitions method of the Hive metastore client.

    See add_partitions. Only at this point, and not before, does the client application send data to the Hive metastore server.

  4. Drop partitions that are no longer in the file system. See the code in dropPartitionsInBatches, which ends up calling the dropPartitions method of the Hive metastore client.

    See dropPartitions. Again, only at this point does the client application send data to the Hive metastore server.

MSCK is not really a Hive metastore command. It relies on logic implemented by the client that runs the MSCK command. In your case, you would need to add that logic to the client you want to use.

Spark, for example, already implements that logic when running MSCK.
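The scan-and-diff part of the steps above could be sketched on the client side roughly as follows. This is not Hive's implementation. It assumes the table data sits on a local file system with directories named `key=value`, and it does not handle escaped partition values. The function names `scan_partitions`, `diff_partitions` and `batches` are hypothetical. The resulting lists would then be sent through the Thrift client's add_partitions call and its drop call, which are not shown here.

```python
import os
from typing import Iterator, List, Set, Tuple


def scan_partitions(table_location: str, partition_keys: List[str]) -> Set[str]:
    """Return partition names such as 'year=2024/month=01' found on disk."""
    found: Set[str] = set()
    if not partition_keys:
        return found  # an unpartitioned table has nothing to repair

    def walk(path: str, level: int, parts: List[str]) -> None:
        if level == len(partition_keys):
            found.add("/".join(parts))
            return
        prefix = partition_keys[level] + "="
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            # Only descend into directories matching the expected key.
            if os.path.isdir(full) and entry.startswith(prefix):
                walk(full, level + 1, parts + [entry])

    walk(table_location, 0, [])
    return found


def diff_partitions(on_disk: Set[str], in_metastore: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (partitions to add, partitions to drop)."""
    return sorted(on_disk - in_metastore), sorted(in_metastore - on_disk)


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield fixed-size chunks so each metastore call stays small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

The set of partitions already registered would come from the metastore itself, and each batch of names would be converted into the partition objects that the Thrift calls expect.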