Spark events: reading a Hive table created through the Hive CLI vs. a Hive table created through Spark


While working on a Spark event listener, I am a bit confused by the way Spark behaves.

Scenario 1: Hive table created using Spark

Suppose the EMPLOYEE table is created using the Spark API (saveAsTable). When we read this table through Spark (either the API or SQL), the generated LogicalPlan holds a reference to a catalog object that clearly identifies the Hive database and Hive table name, as below:

CatalogTable(
  Database: company
  Table: EMPLOYEE
  Created Time: Mon Jun 01 02:59:11 GMT 2022
  Last Access: UNKNOWN
  Created By: Spark
  Type: MANAGED
  Provider: orc
)

Scenario 2: Hive table created using Hive cli

Suppose the same EMPLOYEE table is created using the Hive CLI. When we read this table back through Spark (either the API or SQL), the generated LogicalPlan refers to an HDFS path rather than a CatalogTable object, even though the table is present in the Hive catalog and is accessible through the Hive CLI, just like the earlier EMPLOYEE table (created through Spark) and other tables.

The following Spark code was used to write the EMPLOYEE table, in case it helps.

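import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hive support makes Spark use the Hive metastore as its catalog.
SparkSession spark = SparkSession
        .builder()
        .appName("HiveZipCodePipeline")
        .enableHiveSupport()
        .getOrCreate();

spark.sparkContext().setLogLevel("DEBUG");

// Read the source table through the catalog.
Dataset<Row> sqlDF = spark.read().table("company.emp_source");

// Overwrite company.employee as a table managed by Spark.
sqlDF.write().mode(SaveMode.Overwrite).format("csv").saveAsTable("company.employee");

spark.close();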

Why is there this difference? How can we deduce that in the second scenario the source is still a Hive table?


There are 1 best solutions below

ASR On

When we create a table in Spark (managed or external), by default it is created as a Hive table, as with CREATE TABLE <> USING HIVE, so Spark is already aware of the Hive catalog info. When we create a Hive-managed table in Hive and read it from Spark, Spark loads the info from the metastore, and not all properties are copied from the Hive catalog to the Spark catalog object.
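
To check from a listener whether a read still goes through a catalog table, one option is to look at the leaf nodes of the analyzed plan instead of the printed plan. Below is a minimal sketch, not a definitive implementation. It assumes the listener is a QueryExecutionListener on Spark 3.x with Hive support enabled. The class name TableSourceListener and the report helper are made up for illustration. A Hive SerDe table normally appears as a HiveTableRelation. By default, ORC and Parquet Hive tables are converted to Spark's native file readers (spark.sql.hive.convertMetastoreOrc and spark.sql.hive.convertMetastoreParquet), so they may appear as a LogicalRelation, which may still carry the CatalogTable. A provider of "hive" would then indicate a table defined through Hive.

```java
import org.apache.spark.sql.catalyst.catalog.CatalogTable;
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
import org.apache.spark.sql.execution.QueryExecution;
import org.apache.spark.sql.execution.datasources.LogicalRelation;
import org.apache.spark.sql.util.QueryExecutionListener;
import scala.collection.Iterator;

// Hypothetical listener: prints which catalog tables a successful query read from.
public class TableSourceListener implements QueryExecutionListener {

    @Override
    public void onSuccess(String funcName, QueryExecution qe, long durationNs) {
        // Leaf nodes of the analyzed plan are the data sources of the query.
        Iterator<LogicalPlan> leaves = qe.analyzed().collectLeaves().iterator();
        while (leaves.hasNext()) {
            LogicalPlan leaf = leaves.next();
            if (leaf instanceof HiveTableRelation) {
                // Hive SerDe table that was not converted to a native reader.
                report(((HiveTableRelation) leaf).tableMeta());
            } else if (leaf instanceof LogicalRelation) {
                LogicalRelation rel = (LogicalRelation) leaf;
                if (rel.catalogTable().isDefined()) {
                    // File-based relation that is still backed by a catalog table.
                    report(rel.catalogTable().get());
                } else {
                    // Plain path read such as spark.read().orc(path): no table metadata.
                    System.out.println("Path-based relation: " + rel.relation());
                }
            }
        }
    }

    @Override
    public void onFailure(String funcName, QueryExecution qe, Exception exception) {
        // Nothing to report for failed queries in this sketch.
    }

    private static void report(CatalogTable table) {
        String provider = table.provider().isDefined() ? table.provider().get() : "unknown";
        boolean hiveSerde = "hive".equalsIgnoreCase(provider);
        System.out.println("Catalog table " + table.identifier().unquotedString()
                + ", type=" + table.tableType().name()
                + ", provider=" + provider
                + (hiveSerde ? " (Hive SerDe table)" : " (Spark data source table)"));
    }
}
```

The listener can be registered with spark.listenerManager().register(new TableSourceListener()) before running the read. This sketch covers read queries only. Write commands such as saveAsTable can wrap the source query inside a command node, in which case the leaves seen here would not include the source tables.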