Able to run the Thrift Server from Spark but unable to read my Delta tables using SQL queries

So I was able to spin up a master node on my local machine and register the Thrift Server with it; I can see it on the Spark UI. When I try to connect to it using beeline at port 10000, for some reason I get this error:

    Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Can't overwrite cause with java.lang.ClassNotFoundException: org.apache.spark.sql.delta.catalog.DeltaCatalog
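
For completeness, this is how I am connecting (a minimal invocation; the host and port are just the defaults from the error above):

    # Connect to the Spark Thrift Server on the default port 10000
    beeline -u jdbc:hive2://localhost:10000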

I also tried pyhive, using the following code, and got a different error.

    from pyhive import hive

    # Connect to the Spark Thrift Server (pyhive defaults to port 10000)
    connection = hive.connect(host='localhost')

    # Query the Delta table directly by its S3 path
    query = """
    SELECT * FROM delta.`s3a://test/dataset/sub_path/delta_table`
    """

    cursor = connection.cursor()
    cursor.execute(query)

Error (trace trimmed to the root-cause chain):

    pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(statusCode=3, ...,
        errorMessage='Error running query: java.lang.RuntimeException:
        java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found'),
        operationHandle=None)

    org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:44)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.execute(SparkExecuteStatementOperation.scala:325)
        ... (Hive session / Thrift request-handling frames elided) ...
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:180)
        at org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:314)
        ... (DeltaTableV2 / Spark analyzer frames elided) ...
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
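
As far as I know, org.apache.hadoop.fs.s3a.S3AFileSystem ships in hadoop-aws, which I do pass with --jars. A quick way to sanity-check which S3A/AWS jars sit on the server's default classpath (a sketch; $SPARK_HOME here is assumed to be the Spark install directory):

    # List any S3A/AWS jars on Spark's default classpath
    ls "$SPARK_HOME/jars" | grep -i -e hadoop-aws -e aws-java-sdk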

Config used to start the Thrift Server:

    sbin/start-thriftserver.sh \
      --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
      --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
      --jars aws-java-sdk-1.11.901.jar, aws-java-sdk-bundle-1.11.874.jar,hadoop-aws-3.2.3.jar \
      --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
      --conf spark.hadoop.fs.s3a.fast.upload=true \
      --conf spark.hadoop.fs.s3a.connection.ssl.enabled=true \
      --conf spark.hadoop.com.amazonaws.services.s3.enableV2=true \
      --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
      --conf spark.hadoop.fs.s3a.committer.name=magic \
      --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
      --conf spark.hadoop.fs.s3a.path.style.access=true \
      --conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
      --conf spark.hadoop.fs.s3a.access.key=access \
      --conf spark.hadoop.fs.s3a.secret.key=secret \
      --packages 'io.delta:delta-core_2.12:2.1.0' \
      --master spark://localhost:7077
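
One thing I am unsure about: --jars expects a single comma-separated list with no spaces (note the stray space after the first jar above), and with the Thrift Server the query runs in the server's own JVM, so these jars have to land on the driver classpath. An alternative launch I have been considering (a sketch, not verified; it assumes hadoop-aws 3.2.3 matches my Hadoop build and relies on Maven pulling the matching aws-java-sdk-bundle transitively):

    # Sketch: resolve the S3A connector via --packages instead of local --jars,
    # so it (and its aws-java-sdk-bundle dependency) reaches the driver classpath.
    sbin/start-thriftserver.sh \
      --master spark://localhost:7077 \
      --packages io.delta:delta-core_2.12:2.1.0,org.apache.hadoop:hadoop-aws:3.2.3 \
      --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
      --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
      --conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
      --conf spark.hadoop.fs.s3a.path.style.access=true \
      --conf spark.hadoop.fs.s3a.access.key=access \
      --conf spark.hadoop.fs.s3a.secret.key=secret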

I was expecting to be able to list the columns of the Delta table.
