Spark: NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities


Error: "An error occurred while calling o132.load. : java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities"

My Spark version is 3.3.1, and Hadoop is 3.1.1.3.1.5.0-152.
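To see which Hadoop release the driver is actually running against, a small check can help. Below is a minimal sketch. It assumes the cluster reports an HDP-style version, where the upstream Apache Hadoop version (3.1.1) comes first and the vendor release (3.1.5.0-152) is appended. The helper names `upstream_hadoop_version` and `runtime_hadoop_version` are hypothetical. `VersionInfo.getVersion()` is the hadoop-common utility, reached here through PySpark's internal py4j gateway (`spark._jvm`).

```python
import re


def upstream_hadoop_version(full_version: str) -> str:
    """Return the leading major.minor.patch of a (possibly vendor) Hadoop version.

    Assumption: vendor builds such as HDP append their own release number,
    e.g. "3.1.1.3.1.5.0-152" -> "3.1.1".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", full_version.strip())
    if match is None:
        raise ValueError(f"unrecognised Hadoop version: {full_version!r}")
    return ".".join(match.groups())


def runtime_hadoop_version(spark) -> str:
    """Ask the driver JVM which hadoop-common version is on Spark's classpath."""
    # spark._jvm is PySpark's internal py4j gateway, not a public API
    return spark._jvm.org.apache.hadoop.util.VersionInfo.getVersion()


if __name__ == "__main__":
    print(upstream_hadoop_version("3.1.1.3.1.5.0-152"))
```

Inside the job, you can compare `upstream_hadoop_version("3.1.1.3.1.5.0-152")` with `runtime_hadoop_version(spark)`. This shows whether Spark uses the cluster's Hadoop 3.1.1 or a Hadoop client bundled with the Spark distribution. hadoop-aws is compiled against a specific hadoop-common release. So if the two versions differ, the hadoop-aws jar generally needs to match the version Spark reports.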

Spark command: spark-submit --master yarn --deploy-mode client --jars /spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.3.0.jar,/spark3.3.1/jars/hadoop-aws-3.1.1.jar,/app/spark3.3.1/jars/aws-java-sdk-bundle-1.11.271.jar

I also set this in the Spark session: ("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.1.1")
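The command passes hadoop-aws as a local jar, and the session requests it again through `spark.jars.packages`. With `spark-submit`, dependencies are normally resolved before the Python code runs, so the session-level setting may have no effect. One way to keep a single source of truth is to build the command from one Hadoop version. `--packages` can then resolve hadoop-aws together with the aws-java-sdk-bundle it declares as a dependency. Below is a minimal sketch that reuses the Iceberg jar path from the question. `build_spark_submit` and `my_stream_job.py` are hypothetical names.

```python
import shlex


def build_spark_submit(hadoop_version, app, jars=()):
    """Return a spark-submit argument list with hadoop-aws pinned to hadoop_version.

    --packages resolves hadoop-aws and its transitive aws-java-sdk-bundle from
    Maven, so the SDK is the version that hadoop-aws release was built with.
    """
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",
        "--packages", f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
    ]
    if jars:
        # jars supplied as local files rather than Maven coordinates
        args += ["--jars", ",".join(jars)]
    args.append(app)
    return args


if __name__ == "__main__":
    cmd = build_spark_submit(
        "3.1.1",  # must equal the Hadoop version Spark actually runs with
        "my_stream_job.py",  # hypothetical application script
        jars=["/spark3.3.1/jars/iceberg-spark-runtime-3.3_2.12-1.3.0.jar"],
    )
    print(shlex.join(cmd))
```

The value passed as `hadoop_version` should be the version the driver's classpath reports, which is not necessarily the one the cluster advertises.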

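# Hadoop FileSystem handles through PySpark's py4j gateway (not used by the read below)
filesystem = spark._jvm.org.apache.hadoop.fs.FileSystem
path = spark._jvm.org.apache.hadoop.fs.Path
fs = filesystem.get(spark._jsc.hadoopConfiguration())

# S3A settings; auth is a dict holding the AWS credentials
spark.conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.conf.set("fs.s3a.access.key", auth.get('AWS_ACCESS_KEY_ID'))
spark.conf.set("fs.s3a.secret.key", auth.get('AWS_SECRET_ACCESS_KEY'))

# new_schema, file_type, file_header, timestamp_format and my_path are defined elsewhere
df = (spark.readStream.schema(new_schema)
      .format(file_type)
      .option("header", file_header)
      .option("maxFilesPerTrigger", 1)
      .option("maxFilesPerBatch", 1)  # not among the documented file-source options; likely ignored
      .option("timestampFormat", timestamp_format)
      .load(my_path)  # the o132.load call that raises the NoClassDefFoundError
     )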
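The S3A settings above are applied with `spark.conf.set` after the session already exists. An alternative is to pass them when the session is built, using the `spark.hadoop.` prefix. Spark copies properties with that prefix into the Hadoop `Configuration`. Below is a minimal sketch, assuming the credentials come from the same `auth` mapping as in the question. `s3a_session_conf` and `build_session` are hypothetical helpers. pyspark is imported only inside `build_session`, so the config mapping can be checked without a cluster. This only changes where the S3A options are set; it does not resolve a mismatch between hadoop-aws and the Hadoop jars on the classpath.

```python
def s3a_session_conf(access_key, secret_key):
    """Map S3A options to Spark properties with the spark.hadoop. prefix."""
    hadoop_opts = {
        "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }
    # Spark copies every spark.hadoop.* property into the Hadoop Configuration
    return {f"spark.hadoop.{key}": value for key, value in hadoop_opts.items()}


def build_session(access_key, secret_key, app_name="s3a-stream"):
    """Create the SparkSession with S3A options set up front.

    Call this before anything else creates a session; if a SparkContext
    already exists, Hadoop-level settings passed here may not be applied.
    """
    from pyspark.sql import SparkSession  # imported lazily; needs pyspark installed

    builder = SparkSession.builder.appName(app_name)
    for key, value in s3a_session_conf(access_key, secret_key).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage in the job would look like `spark = build_session(auth.get('AWS_ACCESS_KEY_ID'), auth.get('AWS_SECRET_ACCESS_KEY'))`, followed by the same `readStream` chain.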
