Spark 3 on Databricks 10.3 UnsatisfiedLinkError from snappy


I am trying to read some protocol buffer files, and apparently (I am not 100% sure) these are compressed with Snappy. The files are in a binary format.

I am running a notebook on Databricks using runtime version 10.4 LTS and Spark 3:

sc.sequenceFile[NullWritable, BytesWritable](concatUris)
        .map(b => {
          val msg: Array[Byte] = b._2.copyBytes()
          val feed: a_feed = a_feed.parseFrom(msg)
          val properties = feed.toPMessage.value
            .map { case (key, value) =>
              key.name -> (value match {
                case i: PInt        => i.value
                case l: PLong       => l.value
                case s: PString     => s.value
                case d: PDouble     => d.value
                case f: PFloat      => f.value
                case b: PByteString => b.value
                case c: PBoolean    => c.value
                case e: PEnum       => e.value.name
                case other          => other.toString()
              })
            }

          (uid, properties + ("user_id" -> uid))
        })
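For context, the failing class in the stack trace below is Hadoop's shaded copy of snappy-java (`org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative`), which is a different class from the standalone `org.xerial.snappy` artifact. A minimal notebook-cell sketch (class name taken from the stack trace; everything else is illustrative) to check whether the shaded native library can initialise at all on the driver:

```scala
// Sketch, not a fix: probe whether the shaded Snappy native library that the
// stack trace points at can load on this JVM. getNativeLibraryVersion is a
// static method on snappy-java's Snappy class; invoking it forces the native
// library to initialise.
val shadedSnappy = "org.apache.hadoop.shaded.org.xerial.snappy.Snappy"

val usableOnDriver: Boolean =
  try {
    val cls = Class.forName(shadedSnappy)
    cls.getMethod("getNativeLibraryVersion").invoke(null)
    true
  } catch {
    // UnsatisfiedLinkError extends Error, not Exception, so catch Throwable.
    case _: Throwable => false
  }

println(s"shaded Snappy natives usable on driver: $usableOnDriver")
```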

A more extended stack trace:

Caused by: UnsatisfiedLinkError: org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawUncompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I
    at org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.apache.hadoop.shaded.org.xerial.snappy.Snappy.uncompress(Snappy.java:551)
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressDirectBuf(SnappyDecompressor.java:267)
    at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:217)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:92)

I have tried installing different versions of the org.xerial.snappy:snappy-java:&lt;version&gt; jar, but to no avail.

I can only install libraries on the Databricks cluster through the Compute > Libraries tab (by uploading them), and I am not sure whether they are installed throughout the cluster (driver and executors) or only on one of the two.
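As a sanity check on that question, one can compare where the class resolves on the driver versus the executors. This is a sketch assuming a notebook with `sc` attached to the cluster; a mismatch between driver and executor results would point at an install-scope problem:

```scala
// Illustrative probe: report which jar (if any) the unshaded Snappy class
// loads from. Run on the driver directly and on each executor via a small RDD.
def snappySource(): String =
  try {
    val cls = Class.forName("org.xerial.snappy.Snappy")
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("<no code source>")
  } catch {
    case _: ClassNotFoundException => "<not on classpath>"
  }

println(s"driver: ${snappySource()}")

sc.parallelize(1 to 4, 4)
  .map(_ => snappySource())
  .distinct()
  .collect()
  .foreach(loc => println(s"executor: $loc"))
```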
