Configured truststore path in Spark Cassandra connector returns NoSuchFileException

240 Views Asked by At

I'm using Datastax Spark Cassandra Connector for my PySpark application (in .master("yarn") session mode).

But our security rules require that I should connect to Cassandra only with SSL connection. As I was told, our Cassandra does not verify its clients (only checks username/password pairs), but we, the clients, should verify Cassandra with SSL certificates.

I was given a path to JKS trust store that and wanted to pass it to Datastax Spark Connector.

I set the following SparkSession parameters:

spark.cassandra.connection.ssl.enabled: true
spark.cassandra.connection.ssl.clientAuth.enabled: false
spark.cassandra.connection.ssl.trustStore.path: "/home/tech_profiling/mdp-cassandra-keystore.jks"
spark.cassandra.connection.ssl.trustStore.password: "cassandra"
spark.cassandra.auth.username: "my_super_dooper_gigachad_user"
spark.cassandra.auth.username: "my_mega_password"

I checked that this file /home/tech_profiling/mdp-cassandra-keystore.jks is really present on my Spark cluster nodes. But anyway it gives me the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o486.save.
: java.nio.file.NoSuchFileException: /home/tech_profiling/mdp-cassandra-keystore.jks
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.getKeyStore(CassandraConnectionFactory.scala:78)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.trustStore$lzycompute$1(CassandraConnectionFactory.scala:92)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.trustStore$1(CassandraConnectionFactory.scala:91)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.maybeCreateSSLOptions(CassandraConnectionFactory.scala:97)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:62)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:131)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:159)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
    at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:79)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
    at com.datastax.spark.connector.rdd.partitioner.dht.TokenFactory$.forSystemLocalPartitioner(TokenFactory.scala:98)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:276)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:83)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:136)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:160)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:157)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Has anyone already encountered with such a problem?

1

There are 1 best solutions below

0
Erick Ramirez On

That exception indicates that the Spark executors are not able to access the file.

We recommend using the spark.files option ( or --files) since Spark will distribute the specified files to all the executors' working directory.

For example, you would pass /home/tech_profiling/mdp-cassandra-keystore.jks to spark.files then configure the truststore path with:

spark.cassandra.connection.ssl.trustStore.path: keystore.jks

Cheers!