How can I read Cassandra data using JDBC from pySpark?


I want to parallelize the read operation and read with more than one executor. How can I read the data with JDBC rather than with the following read code?

# Spark Cassandra Connector read: options name the contact points, table, and keyspace.
opts = {
    "spark.cassandra.connection.host": "node1_ip,node2_ip,node3_ip",
    "table": "ex_table",
    "keyspace": "ex_keyspace",
}
data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(**opts).load()


Answer from Erick Ramirez:

DataStax provides a JDBC driver for Apache Cassandra which allows you to connect to Cassandra from Spark over a JDBC connection.

The JDBC driver is available to download from the DataStax Downloads site.

See the instructions for Installing the Simba JDBC driver. There is also a User Guide for configuring the driver, with some examples. Cheers!
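
For completeness, here is a minimal sketch of what a JDBC read from pySpark might look like once the driver is installed. The JAR path, driver class name, JDBC URL format, and the partitioning column "id" are assumptions to adapt from the driver's User Guide; partitionColumn, lowerBound, upperBound, and numPartitions are standard Spark JDBC options that split the scan into parallel tasks across executors.

from pyspark.sql import SparkSession

# Register the downloaded driver JAR with Spark.
# The JAR path is an assumption; point it at your download location.
spark = (
    SparkSession.builder
    .appName("cassandra-jdbc-read")
    .config("spark.jars", "/path/to/CassandraJDBC42.jar")
    .getOrCreate()
)

data_frame = (
    spark.read.format("jdbc")
    # Driver class and URL format are assumptions; confirm both in the
    # Simba driver's User Guide for your driver version.
    .option("driver", "com.simba.cassandra.jdbc42.Driver")
    .option("url", "jdbc:cassandra://node1_ip:9042")
    .option("dbtable", "ex_keyspace.ex_table")
    # Standard Spark JDBC partitioning: split the read into parallel
    # tasks over a numeric column ("id" here is a hypothetical column).
    .option("partitionColumn", "id")
    .option("lowerBound", "0")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)

Note that the Spark Cassandra Connector read you already have is not single-threaded: it splits the scan across executors by Cassandra token range, so JDBC is mainly useful when the connector cannot be used.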