How can I read Cassandra data using JDBC from pySpark?


I want to parallelize the read operation and read with more than one executor. How can I read the data with JDBC rather than with the following read code?

# Spark Cassandra Connector read: options name the contact points, table, and keyspace.
opts = {
    "spark.cassandra.connection.host": "node1_ip,node2_ip,node3_ip",
    "table": "ex_table",
    "keyspace": "ex_keyspace",
}
data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(**opts).load()


Answer from Erick Ramirez:

DataStax provides a JDBC driver for Apache Cassandra which allows you to connect to Cassandra from Spark over a JDBC connection.

The JDBC driver is available to download from the DataStax Downloads site.

See the instructions for Installing the Simba JDBC driver. There is also a User Guide for configuring the driver, with some examples. Cheers!
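
For completeness, here is a minimal sketch of what a JDBC read from pySpark might look like once the driver is installed. The JAR path, driver class name, JDBC URL format, and the partitioning column "id" are assumptions to adapt from the driver's User Guide; partitionColumn, lowerBound, upperBound, and numPartitions are standard Spark JDBC options that split the scan into parallel tasks across executors.

from pyspark.sql import SparkSession

# Register the downloaded driver JAR with Spark.
# The JAR path is an assumption; point it at your download location.
spark = (
    SparkSession.builder
    .appName("cassandra-jdbc-read")
    .config("spark.jars", "/path/to/CassandraJDBC42.jar")
    .getOrCreate()
)

data_frame = (
    spark.read.format("jdbc")
    # Driver class and URL format are assumptions; confirm both in the
    # Simba driver's User Guide for your driver version.
    .option("driver", "com.simba.cassandra.jdbc42.Driver")
    .option("url", "jdbc:cassandra://node1_ip:9042")
    .option("dbtable", "ex_keyspace.ex_table")
    # Standard Spark JDBC partitioning: split the read into parallel
    # tasks over a numeric column ("id" here is a hypothetical column).
    .option("partitionColumn", "id")
    .option("lowerBound", "0")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)

Note that the Spark Cassandra Connector read you already have is not single-threaded: it splits the scan across executors by Cassandra token range, so JDBC is mainly useful when the connector cannot be used.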