Basic Read and Write using Apache Spark & Cassandra

Apache Spark is a general-purpose analytics framework that supports multiple processing options, including:

a) fast interactive queries

b) streaming analytics

c) graph analytics and

d) machine learning.

Cassandra is a distributed database that draws on Google's BigTable and Amazon's Dynamo. Like other Big Data databases, it allows for flexible data structures. One of Cassandra's coolest features is that it scales predictably: every node in a Cassandra cluster has the same role and works the same way, so no single node becomes a bottleneck for overall cluster performance and there is no single point of failure. And that is pretty wonderful.

Spark-Cassandra integration is a powerful combination for many kinds of processing. This quick start is all about Spark-Cassandra connectivity.


Prerequisites for Apache Spark-Cassandra Connectivity






1. Environment variables to set in .bashrc


(Note: these can be set in .bash_profile instead.)

2. Apache Spark Installation

(Kindly refer to our Spark installation guide.)

(Note: use the versions mentioned above.)

3. Apache Cassandra Standalone Quick Start

(Kindly refer to our Cassandra installation guide.)

(Note: use the versions mentioned above.)

4. Steps for Configuration

a. Copy all the jars from apache-cassandra-2.2.3/lib to spark-1.5.1-bin-hadoop2.6/lib, along with cassandra-driver-core-2.1.5.jar (to be downloaded separately and added).

b. Go to spark-1.5.1-bin-hadoop2.6/conf/

Rename the template configuration file and add the settings below.


c. Start Cassandra and Spark, then use the jps command to check the running daemons.
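As a sketch, assuming default install directories for the versions mentioned above, starting the daemons and checking them could look like this:

```shell
# Start Cassandra (runs as a background daemon by default)
apache-cassandra-2.2.3/bin/cassandra

# Start the Spark standalone master and worker
spark-1.5.1-bin-hadoop2.6/sbin/start-all.sh

# jps lists running JVM processes; expect CassandraDaemon, Master, and Worker
jps
```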


5. Creating a Keyspace and Table in Cassandra

Create the keyspace and table needed for this quick start in Cassandra.


Here, a patient dataset is taken as input to the table.
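The exact DDL is not shown above; a minimal sketch, with the keyspace, table, and column names inferred from the insert statement used later in this guide (patient.patientdata with columns sno, name, drug, gender, amt), might look like:

```sql
-- Replication settings are an assumption suitable for a single-node cluster
CREATE KEYSPACE patient
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Column types inferred from the sample insert (sno is numeric, amt is quoted)
CREATE TABLE patient.patientdata (
  sno int PRIMARY KEY,
  name text,
  drug text,
  gender text,
  amt text
);
```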


Use the COPY command to load data into Cassandra.
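As an illustration (the CSV path below is a placeholder, not from the original), a cqlsh COPY command to load the dataset could look like:

```sql
-- Run inside cqlsh; replace the path with the actual location of the dataset
COPY patient.patientdata (sno, name, drug, gender, amt)
  FROM '/path/to/patientdata.csv' WITH HEADER = true;
```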


6. Launch the spark-shell

Move the downloaded “spark-cassandra-connector_2.10-1.5.0-M1.jar” to spark-1.5.1-bin-hadoop2.6

bin/spark-shell --jars spark-cassandra-connector_2.10-1.5.0-M1.jar

Run the below:

6.1 Configuring a new SparkContext (sc)

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

// Stop the shell's default context so a new one can be created
sc.stop
// The property name and localhost address below are filled in as an
// assumption (the original left the .set arguments blank); adjust the
// host to point at your Cassandra node.
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext("local[2]", "test", conf)
```

6.2 Accessing Cassandra

```scala
import com.datastax.spark.connector._

// Read the patient.patientdata table as an RDD of CassandraRow
val rdd = sc.cassandraTable("patient", "patientdata")
println(rdd.first)
```
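Beyond reading the whole table, the connector can also push column selection and filtering down to Cassandra; a sketch against the same table:

```scala
import com.datastax.spark.connector._

// select() and where() are pushed down to Cassandra, so only the
// requested columns and rows are transferred to Spark
val names = sc.cassandraTable("patient", "patientdata")
  .select("sno", "name")
  .where("sno = ?", 1)
names.collect.foreach(println)
```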

6.3 Inserting data in Cassandra

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._

// Obtain a connector from the Spark configuration and run a CQL insert
val c = CassandraConnector(sc.getConf)
c.withSessionDo(session => session.execute(
  "insert into patient.patientdata (sno, name, drug, gender, amt) values (11, 'john', 'avil', 'male', '100')"))
```
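Executing single CQL inserts through a session works, but the connector's idiomatic bulk write path is saveToCassandra; a sketch with hypothetical sample rows:

```scala
import com.datastax.spark.connector._

// Each tuple maps positionally onto the listed columns
val rows = sc.parallelize(Seq(
  (12, "jane", "crocin", "female", "150"),
  (13, "mike", "avil", "male", "200")))
rows.saveToCassandra("patient", "patientdata",
  SomeColumns("sno", "name", "drug", "gender", "amt"))
```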

Reference images

a. Creating spark context for Cassandra


b. Data Insertion in Cassandra


c. Now, check CQLSH for the newly inserted records



Article written by DataDotz Team

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training in technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search, and Cloud Computing.
