Multi Node Installation in Cassandra

Hi Cassandraobservers,
This article walks through the steps needed to set up a multi-node Cassandra cluster. We built our cluster from two machines; follow the steps below to get a two-node Cassandra cluster of your own.

Machine – 1
===========

Download (apache-cassandra and Java)

1. Download Cassandra. Use the link below with wget, or go to the official Cassandra website if you need the latest version.

wget http://mirror.cc.columbia.edu/pub/software/apache/cassandra/2.1.4/apache-cassandra-2.1.4-bin.tar.gz

2. Download the JDK. If the link below is broken, check the Oracle site for the current version. The -O option saves the file under a clean name, without the query string.

wget -O jdk-7u79-linux-x64.tar.gz "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz?AuthParam=1429605646_f8f03b22c349cc19017fbcbe46ea8c97"

3. Untar the downloaded Cassandra and JDK archives, then set JAVA_HOME and the Cassandra path in .bashrc.

tar -zxvf apache-cassandra-2.1.4-bin.tar.gz
tar -zxvf jdk-7u79-linux-x64.tar.gz

.bashrc
.bashrc is a shell script that Bash runs whenever it is started interactively. You can put any command in that file that you could type at the command prompt. You put commands here to set up the shell for use in your particular environment, or to customize things to your preferences.

$ vi .bashrc
export JAVA_HOME=/home/bigdata/jdk1.7.0_79
export CASSANDRA_HOME=/home/bigdata/apache-cassandra-2.1.4
export PATH=$HOME/bin:$JAVA_HOME/bin:$CASSANDRA_HOME/bin:$PATH

$ source .bashrc
This command re-reads .bashrc in the current shell, so the new variables take effect without opening a new terminal.
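
As a quick sanity check after sourcing .bashrc, a small script can confirm that both variables point at real installations. This is only a sketch in Python and not part of the original setup: the function name check_env is hypothetical, and it assumes the layout used above, with the java and cassandra launchers sitting in the bin folder of each home directory.

```python
import os

def check_env(env):
    """Return a list of problems with JAVA_HOME / CASSANDRA_HOME in `env`."""
    problems = []
    for var, binary in (("JAVA_HOME", "java"), ("CASSANDRA_HOME", "cassandra")):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(os.path.join(home, "bin", binary)):
            # The variable is set but does not point at an unpacked install.
            problems.append(f"{binary} not found in {os.path.join(home, 'bin')}")
    return problems

if __name__ == "__main__":
    for line in check_env(os.environ) or ["environment looks fine"]:
        print(line)
```

Run it from the same shell in which you sourced .bashrc, otherwise the variables will not be visible to the script.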

$ sudo vi /etc/hosts

The hosts file is a computer file used by an operating system to map hostnames to IP addresses. The hosts file is a plain text file, and is conventionally named hosts.

10.0.0.7 datadotz_node1
10.0.0.9 datadotz_node2
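
To double-check that both node names are present in the hosts file, the entries can be parsed and looked up. The following is a minimal sketch; parse_hosts is a hypothetical helper, and the sample text simply mirrors the two entries above.

```python
def parse_hosts(text):
    """Return {hostname: ip} for every entry in hosts-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first IP seen for a name
    return mapping

# Sample text mirroring the entries above; to check the real file,
# pass open("/etc/hosts").read() instead.
sample = """\
127.0.0.1 localhost
10.0.0.7 datadotz_node1
10.0.0.9 datadotz_node2
"""

hosts = parse_hosts(sample)
for name in ("datadotz_node1", "datadotz_node2"):
    print(name, "->", hosts.get(name, "MISSING"))
```

Both machines should report the same two addresses; a MISSING entry on either one means the nodes will not be able to find each other by name.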

$ cd apache-cassandra-2.1.4

Important configuration parameters for multinode cassandra.

cluster_name : The name of the cluster; used to prevent machines in one logical cluster from joining another. All nodes participating in a cluster must have the same value.

initial_token : Used in the single-node-per-token architecture, where a node owns exactly one contiguous range in the ring space.
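
When tokens are assigned by hand, each node needs its own value, and the values are usually spaced evenly around the ring. The sketch below computes such tokens, assuming the default Murmur3Partitioner, whose token range runs from -2**63 to 2**63 - 1. The function name murmur3_tokens is hypothetical.

```python
def murmur3_tokens(node_count):
    """Evenly spaced initial_token values for `node_count` nodes.

    Assumes Murmur3Partitioner (token range -2**63 .. 2**63 - 1).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # size of each node's slice of the ring
    return [i * step - 2**63 for i in range(node_count)]

print(murmur3_tokens(2))  # [-9223372036854775808, 0]
```

For a two-node cluster this gives one token per machine: the first value for one node and the second for the other.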

seeds : A list of comma-delimited hosts (IP addresses) to use as contact points when a node joins a cluster. Cassandra also uses this list to learn the topology of the ring. When running multiple nodes, you must change the - seeds list from the default value (127.0.0.1). In multiple data-center clusters, the - seeds list should include at least one node from each data center (replication group). For details, see the documentation topics "Initializing a multiple node cluster (single data center)" and "Initializing a multiple node cluster (multiple data centers)".

listen_address : The IP address or hostname that other Cassandra nodes use to connect to this node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS. Do not specify 0.0.0.0.

rpc_port : The port for the Thrift RPC service, which is used for client connections.

rpc_address : The listen address for client connections (Thrift remote procedure calls).

endpoint_snitch: Sets which snitch Cassandra uses for locating nodes and routing requests. It must be set to a class that implements IEndpointSnitch.
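
The rules in the parameter list above can be expressed as a small consistency check across nodes. This is an illustrative sketch only: check_cluster_settings is a hypothetical helper, the settings are passed in as plain dictionaries rather than read from cassandra.yaml, and the requirement that initial_token differ per node is an assumption that applies when tokens are assigned by hand.

```python
def check_cluster_settings(nodes):
    """Return a list of problems found across per-node settings dicts."""
    problems = []
    # Every node must carry the same cluster_name.
    if len({n["cluster_name"] for n in nodes}) > 1:
        problems.append("cluster_name must be identical on all nodes")
    # Addresses and hand-assigned tokens must not repeat.
    for key in ("listen_address", "initial_token"):
        values = [n[key] for n in nodes]
        if len(set(values)) != len(values):
            problems.append(f"{key} must be different on every node")
    for n in nodes:
        if n["listen_address"] == "0.0.0.0":
            problems.append("listen_address must not be 0.0.0.0")
        if n["seeds"] == "127.0.0.1":
            problems.append(f"{n['listen_address']}: seeds is still the default 127.0.0.1")
    return problems

# Example values for a two-node setup (tokens chosen for illustration).
node1 = {"cluster_name": "Datadotz Cluster",
         "seeds": "datadotz_node1,datadotz_node2",
         "listen_address": "datadotz_node1",
         "initial_token": -9223372036854775808}
node2 = dict(node1, listen_address="datadotz_node2", initial_token=0)

print(check_cluster_settings([node1, node2]))  # []
```

An empty list means none of the checked rules is violated; it does not prove the configuration is otherwise correct.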

Create Directories

Create three directories in your Cassandra home folder:
data : The directory where table data (SSTables) is stored.
commitlog : The directory where the commit log is stored. For optimal write performance, it is recommended that the commit log be on a separate disk partition (ideally, a separate physical device) from the data file directories.
saved_caches : The directory where table key and row caches are stored.

$ mkdir data commitlog saved_caches

After creating the directories, edit cassandra.yaml (the configuration file) as shown below.

$ vi conf/cassandra.yaml

data_file_directories:
    - /home/bigdata/apache-cassandra-2.1.4/data

# commit log
commitlog_directory: /home/bigdata/apache-cassandra-2.1.4/commitlog

# saved caches
saved_caches_directory: /home/bigdata/apache-cassandra-2.1.4/saved_caches

cluster_name: 'Datadotz Cluster'

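# each node needs its own token; with the default Murmur3Partitioner,
# two evenly spaced tokens are -9223372036854775808 and 0
initial_token: -9223372036854775808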

# in the seed_provider section, under parameters
- seeds: "datadotz_node1,datadotz_node2"

listen_address: datadotz_node1

rpc_address: datadotz_node1

rpc_port: 9160

endpoint_snitch: RackInferringSnitch

$ bin/cassandra -f
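
Because -f keeps Cassandra in the foreground, it can be handy to confirm from a second terminal that the node is accepting connections. The sketch below only tests whether a TCP port is reachable; port_open and wait_for_port are hypothetical helpers, and 9160 is the rpc_port configured above (substitute another port if you want to probe a different service).

```python
import socket
import time

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the name did not resolve.
        return False

def wait_for_port(host, port, attempts=30, delay=2.0):
    """Poll host:port until it is reachable or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, wait_for_port("datadotz_node1", 9160) returns True once the node answers on that port, and False if it never does within the allotted attempts. An open port shows that the process is listening, not that the node has joined the ring.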
—————————————————————————————————-
Machine – 2
===========

Download:
$ wget http://mirror.cc.columbia.edu/pub/software/apache/cassandra/2.1.4/apache-cassandra-2.1.4-bin.tar.gz

$ wget -O jdk-7u79-linux-x64.tar.gz "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz?AuthParam=1429605646_f8f03b22c349cc19017fbcbe46ea8c97"

$ tar -zxvf apache-cassandra-2.1.4-bin.tar.gz
$ tar -zxvf jdk-7u79-linux-x64.tar.gz

$ vi .bashrc

export JAVA_HOME=/home/bigdata/jdk1.7.0_79
export CASSANDRA_HOME=/home/bigdata/apache-cassandra-2.1.4
export PATH=$HOME/bin:$JAVA_HOME/bin:$CASSANDRA_HOME/bin:$PATH

$ source .bashrc

$ sudo vi /etc/hosts

10.0.0.7 datadotz_node1
10.0.0.9 datadotz_node2

$ cd apache-cassandra-2.1.4

Create Directories

$ mkdir data commitlog saved_caches

$ vi conf/cassandra.yaml

data_file_directories:
    - /home/bigdata/apache-cassandra-2.1.4/data

# commit log
commitlog_directory: /home/bigdata/apache-cassandra-2.1.4/commitlog

# saved caches
saved_caches_directory: /home/bigdata/apache-cassandra-2.1.4/saved_caches

cluster_name: 'Datadotz Cluster'

initial_token: 0

# in the seed_provider section, under parameters
- seeds: "datadotz_node1,datadotz_node2"

listen_address: datadotz_node2

rpc_address: datadotz_node2

rpc_port: 9160

endpoint_snitch: RackInferringSnitch

$ bin/cassandra -f
—————————————————————————————————-
Machine – 1
———–
Start the Cassandra CQL shell with the host name datadotz_node1. Create a keyspace (database) named datadotzdb and a column family (table) named patient with five columns.

$ bin/cqlsh datadotz_node1

cqlsh> CREATE KEYSPACE datadotzdb WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};

cqlsh> select * from system.schema_keyspaces;

cqlsh> create table datadotzdb.patient(sno int primary key,pname varchar,drug varchar,gender varchar,amt int);

cqlsh> insert into datadotzdb.patient(sno,pname,drug,gender,amt) values (1,'saravanan','avil','male',200);

cqlsh> insert into datadotzdb.patient(sno,pname,drug,gender,amt) values (2,'Ram','avil','male',400);

cqlsh> select * from datadotzdb.patient;
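
The keyspace above asks for replication_factor 3, while this cluster has only two nodes. The arithmetic below is a rough sketch of what that means, assuming SimpleStrategy keeps at most one copy of a row per node and that a quorum is computed from the configured replication factor. The function name replica_summary is hypothetical.

```python
def replica_summary(replication_factor, node_count):
    """Summarize how a replication factor plays out on a small cluster."""
    copies = min(replication_factor, node_count)  # at most one copy per node
    quorum = replication_factor // 2 + 1          # majority of the configured RF
    return {
        "copies_stored": copies,
        "quorum": quorum,
        "quorum_reachable": node_count >= quorum,
    }

print(replica_summary(3, 2))
# {'copies_stored': 2, 'quorum': 2, 'quorum_reachable': True}
```

Under these assumptions, every row ends up on both machines, which is why the same data can be read from either node.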

—————————————————————————————————-
Machine – 2
———–
Start the Cassandra CQL shell with the host name datadotz_node2 and run the select query to see the data in the patient column family (table).

$ bin/cqlsh datadotz_node2
cqlsh> select * from datadotzdb.patient;

———————————-

Article written by DataDotz Team

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training in technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.