Spark Master/Slave Installation on a Multi-Node Cluster

Hi Spark viewers,

This article shows you how to install Spark as a multi-node master/slave cluster. We walk through an example of building a Spark multi-node cluster from three machines: one machine acts as the master and the other two act as workers. Follow along and you will end up with a working multi-node Spark cluster.


Machine – 1
===========
Set the host names in /etc/hosts:

$ sudo vi /etc/hosts

10.0.0.7 datadotz_master
10.0.0.9 datadotz_worker1
10.0.0.10 datadotz_worker2
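
To confirm that the names resolve, you can ping each entry from this machine (a quick sanity check; it assumes the three machines can already reach each other on the network):

$ ping -c 1 datadotz_master
$ ping -c 1 datadotz_worker1
$ ping -c 1 datadotz_worker2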

Download
========
1. Download Spark from the link below. On a Linux OS you can simply use the following wget command. If you need the latest version, refer to the official Spark site.
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0.tgz

2. Spark needs Scala, so download it with the wget command below and wait until the download finishes. If you need the latest version of Scala, visit the official Scala site.
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

3. Once spark-1.3.0.tgz and scala-2.10.4.tgz are on your machine, untar both files.
$ tar -zxvf spark-1.3.0.tgz
$ tar -zxvf scala-2.10.4.tgz

.bashrc
=======
.bashrc is a shell script that Bash runs whenever it is started interactively. You can put any command in that file that you could type at the command prompt. You put commands here to set up the shell for use in your particular environment, or to customize things to your preferences.

$ vi .bashrc
export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_HOME=/home/bigdata/spark-1.3.0
export PATH=$HOME/bin:$SCALA_HOME/bin:$PATH
$ source .bashrc
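
After sourcing .bashrc, verify that the variables took effect; the first command should print /home/bigdata/spark-1.3.0 and the second should report Scala version 2.10.4:

$ echo $SPARK_HOME
$ scala -version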

Install Git on your machine, then follow the remaining steps.

Git Install
===========
$ sudo apt-get install git
$ cd spark-1.3.0
$ sbt/sbt assembly
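
The assembly build downloads all of Spark's dependencies and compiles the project, so the first run can take quite a while. Once it finishes, you can check that the assembly jar was built (a quick sanity check; the exact jar name depends on the Hadoop version your build targets):

$ ls assembly/target/scala-2.10/spark-assembly-*.jar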
$ cd conf
$ cp spark-env.sh.template spark-env.sh

$ vi spark-env.sh

export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/home/bigdata/sparkdata
export SPARK_MASTER_IP=datadotz_master
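
A quick note on these settings: SPARK_WORKER_MEMORY caps the memory each worker can give to executors, SPARK_WORKER_INSTANCES starts two worker processes on each machine, SPARK_WORKER_DIR is the directory workers use for logs and scratch space, and SPARK_MASTER_IP tells Spark which host the master runs on.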

$ cp slaves.template slaves   (copy the slaves.template file to a new file named slaves)

$ vi slaves

datadotz_master
datadotz_worker1
datadotz_worker2
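
Note that datadotz_master is itself listed in slaves, so the master machine also runs workers. With SPARK_WORKER_INSTANCES=2 on every host, each of the three machines starts two Worker processes, for six workers in total.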

Spark reads its defaults from spark-defaults.conf, not from the template, so copy the template first and then set the master URL (this saves you from passing --master on every job submission):

$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf

spark.master spark://datadotz_master:7077
—————————————————————————————————

Machine – 2
===========

Set the host names in /etc/hosts:

$ sudo vi /etc/hosts

10.0.0.7 datadotz_master
10.0.0.9 datadotz_worker1
10.0.0.10 datadotz_worker2

Download
========
Download Spark and Scala on this machine as well.
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0.tgz
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

After downloading, untar both files and then set the Spark and Scala paths in .bashrc.

$ tar -zxvf spark-1.3.0.tgz
$ tar -zxvf scala-2.10.4.tgz

$ vi .bashrc
export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_HOME=/home/bigdata/spark-1.3.0
export PATH=$HOME/bin:$SCALA_HOME/bin:$PATH
$ source .bashrc

Install Git on your machine, then follow the remaining steps.

Git Install
===========
$ sudo apt-get install git
$ cd spark-1.3.0
$ sbt/sbt assembly
$ cd conf

$ cp spark-env.sh.template spark-env.sh

$ vi spark-env.sh
export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/home/bigdata/sparkdata
export SPARK_MASTER_IP=datadotz_master

$ cp slaves.template slaves

$ vi slaves

datadotz_master
datadotz_worker1
datadotz_worker2

$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf

spark.master spark://datadotz_master:7077
————————————————————————————————–

Machine – 3
===========

Set the host names in /etc/hosts:

$ sudo vi /etc/hosts

10.0.0.7 datadotz_master
10.0.0.9 datadotz_worker1
10.0.0.10 datadotz_worker2

Download
========

Download Spark and Scala on this machine as well.

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0.tgz
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

After downloading, untar both files and then set the Spark and Scala paths in .bashrc.

$ tar -zxvf spark-1.3.0.tgz
$ tar -zxvf scala-2.10.4.tgz

$ vi .bashrc
export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_HOME=/home/bigdata/spark-1.3.0
export PATH=$HOME/bin:$SCALA_HOME/bin:$PATH
$ source .bashrc

Install Git on your machine, then follow the remaining steps.

Git Install
===========
$ sudo apt-get install git
$ cd spark-1.3.0
$ sbt/sbt assembly
$ cd conf

$ cp spark-env.sh.template spark-env.sh

$ vi spark-env.sh
export SCALA_HOME=/home/bigdata/scala-2.10.4
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/home/bigdata/sparkdata
export SPARK_MASTER_IP=datadotz_master

$ cp slaves.template slaves

$ vi slaves

datadotz_master
datadotz_worker1
datadotz_worker2

$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf

spark.master spark://datadotz_master:7077
—————————————————————————————————-

Do this on Machine 1
====================
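Before starting anything, note that sbin/start-slaves.sh logs in to every host listed in conf/slaves over SSH, so the master needs passwordless SSH access to itself and to both workers. A minimal sketch, assuming the same bigdata user on all three machines:

$ ssh-keygen -t rsa        (accept the defaults and leave the passphrase empty)
$ ssh-copy-id bigdata@datadotz_master
$ ssh-copy-id bigdata@datadotz_worker1
$ ssh-copy-id bigdata@datadotz_worker2
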
$ cd spark-1.3.0
$ sbin/start-master.sh
$ sbin/start-slaves.sh
$ jps

Master
Worker
Worker

(jps prints the Java process names: the Master plus the two Worker instances configured by SPARK_WORKER_INSTANCES=2)

On Machine 2
============
$ jps

Worker
Worker

On Machine 3
============
$ jps

Worker
Worker

Open http://localhost:8080 on the master machine (or http://datadotz_master:8080 from any of the three machines) and check that all six workers are registered in the master's web UI.
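
As a final smoke test, you can submit the bundled SparkPi example to the cluster (a quick sketch; the examples jar path below assumes the sbt build from earlier and may differ on your machine):

$ cd ~/spark-1.3.0
$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://datadotz_master:7077 examples/target/scala-2.10/spark-examples-*.jar 10

If everything is wired up correctly, the driver prints an approximate value of Pi and the finished application shows up in the master's web UI.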

———————————-

Article written by the DataDotz Team

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training in technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search, and Cloud Computing.

Note: DataDotz also provides classroom-based Apache Kafka training in Chennai. The course includes Cassandra, MongoDB, Scala, and Apache Spark training. For more details about Apache Spark training in Chennai, please visit http://datadotz.com/training/