HDFS NameNode High Availability in Hadoop 2.x

The HDFS High Availability (HA) feature addresses the NameNode single point of failure (SPOF) by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby.

Figure: NameNode HA with shared storage and ZooKeeper

HA CLUSTER:

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby simply acts as a slave, maintaining enough state to provide a fast failover if necessary.

For the Standby node to keep its state synchronized with the Active node, both nodes need access to shared storage for the edit log. The steps later in this article start JournalNodes on all three nodes, which hold that shared edit log.

During a failover, the fencing process is responsible for cutting off the previous Active's access to the shared edits storage. This prevents it from making any further edits to the namespace, allowing the new Active to safely proceed with failover.

ZOOKEEPER:

ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates to key configuration information. ZooKeeper can be used for leader election, group membership, and configuration maintenance. In addition, ZooKeeper can be used for event notification, locking, and as a priority queue mechanism.

ZooKeeper brings these key benefits:

  • Fast.
  • Reliable.
  • Simple.
  • Ordered.

ZKFailoverController (ZKFC)

The ZKFailoverController (ZKFC) is a ZooKeeper client that also monitors and manages the state of the NameNode. Each machine that runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  • Health monitoring
  • ZooKeeper session management
  • ZooKeeper-based election

Apache Hadoop High Availability Cluster Configuration

Step – 1

Software Requirements:

Download Apache Hadoop from the Apache Hadoop site. If the link is broken, check the Apache Hadoop website, which also lists the current version.

http://www.us.apache.org/dist/hadoop/common/hadoop-2.6.0/

Download Java from the Oracle site.

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

Download Apache ZooKeeper from the Apache ZooKeeper site. If the link is broken, check the Apache ZooKeeper website, which also lists the current version.

http://www.us.apache.org/dist/zookeeper/zookeeper-3.4.5/

Step – 2

Setting host names for IP addresses (set up on all the nodes of the cluster)

(When a machine starts, it needs to know the mapping of some host names to IP addresses before DNS can be consulted. This mapping is kept in the /etc/hosts file. In the absence of a name server, any network program on the system consults this file to determine the IP address that corresponds to a host name.)

$sudo vi /etc/hosts

Add one line per node, giving its IP address followed by its host name (master, slave1, slave2).
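As a rough illustration of the mapping, the sketch below writes three entries to a scratch file and looks one of them up. The IP addresses are placeholders, not values from this cluster, and lookup_host is a hypothetical helper; the host names master, slave1, and slave2 are the ones used in the rest of this article.

```shell
# Scratch copy so the real /etc/hosts is not touched.
hosts_file="$(mktemp)"

# Placeholder addresses (assumption); use the real IPs of your nodes.
cat > "$hosts_file" <<'EOF'
192.168.1.101  master
192.168.1.102  slave1
192.168.1.103  slave2
EOF

# Return the IP address recorded for a host name.
lookup_host() {
  awk -v name="$1" '$2 == name { print $1 }' "$hosts_file"
}

lookup_host slave1
```

The same three lines would go into /etc/hosts on every node so that each machine resolves the others identically.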

Step – 3

SSH Configuration

(Hadoop's start and stop scripts use Secure Shell (SSH) to launch the server processes on the other nodes. This requires a password-less SSH connection from the master to all the slave nodes and the standby machine. If SSH is not password-less, you have to log in to each individual machine and start all the processes there manually.)

Generate keys using the ssh-keygen command on all nodes.

Set up login without a password prompt, then copy the id_rsa.pub key from the master node to the standby and slave nodes.

(The ssh-copy-id command copies the public key of your default identity to the remote host. The default identity is your "standard" ssh key.)

Log in and check the connection from the master node to the slave nodes.
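The key setup can be sketched as a short script. It is a dry run by default and only prints the commands it would execute. The user name dd is inferred from the /home/dd paths used later in this article, and the node list comes from Step 2, so treat both as assumptions; run is a hypothetical helper.

```shell
# Assumed user and hosts; adjust to your cluster.
SSH_USER="dd"
OTHER_NODES="slave1 slave2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# 1. Generate an RSA key pair with an empty passphrase (run on every node).
run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# 2. Authorize the key locally so the node can also SSH to itself.
run sh -c 'cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"'

# 3. Copy the public key to the other nodes (asks for the password once).
for node in $OTHER_NODES; do
  run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$SSH_USER@$node"
done

# 4. Check that login no longer prompts for a password.
for node in $OTHER_NODES; do
  run ssh "$SSH_USER@$node" hostname
done
```

Setting DRY_RUN=0 before running the script executes the commands instead of printing them.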

Step – 4

ZooKeeper configuration on all the nodes in the cluster:

a. Untar the ZooKeeper archive downloaded in Step 1 (zookeeper-3.4.5.tar.gz if you used the link above)
b. Change the directory to conf
c. Create a new file zoo.cfg

Add the ensemble settings to zoo.cfg: the IDs and locations of all servers in the ensemble and the ports on which they communicate with each other.
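A minimal zoo.cfg for a three-node ensemble might look like the sketch below, written through a shell here-document. The timing values and ports are common ZooKeeper settings rather than values recovered from this article, and the server numbering (master as server.1, and so on) is an assumption. The data directory is the one used later in this step.

```shell
conf_dir="$(mktemp -d)"   # scratch directory; use ZooKeeper's conf/ in practice

cat > "$conf_dir/zoo.cfg" <<'EOF'
# Basic time unit in milliseconds
tickTime=2000
# Ticks a follower may take to connect and sync with the leader
initLimit=10
# Ticks a follower may lag behind the leader
syncLimit=5
# Where snapshots and the myid file live
dataDir=/home/dd/zookeeper/data
# Port that clients (including the ZKFC) connect to
clientPort=2181
# server.<id>=<host>:<peer port>:<leader election port>
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF

grep '^server\.' "$conf_dir/zoo.cfg"
```

The same file is used unchanged on all three nodes; only the myid file differs per node.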

Create the data and logs directories on all the nodes of the cluster. The data directory used in this setup is /home/dd/zookeeper/data.

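Write a unique server ID into the myid file on each node. The ID must match that node's server.N entry in zoo.cfg; the numbering below assumes master is server.1, slave1 is server.2, and slave2 is server.3.

master$vi /home/dd/zookeeper/data/myid

a. 1 (just type 1)
b. save and exit (:wq)

slave1$vi /home/dd/zookeeper/data/myid

a. 2 (just type 2)
b. save and exit (:wq)

slave2$vi /home/dd/zookeeper/data/myid

a. 3 (just type 3)
b. save and exit (:wq)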
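Because each myid has to agree with that node's server.N line in zoo.cfg, the value can also be derived from the configuration instead of typed by hand. In the sketch below, the helper name myid_for and the sample server lines are illustrative assumptions.

```shell
# Sample server lines (assumed numbering); point cfg at your real zoo.cfg.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF

# Print the numeric id whose server.N entry names the given host.
myid_for() {
  sed -n "s/^server\.\([0-9][0-9]*\)=$1:.*/\1/p" "$cfg"
}

for host in master slave1 slave2; do
  echo "$host -> myid $(myid_for "$host")"
done

# On a node itself one might run:
#   myid_for "$(hostname)" > /home/dd/zookeeper/data/myid
```

If two nodes end up with the same ID, the ensemble cannot form a quorum correctly, so it is worth checking the three files before starting ZooKeeper.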

Start ZooKeeper on all the nodes of the cluster (master, slave1, and slave2).

(If the start command succeeds on every node, ZooKeeper is running correctly.)
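Starting and checking the ensemble typically uses the zkServer.sh script shipped in ZooKeeper's bin directory. The sketch below keeps those calls in comments so it can be read without a running cluster, and adds a small hypothetical helper, zk_mode, that pulls the node's role out of the status output. The sample status text only imitates the shape of that output.

```shell
# On every node (assuming ZooKeeper's bin directory is on PATH):
#   zkServer.sh start
#   zkServer.sh status 2>&1 | zk_mode

# Extract the role ("leader", "follower" or "standalone") from status output.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Example with status text of the shape zkServer.sh prints.
printf 'Using config: /home/dd/zookeeper/conf/zoo.cfg\nMode: follower\n' | zk_mode
```

In a healthy three-node ensemble, one node reports leader and the other two report follower.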

Step – 5

Apache Hadoop Configuration in all the nodes of the cluster:

$tar -zxvf hadoop-2.6.0.tar.gz

The main configuration files of an Apache Hadoop cluster are listed below

·  core-site.xml
·  hdfs-site.xml
·  mapred-site.xml
·  yarn-site.xml
·  hadoop-env.sh
·  yarn-env.sh
·  mapred-env.sh
·  slaves

$vi hadoop-2.6.0/etc/hadoop/core-site.xml
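For an HA setup, core-site.xml usually points clients at a logical nameservice rather than a single host and tells the ZKFC where the ZooKeeper ensemble is. The sketch below generates such a file in a scratch directory. The nameservice name mycluster, the ZooKeeper client port 2181, and the helper prop are assumptions, not values recovered from this article.

```shell
out_dir="$(mktemp -d)"   # scratch directory; copy the result to etc/hadoop/

# Emit one <property> element.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  # Logical nameservice instead of a single NameNode host
  prop fs.defaultFS        "hdfs://mycluster"
  # ZooKeeper ensemble used by the ZKFC for election
  prop ha.zookeeper.quorum "master:2181,slave1:2181,slave2:2181"
  echo '</configuration>'
} > "$out_dir/core-site.xml"

cat "$out_dir/core-site.xml"
```

Whatever nameservice name is chosen here has to match the one declared in hdfs-site.xml.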

$vi hadoop-2.6.0/etc/hadoop/hdfs-site.xml
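The HA-specific part of hdfs-site.xml names the two NameNodes, the JournalNode quorum that holds the shared edit log, and the fencing method. The sketch below generates one plausible version in a scratch directory. It assumes master and slave1 as the two NameNodes and JournalNodes on all three hosts, as in Step 7. The nameservice mycluster, the IDs nn1 and nn2, the ports, and the /home/dd paths are assumptions rather than values recovered from this article.

```shell
out_dir="$(mktemp -d)"   # scratch directory; copy the result to etc/hadoop/

# Emit one <property> element.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  # Logical nameservice and the two NameNode IDs inside it
  prop dfs.nameservices                     "mycluster"
  prop dfs.ha.namenodes.mycluster           "nn1,nn2"
  # RPC and web addresses of each NameNode
  prop dfs.namenode.rpc-address.mycluster.nn1  "master:8020"
  prop dfs.namenode.rpc-address.mycluster.nn2  "slave1:8020"
  prop dfs.namenode.http-address.mycluster.nn1 "master:50070"
  prop dfs.namenode.http-address.mycluster.nn2 "slave1:50070"
  # Shared edit log kept by the JournalNodes, and their local storage
  prop dfs.namenode.shared.edits.dir "qjournal://master:8485;slave1:8485;slave2:8485/mycluster"
  prop dfs.journalnode.edits.dir     "/home/dd/hadoop/journal"
  # How clients find the currently active NameNode
  prop dfs.client.failover.proxy.provider.mycluster \
       "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
  # Automatic failover through the ZKFC, with SSH-based fencing
  prop dfs.ha.automatic-failover.enabled    "true"
  prop dfs.ha.fencing.methods               "sshfence"
  prop dfs.ha.fencing.ssh.private-key-files "/home/dd/.ssh/id_rsa"
  echo '</configuration>'
} > "$out_dir/hdfs-site.xml"

grep -c '<property>' "$out_dir/hdfs-site.xml"
```

The sshfence method relies on the password-less SSH access configured in Step 3, which is one reason the key has to be usable between the two NameNode machines.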

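$vi hadoop-2.6.0/etc/hadoop/yarn-site.xml

Hadoop 2.6.0 ships mapred-site.xml only as a template, so copy it first and edit the copy:

$cp hadoop-2.6.0/etc/hadoop/mapred-site.xml.template hadoop-2.6.0/etc/hadoop/mapred-site.xml

$vi hadoop-2.6.0/etc/hadoop/mapred-site.xml

$vi hadoop-2.6.0/etc/hadoop/hadoop-env.sh

$vi hadoop-2.6.0/etc/hadoop/yarn-env.sh

$vi hadoop-2.6.0/etc/hadoop/mapred-env.sh

$vi hadoop-2.6.0/etc/hadoop/slaves

List the host names defined in /etc/hosts, one per line: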
master
slave1
slave2

Step – 6

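$vi .bashrc

Add the Java, Hadoop, and ZooKeeper locations to the environment so that the commands in the next step can be run by name.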
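The lines added to .bashrc usually export the install locations and extend PATH. The sketch below shows one possible set; every path in it is an assumption about where the packages were unpacked, not a value taken from this article.

```shell
# Assumed install locations; adjust to where you unpacked each package.
export JAVA_HOME="/usr/lib/jvm/java-7-oracle"
export HADOOP_HOME="$HOME/hadoop-2.6.0"
export ZOOKEEPER_HOME="$HOME/zookeeper-3.4.5"

# Make hdfs, hadoop-daemon.sh, start-all.sh and zkServer.sh available by name.
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin"
```

After saving the file, running `source ~/.bashrc` (or opening a new shell) makes the variables take effect.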

Step – 7

Start the JournalNode on all nodes of the cluster (master, slave1, and slave2)

$hadoop-daemon.sh start journalnode

Initialize the HA state in ZooKeeper from the master node

$hdfs zkfc -formatZK

Format the NameNode on the master node and start it

Master$hdfs namenode -format

Master$hadoop-daemon.sh start namenode

Copy the formatted namespace to the standby NameNode on slave1

Slave1$hdfs namenode -bootstrapStandby

Stop & Start Hadoop from Master Node

$stop-all.sh
$start-all.sh
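After the restart, one NameNode should report the active state and the other standby. The sketch below wraps the `hdfs haadmin -getServiceState` command in a hypothetical helper, find_active. The NameNode IDs nn1 and nn2 are assumptions and have to match the IDs configured in hdfs-site.xml.

```shell
# Print the first NameNode ID that reports the "active" HA state.
# Returns non-zero if none of the given IDs is active.
find_active() {
  for nn in "$@"; do
    state="$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)"
    if [ "$state" = "active" ]; then
      echo "$nn"
      return 0
    fi
  done
  return 1
}

# Usage on a cluster node:
#   find_active nn1 nn2
#   jps    # lists the Java daemons running on the node
```

Stopping the active NameNode and calling the helper again is a simple way to watch the automatic failover happen.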


Finally, write some data into the HDFS cluster to confirm that it accepts client operations.
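A write check can be as small as creating a directory, uploading one file, and listing it. The sketch below bundles those three `hdfs dfs` calls in a hypothetical helper, hdfs_smoke_test. The default target directory /user/dd/ha-test and the sample file name are assumptions.

```shell
# Write a local file into HDFS and list it back.
# Stops at the first command that fails.
hdfs_smoke_test() {
  src="$1"
  dest="${2:-/user/dd/ha-test}"
  hdfs dfs -mkdir -p "$dest" &&
    hdfs dfs -put -f "$src" "$dest/" &&
    hdfs dfs -ls "$dest"
}

# Usage on a cluster node:
#   echo "hello HA" > /tmp/sample.txt
#   hdfs_smoke_test /tmp/sample.txt
```

Because clients address the logical nameservice, the same commands should keep working after a failover to the other NameNode.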

 

———————————-

Article written by DataDotz Team

DataDotz is a Chennai-based Big Data team primarily focussed on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search, and Cloud Computing.
