HDFS Name Node High Availability in Hadoop 2.x
The HDFS High Availability feature addresses the single point of failure (SPOF) of the NameNode by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby.
NAMENODE HA WITH SHARED STORAGE AND ZOOKEEPER
HA CLUSTER:
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby node to keep its state synchronized with the Active node, the current implementation requires that the two nodes both have access to a directory on a shared storage device.
During a failover, the fencing process is responsible for cutting off the previous Active’s access to the shared edits storage. This prevents it from making any further edits to the namespace, allowing the new Active to safely proceed with failover.
ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates to key configuration information. ZooKeeper can be used for leader election, group membership, and configuration maintenance. In addition, ZooKeeper can be used for event notification, locking, and as a priority queue mechanism.
For automatic HDFS failover, ZooKeeper brings two key benefits: failure detection (each NameNode machine maintains a persistent session in ZooKeeper; if the machine crashes, the session expires and the other NameNode is notified that a failover should be triggered) and active NameNode election (ZooKeeper provides a simple mechanism to exclusively elect one node as active).
The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client that also monitors and manages the state of the NameNode. Each machine that runs a NameNode also runs a ZKFC, and that ZKFC is responsible for: health monitoring (the ZKFC periodically pings its local NameNode with a health-check command), ZooKeeper session management (when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper, plus a special "lock" znode while the local NameNode is active), and ZooKeeper-based election (if the local NameNode is healthy and no other node currently holds the lock znode, the ZKFC tries to acquire it; if it succeeds, it runs a failover to make its local NameNode active).
Step – 1
Software Requirements:
Download Apache Hadoop from the Apache Hadoop website; check the site for the current stable version if the link below is broken.
Download Java (JDK) from the Oracle site.
Download Apache ZooKeeper from the Apache ZooKeeper website; check the site for the current stable version if the link below is broken.
Step – 2
Setting the hostname-to-IP mapping (set this up on all nodes of the cluster). As a machine starts, it needs to know the mapping of some hostnames to IP addresses before DNS can be referenced. This mapping is kept in the /etc/hosts file; in the absence of a name server, any network program on the system consults this file to determine the IP address that corresponds to a host name.
$sudo vi /etc/hosts
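For example, assuming a three-node cluster named master, slave1 and slave2 with hypothetical private IP addresses, /etc/hosts on every node might contain:

```
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
```

The same file should be identical on all nodes so that every machine resolves the cluster hostnames consistently.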
Step – 3
(The Hadoop core uses SSH to launch the server processes on the slave nodes. It requires a password-less SSH connection between the master and all the slave and standby machines. We need password-less SSH in a fully distributed environment because, when the cluster is live, communication between nodes is frequent and the master daemons must be able to dispatch work to the worker daemons quickly. If SSH is not password-less, you have to go to each individual machine and start all the processes there manually.)
Generate keys using the ssh-keygen command on all nodes.
(The commands below let you log in without being asked for a password.)
Copy the id_rsa.pub key from the master node to the standby and slave nodes.
(The ssh-copy-id command copies the public key of your default identity to the remote host. The default identity is your "standard" SSH key.)
Log in and check the connection from the master node to the slave nodes.
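The key generation and distribution steps above can be sketched as follows; the hostnames come from /etc/hosts, and the user name dd is assumed from the home directory paths used later in this article:

```
master$ ssh-keygen -t rsa -P ""                   # generate a key pair with an empty passphrase
master$ ssh-copy-id -i ~/.ssh/id_rsa.pub dd@slave1
master$ ssh-copy-id -i ~/.ssh/id_rsa.pub dd@slave2
master$ ssh slave1                                # should now log in without a password prompt
```

Repeat the ssh-copy-id step for every node (including the master itself) so that all master-to-node logins are password-less.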
Step – 4
Zookeeper Configuration in all the nodes in the cluster:
a. Untar zookeeper-3.3.6.tar.gz
b. Change the directory to conf
c. Create a new file zoo.cfg
And add the below content
(These are the IDs and locations of all servers in the ensemble, the ports on which they communicate with each other)
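A minimal zoo.cfg for this three-node ensemble might look like the following sketch; the dataDir/dataLogDir paths match the directories created in the next step, and the ports shown (2181 client, 2888 quorum, 3888 election) are the conventional defaults:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/dd/zookeeper/data
dataLogDir=/home/dd/zookeeper/logs
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```

The server.N lines list every member of the ensemble; N must match the contents of each node's myid file.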
Create the below directories (data & logs) on all the nodes of the cluster.
master$vi /home/dd/zookeeper/data/myid (just type 1) then save and exit (:wq)
slave1$vi /home/dd/zookeeper/data/myid (just type 2) then save and exit (:wq)
slave2$vi /home/dd/zookeeper/data/myid (just type 3) then save and exit (:wq)
(Each node must have a different ID in its myid file, matching that node's server.N entry in zoo.cfg.)
Start Zookeeper in all the nodes of the cluster.
(If the above commands succeed, ZooKeeper is running correctly.)
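Assuming ZooKeeper is installed under /home/dd/zookeeper, starting and checking it on each node looks like this:

```
$ /home/dd/zookeeper/bin/zkServer.sh start
$ /home/dd/zookeeper/bin/zkServer.sh status
$ jps
```

Once all three nodes are up, zkServer.sh status should report "leader" on one node and "follower" on the others, and jps should list a QuorumPeerMain process on each node.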
Step – 5
Apache Hadoop Configuration in all the nodes of the cluster:
$tar -zxvf hadoop-2.6.0.tar.gz
The main Apache Hadoop cluster configuration files are:
· core-site.xml
· hdfs-site.xml
· mapred-site.xml
· yarn-site.xml
· hadoop-env.sh
· yarn-env.sh
· mapred-env.sh
· slaves
$vi hadoop-2.6.0/etc/hadoop/slaves
Add the host names from /etc/hosts (ipaddress Name):
master
slave1
slave2
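For reference, a sketch of the HA-related properties for core-site.xml and hdfs-site.xml is shown below. The nameservice name mycluster and the NameNode IDs nn1/nn2 are illustrative, the two NameNodes are assumed to run on master and slave1, and the JournalNode/ZooKeeper addresses assume the three hosts defined in /etc/hosts:

```
<!-- core-site.xml -->
<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
<property><name>ha.zookeeper.quorum</name><value>master:2181,slave1:2181,slave2:2181</value></property>

<!-- hdfs-site.xml -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>master:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>slave1:8020</value></property>
<!-- shared edits via the JournalNode quorum started in Step 7 -->
<property><name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/dd/.ssh/id_rsa</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
```

The sshfence fencing method relies on the password-less SSH set up in Step 3 to cut off the previous Active NameNode during failover.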
Step – 6
Step – 7
Start the journal node on all nodes of the cluster (master, slave1 & slave2)
$hadoop-daemon.sh start journalnode
Format Zookeeper file system in Master Node
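Formatting the ZooKeeper failover state creates the znode used for Active NameNode election; the standard command, run once from the master node, is:

```
master$ hdfs zkfc -formatZK
```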
Format Namenode in Master Node
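The namenode format on the master, and the standard way to initialize the standby's metadata (a step the original instructions do not show explicitly), can be sketched as:

```
master$ hdfs namenode -format
# after the master NameNode is running, on the standby machine (slave1 here):
slave1$ hdfs namenode -bootstrapStandby
```

The bootstrapStandby step copies the formatted namespace metadata from the active NameNode so that the standby starts from the same state.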
Stop & start Hadoop from the master node:
$stop-all.sh
$start-all.sh
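Once the daemons are up, the HA state can be verified from the master node; nn1 and nn2 are the illustrative NameNode IDs from the hdfs-site.xml configuration:

```
$ jps
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2
```

jps on the NameNode machines should list NameNode, JournalNode, DFSZKFailoverController and QuorumPeerMain, and the two haadmin calls should report one "active" and one "standby" NameNode.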
Article written by DataDotz Team
DataDotz is a Chennai based BigData Team primarily focussed on consulting and training on technologies such as Apache Hadoop, Apache Spark , NoSQL(HBase, Cassandra, MongoDB), Search and Cloud Computing.