Single Node Installation for Hadoop-1.x

As part of this tutorial, we will set up a single-node Hadoop-1.x cluster on your machine, whether laptop or desktop. Follow the instructions below to get a pseudo-distributed Hadoop cluster running on your machine and practice Hadoop on your own.


1. Download the Hadoop tarball from the link below. You can also get the link from the Apache Hadoop site. The link below is for hadoop-1.2.1. The current versions are hadoop-2.6.0 and hadoop-1.2.1 as of writing this wiki (April 23, 2015); please check the Hadoop site for the current version.
wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
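
Optionally, verify the download before unpacking by comparing its checksum against the one published on the Apache site (a quick sanity check; the exact checksum file location depends on the mirror):

$sha1sum hadoop-1.2.1.tar.gz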

2. Download the Java JDK from Oracle. If the link below is broken, please check the Oracle website; also check the Oracle site for the current version.
wget http://download.oracle.com/otn/java/jdk/6u45-b06/jdk-6u45-linux-x64.bin?AuthParam=1428646766_2456c1516fcf63e9734ff30e51667a2b

3. Unpack the downloaded archives
$tar -zxvf hadoop-1.2.1.tar.gz
$sh jdk-6u45-linux-x64.bin

4. Set the Java path in the Linux environment. Edit .bashrc and add the two lines below.
$vi .bashrc
export JAVA_HOME=/home/bigdata/jdk1.6.0_45
export PATH=$HOME/bin:$JAVA_HOME/bin:$PATH

Source the .bashrc file so the changes take effect immediately in the current ssh session
$source .bashrc
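
To confirm the new environment has taken effect, check the Java version and the exported path (assuming the JDK was unpacked to the directory above):

$java -version
$echo $JAVA_HOME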
————————————————————————————————————–

5. Modify the Hadoop configuration files. Below are the files used to configure the respective daemons.

NameNode            core-site.xml
JobTracker          mapred-site.xml
SecondaryNameNode   masters
DataNode            slaves
TaskTracker         slaves

Ports used by Hadoop Daemons

Hadoop Daemon       RPC Port   Web UI Port
NameNode            50000      50070
SecondaryNameNode   -          50090
DataNode            50010      50075
JobTracker          50001      50030
TaskTracker         50020      50060

Hadoop Configuration

$cd hadoop-1.2.1
$cd conf

$vi core-site.xml
<!-- This property sets the default filesystem: the host and port the NameNode binds to -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://datadotz_node1:50000</value>
</property>
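
Note that each property must go inside the file's existing <configuration> element (the same applies to mapred-site.xml and hdfs-site.xml below). As a sketch, the finished core-site.xml would look like this:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://datadotz_node1:50000</value>
  </property>
</configuration>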

$vi mapred-site.xml
<!-- This property sets the JobTracker host and port for MapReduce -->
<property>
  <name>mapred.job.tracker</name>
  <value>datadotz_node1:50001</value>
</property>

$vi hdfs-site.xml
<!-- These properties set where the NameNode stores its metadata and where the DataNode stores its physical blocks -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/bigdata/hadoop-dir/name-dir</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/bigdata/hadoop-dir/data-dir</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
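
Hadoop will create these directories if they do not exist, but you may prefer to create them up front so the ownership and permissions are under your control (a sketch, assuming the paths configured above):

$mkdir -p /home/bigdata/hadoop-dir/name-dir
$mkdir -p /home/bigdata/hadoop-dir/data-dir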

$vi hadoop-env.sh

export JAVA_HOME=/home/bigdata/jdk1.6.0_45

$vi masters

localhost

$vi slaves

localhost
————————————————————————————————————–
Passwordless authentication. If you use the default scripts such as start-all.sh, stop-all.sh and the like, the machine where you run the script (typically the NameNode machine) needs to log in to the other machines using ssh. While logging in, each machine will ask for a password, so on a 10-node cluster you would have to enter a password at least 10 times. To avoid this, we set up passwordless authentication: first we generate an ssh key pair, then we copy the public key into the authorized_keys file of the destination machine.

Install the Openssh-server

$ sudo apt-get install openssh-server

Generate the ssh key

(manages and converts authentication keys)

$ cd
$ ssh-keygen -t rsa
$ cd .ssh
$ cat id_rsa.pub >> authorized_keys
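
If ssh still prompts for a password after this, it is often a permissions issue: sshd expects the .ssh directory and the authorized_keys file to be readable only by their owner (a common fix, assuming the default locations):

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys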

Set up passwordless ssh to localhost and to the slaves

$ ssh localhost (or the machine's IP address)

(it should log in without asking for a password)
————————————————————————————————————–

Format Hadoop NameNode

$cd
$cd hadoop-1.2.1
$bin/hadoop namenode -format

Start All Hadoop Related Services

$bin/start-all.sh

$ jps (the Java process status tool; it should list the five Hadoop daemons below)

NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
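
You can also confirm that the daemons are listening on the ports from the table earlier (a sketch; netstat may require the net-tools package on newer distributions):

$ netstat -nltp | grep java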

(Browse the NameNode and JobTracker web GUIs)

NameNode : localhost:50070
JobTracker : localhost:50030
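
As a quick smoke test (a sketch; the example jar name assumes the stock hadoop-1.2.1 distribution), copy a few files into HDFS and run the bundled WordCount job:

$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -put conf/*.xml /input
$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /input /output
$ bin/hadoop fs -cat '/output/part-*'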

$ bin/stop-all.sh (Stop All Hadoop Related Services)

———————————-

Article written by DataDotz Team

DataDotz is a Chennai-based BigData team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.

Note: DataDotz also provides classroom based Apache Kafka training in Chennai. The Course includes Cassandra , MongoDB, Scala and Apache Spark Training. For more details related to Apache Spark training in Chennai, please visit http://datadotz.com/training/