Apache Zeppelin: A Quick Start with Apache Spark

Apache Zeppelin is a web-based notebook that enables interactive data analytics with multiple language back ends. At the time of writing, Apache Zeppelin is in incubation. For more details, please refer to the official Zeppelin site: https://zeppelin.incubator.apache.org/

Current version: zeppelin-0.5.5-incubating-bin-all

This article walks through installing Apache Zeppelin and using it with Apache Spark.


1. Pre-requisites of Apache Zeppelin

a. Git
b. npm (the Node.js package manager)
c. Java 1.7 or later
d. Apache Spark

1.1 Installation of Pre-requisites
Note: This quick start is written for Ubuntu. For other operating systems, use the equivalent commands or write to us.
sudo apt-get update
sudo apt-get install git
sudo apt-get install npm    # npm is the Node.js package manager
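
After installing the packages, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the original instructions; the function name missing_tools is hypothetical, and java will only be found once step 1.2 is complete.

```shell
# Print the name of every given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Report which of the tools used in this quick start are absent.
# An empty result means all of them were found.
missing_tools git npm java
```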

1.2 Download Java from Oracle
Download JDK 1.7 or later, because Java is a dependency of Zeppelin. If a download link is broken, check the Oracle website, which also lists the current version.

1.3 Set the Java environment path in .bashrc
export JAVA_HOME=/home/datadotz/jdk1.7.0_45
Note: You can also set this in /etc/profile, .bash_profile or /etc/environment (in /etc/environment, write it without the export keyword).
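
The step above can be scripted so that the export is added only once. This is a sketch under assumptions: the function name add_java_home is hypothetical, and the extra PATH line is an addition that the original step does not include.

```shell
# Append JAVA_HOME and PATH exports to a profile file, unless
# a JAVA_HOME export is already present in that file.
# $1 = JDK directory, $2 = profile file (for example "$HOME/.bashrc").
add_java_home() {
    jdk_dir=$1
    profile=$2
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        echo "JAVA_HOME is already set in $profile"
        return 0
    fi
    {
        echo "export JAVA_HOME=$jdk_dir"
        # Single quotes keep the variables literal in the profile file.
        # The PATH line is an assumption, not part of the original step.
        echo 'export PATH=$JAVA_HOME/bin:$PATH'
    } >> "$profile"
}

# Usage (not run here):
#   add_java_home /home/datadotz/jdk1.7.0_45 "$HOME/.bashrc"
#   . "$HOME/.bashrc" && "$JAVA_HOME/bin/java" -version
```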

2. Installation of Apache Zeppelin

2.1 Download Apache Zeppelin

Download Zeppelin from the Zeppelin site. If the link below is broken, or to find the current version, check the Zeppelin website.
http://www.us.apache.org/dist/incubator/zeppelin/0.5.5-incubating/zeppelin-0.5.5-incubating-bin-all.tgz
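
The download and unpack step might look like the sketch below. It assumes wget and tar are installed; the variable and function names are hypothetical, and the URL is built from the version number given in this article, so it needs adjusting for other releases.

```shell
# Version and package name as used in this article.
ZEPPELIN_VERSION="0.5.5-incubating"
ZEPPELIN_PACKAGE="zeppelin-${ZEPPELIN_VERSION}-bin-all"
ZEPPELIN_URL="http://www.us.apache.org/dist/incubator/zeppelin/${ZEPPELIN_VERSION}/${ZEPPELIN_PACKAGE}.tgz"

# Download the tarball into the current directory and unpack it.
# Defined as a function so that nothing is fetched until it is called.
fetch_zeppelin() {
    wget "$ZEPPELIN_URL" &&
    tar -xzf "${ZEPPELIN_PACKAGE}.tgz"
}

# Show the URL that fetch_zeppelin would download.
echo "$ZEPPELIN_URL"
```

Calling fetch_zeppelin then performs the actual download and extraction.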

2.2 Configure variables as needed for Zeppelin


Note: The latest version of Spark, 1.5.2, can also be used.
Command to start Zeppelin: zeppelin-daemon.sh start
This starts the Zeppelin server daemon. You can check its status with the jps command (Java process status).
Then open the web page. By default, the Zeppelin UI runs on port 8080.
Note: If needed, change the UI port number in zeppelin-site.xml.
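
Changing the port means editing the value of the zeppelin.server.port property in zeppelin-site.xml. The sketch below shows one way to do that from the shell; the function name set_server_port is hypothetical, and it assumes the property is written with its name and value elements in the usual Hadoop-style layout.

```shell
# Print a copy of a zeppelin-site.xml file with the server port changed.
# $1 = path to zeppelin-site.xml, $2 = new port number.
set_server_port() {
    awk -v port="$2" '
        # Remember that the port property has been reached.
        /<name>zeppelin\.server\.port<\/name>/ { in_port = 1 }
        # Rewrite only the first value element after that name.
        in_port && /<value>/ {
            sub(/<value>[^<]*<\/value>/, "<value>" port "</value>")
            in_port = 0
        }
        { print }
    ' "$1"
}

# Usage (not run here), assuming the file lives under conf/:
#   set_server_port conf/zeppelin-site.xml 8082 > /tmp/zeppelin-site.xml
#   mv /tmp/zeppelin-site.xml conf/zeppelin-site.xml
```

Zeppelin has to be restarted for the new port to take effect.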

3. Installation of Apache Spark

Please refer to chennaihug.org for Spark installation instructions.

Start Apache Spark: sbin/start-all.sh

4. List of all Daemons

The figure shows the list of all running daemons. If you are using Zeppelin only with Spark, Hadoop does not need to be up and running.
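
The same check can be done from the command line by filtering the jps output. This is a sketch with a hypothetical function name; the daemon names ZeppelinServer, Master and Worker are assumptions based on what jps typically prints for Zeppelin and a Spark standalone cluster, so adjust them to match your own jps listing.

```shell
# Read jps output on stdin and print each expected daemon that is missing.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # -w matches whole words only, so "Master" does not match "HMaster".
        if ! printf '%s\n' "$jps_output" | grep -w "$daemon" >/dev/null; then
            echo "$daemon"
        fi
    done
}

# Usage (not run here):
#   jps | missing_daemons ZeppelinServer Master Worker
```

An empty result means every listed daemon was found.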

5. Web UI

a. Apache Zeppelin default startup page

This figure shows the default web UI of Zeppelin, which runs on localhost at port 8082 (in my case, I changed the port number from 8080 to 8082).

b. Notebook in Apache Zeppelin (Fig. 3)
This is how you create a notebook. Put simply, a notebook is like an editor where you can run commands and scripts.

c. Use the Apache Spark interpreter

An example that loads a file from my Linux machine into Spark and tests it by running a word count.
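
For a small input file, the Spark result can be cross-checked locally with standard shell tools. The sketch below is not the Spark code from the notebook; it is a plain awk word count with a hypothetical function name, and it assumes words are separated by whitespace.

```shell
# Print "count word" for every whitespace-separated word in a file,
# most frequent first.
word_count() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (word in count) print count[word], word }' "$1" | sort -rn
}

# Usage (not run here):
#   word_count /path/to/input.txt
```

If Spark splits the text differently (for example on single spaces only), the counts for some words may differ slightly.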


6. Analysis of drug data using Apache Spark SQL and Zeppelin

Load the data and create a schema and temporary table
Table name: customers
Input data set: datagen_10.txt (drug data set)


Query: Find the total amount for each drug

"select drug, sum(amt) from customers group by drug"

The query returns one record per drug, containing the drug and the sum of the amounts for that drug.
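
To sanity-check the totals outside Spark, the same grouping can be approximated with awk. This is only a sketch: the layout of datagen_10.txt is not shown in this article, so the comma separator and the column positions are assumptions passed in as parameters, and the function name sum_by_drug is hypothetical.

```shell
# Sum the amount column per drug for a comma-separated file.
# $1 = file, $2 = column number of the drug, $3 = column number of the amount.
sum_by_drug() {
    awk -F',' -v d="$2" -v a="$3" '
        { total[$d] += $a }
        END { for (drug in total) print drug "," total[drug] }
    ' "$1" | sort
}

# Usage (not run here; the column numbers are placeholders):
#   sum_by_drug datagen_10.txt 1 2
```

Each output line holds a drug and its total, which should match the rows returned by the SQL query.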

7. Output in Various Forms

Tabular View - This web UI shows the tabular view of the output

Bar Chart - The web UI below shows the bar chart representation

Pie Chart - The web UI below shows the same output as a pie chart

Some other chart types





Hope you enjoyed this quick start!


Article written by DataDotz Team

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.
