Fair and Capacity Schedulers of Hadoop 1.x

HADOOP 1.2.1 SCHEDULERS – FAIR & CAPACITY

Fair Scheduler:

Fair scheduling is a method of assigning resources to jobs such that all jobs get, on average, an equal share of resources over time. When there is a single job running, that job uses the entire cluster. When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time. The fair scheduler was developed by Facebook.
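The slot hand-off described above can be sketched in a few lines of Python. This is a simplified model, not the fair scheduler's actual code. It assumes equal shares (no weights or pools) and uses a hypothetical `Job` record. Each freed slot goes to the waiting job that currently runs the fewest tasks, and no running task is killed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    running: int = 0  # task slots this job currently occupies
    pending: int = 0  # tasks still waiting for a slot


def assign_freed_slot(jobs):
    """Hand one freed slot to the waiting job with the fewest running tasks.

    Ties go to the job listed first. Returns the chosen job, or None
    when no job has pending tasks.
    """
    waiting = [job for job in jobs if job.pending > 0]
    if not waiting:
        return None
    chosen = min(waiting, key=lambda job: job.running)
    chosen.running += 1
    chosen.pending -= 1
    return chosen


# A 10-slot cluster: job_a arrived alone and took every slot.
job_a = Job("job_a", running=10, pending=50)
job_b = Job("job_b", running=0, pending=50)  # submitted afterwards
jobs = [job_a, job_b]

# Nothing is killed: each time one of job_a's tasks finishes, its slot is reassigned.
for _ in range(5):
    job_a.running -= 1
    assign_freed_slot(jobs)

print(job_a.running, job_b.running)  # 5 5
```

After five of job_a's tasks finish, the two jobs hold five slots each. From then on, freed slots alternate between them.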

Fair sharing also works with job priorities: the priorities are used as weights to determine the fraction of total compute time that each job gets. The scheduler organizes jobs into pools and divides resources fairly between these pools. It can also limit the number of concurrently running jobs per user and per pool.
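The sketch below shows one way weights can turn into shares, using weighted max-min ("water-filling") sharing across pools. The pool names, weights, and demands are hypothetical. The real scheduler also applies minimum shares and other limits that this sketch leaves out. A weight here can equally stand in for a job priority.

```python
def weighted_fair_shares(total_slots, demands, weights):
    """Divide total_slots among pools in proportion to their (positive) weights.

    No pool gets more than it demands; whatever a pool cannot use is
    re-split among the others by weight (weighted max-min fairness).
    """
    shares = {name: 0.0 for name in demands}
    active = [name for name in demands if demands[name] > 0]
    remaining = float(total_slots)
    while active:
        total_weight = sum(weights[name] for name in active)
        # Pools whose entire demand fits inside their weighted slice.
        capped = [name for name in active
                  if demands[name] <= remaining * weights[name] / total_weight]
        if not capped:
            # Every remaining pool wants more than its slice: split by weight.
            for name in active:
                shares[name] = remaining * weights[name] / total_weight
            break
        for name in capped:
            shares[name] = float(demands[name])
            remaining -= demands[name]
        active = [name for name in active if name not in capped]
    return shares


demands = {"etl": 1000, "adhoc": 1000, "small": 10}  # pending tasks per pool
weights = {"etl": 2.0, "adhoc": 1.0, "small": 1.0}
print(weighted_fair_shares(100, demands, weights))
# {'etl': 60.0, 'adhoc': 30.0, 'small': 10.0}
```

Here `small` needs only 10 of its 25-slot slice. The remaining 90 slots are then split 2:1 between `etl` and `adhoc`.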

Fair Scheduler Configuration:

1. In hadoop-1.2.1/conf/mapred-site.xml, set the mapred.jobtracker.taskScheduler property to org.apache.hadoop.mapred.FairScheduler.

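The snippet below is a sketch that generates the mapred-site.xml entries with Python's standard `xml.etree.ElementTree`. Besides the scheduler class, it sets `mapred.fairscheduler.allocation.file` so the JobTracker can find the pool definitions. The install path is an assumption. As far as we know, the fair scheduler ships as a contrib jar in Hadoop 1.x. That jar must be on the JobTracker's classpath (for example, copied into the lib directory) for the class to load.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render a Hadoop <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


fair_scheduler_props = {
    # Replace the default FIFO scheduler with the fair scheduler.
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler",
    # Location of the pool definitions; this install path is an assumption.
    "mapred.fairscheduler.allocation.file": "/usr/local/hadoop-1.2.1/conf/fair-scheduler.xml",
}

print(hadoop_config_xml(fair_scheduler_props))
```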

2. Define the pools in the allocation file, hadoop-1.2.1/conf/fair-scheduler.xml.

Pool elements:

  • minMaps and minReduces, to set the pool’s minimum share of task slots.
  • maxMaps and maxReduces, to set the pool’s maximum concurrent task slots.
  • maxRunningJobs, to limit the number of jobs from the pool that can run at once.
  • weight, to share the cluster non-proportionally with other pools (the default weight is 1.0).
  • minSharePreemptionTimeout, the number of seconds the pool will wait before killing other pools’ tasks if it is below its minimum share.
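The pool elements listed above go inside an `<allocations>` root, with one `<pool name="...">` element per pool. The sketch below builds such a file. The pool names and numbers are hypothetical. It also includes the top-level `userMaxJobsDefault` element (the default per-user running-job limit) to show a setting outside any pool. According to the Hadoop 1.x fair scheduler documentation, preemption must also be enabled with `mapred.fairscheduler.preemption` in mapred-site.xml. Until then, `minSharePreemptionTimeout` has no effect.

```python
import xml.etree.ElementTree as ET


def allocation_file(pools, user_max_jobs_default=None):
    """Build a fair-scheduler.xml <allocations> document.

    pools maps a pool name to its settings, e.g. {"minMaps": 10, "weight": 2.0}.
    """
    root = ET.Element("allocations")
    for pool_name, settings in pools.items():
        pool = ET.SubElement(root, "pool", name=pool_name)
        for element, value in settings.items():
            ET.SubElement(pool, element).text = str(value)
    if user_max_jobs_default is not None:
        ET.SubElement(root, "userMaxJobsDefault").text = str(user_max_jobs_default)
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


pools = {
    # Hypothetical production pool: guaranteed slots, double weight,
    # and preemption after 5 minutes below its minimum share.
    "production": {"minMaps": 20, "minReduces": 10, "weight": 2.0,
                   "minSharePreemptionTimeout": 300},
    # Hypothetical ad hoc pool: capped slots and concurrent jobs.
    "adhoc": {"maxMaps": 40, "maxReduces": 20, "maxRunningJobs": 5},
}
print(allocation_file(pools, user_max_jobs_default=3))
```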

3. Start the Hadoop cluster and open the scheduler page of the JobTracker web UI at http://master:50030/scheduler.

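Once the cluster is up, a small script can confirm that the scheduler page responds. This is a convenience sketch that uses only the Python standard library. The host name `master` and port 50030 come from the URL above, and the helper names are our own.

```python
import urllib.error
import urllib.request


def scheduler_url(jobtracker_host, port=50030):
    """URL of the fair scheduler page on the JobTracker web UI."""
    return f"http://{jobtracker_host}:{port}/scheduler"


def scheduler_page_is_up(url, timeout=5):
    """Return True if the page answers with HTTP 200; False on HTTP or network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Run `scheduler_page_is_up(scheduler_url("master"))` from a machine that can reach the cluster. It should return True once the JobTracker has started with the fair scheduler.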

Capacity Scheduler:

In capacity scheduling, queues are created instead of pools, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity, and the capacities of all queues together make up the overall capacity of the cluster.
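Capacities are expressed as percentages of the cluster's slots. Below is a minimal sketch of the arithmetic. It assumes a hypothetical cluster with 40 map slots and 20 reduce slots (for example, 10 TaskTrackers with 4 map and 2 reduce slots each) and hypothetical queue names. For simplicity it uses whole-number percentages and rounds down.

```python
def guaranteed_slots(capacities, map_slots, reduce_slots):
    """Convert per-queue capacity percentages into guaranteed (map, reduce) slots.

    Percentages are whole numbers here and results are rounded down.
    """
    total = sum(capacities.values())
    if total > 100:
        raise ValueError(f"queue capacities sum to {total}%, more than 100%")
    return {
        queue: (map_slots * pct // 100, reduce_slots * pct // 100)
        for queue, pct in capacities.items()
    }


capacities = {"default": 50, "research": 30, "adhoc": 20}  # hypothetical queues
print(guaranteed_slots(capacities, map_slots=40, reduce_slots=20))
# {'default': (20, 10), 'research': (12, 6), 'adhoc': (8, 4)}
```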

Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues. The capacity scheduler was developed by Yahoo.
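The lending of unused capacity can be modelled in the same spirit. The sketch below is our own simplification. First, each queue receives what it asks for, up to its guarantee. Then idle slots go one at a time to whichever queue with unmet demand is using the smallest fraction of its guaranteed capacity. It does not model maximum-capacity limits, nor how lent slots are given back when the owning queue needs them again.

```python
def lend_idle_capacity(guaranteed, demand):
    """Allocate slots per queue, lending idle guaranteed slots to busy queues.

    guaranteed and demand map queue name -> slots; guarantees must be positive.
    """
    # Step 1: every queue gets what it asks for, up to its guarantee.
    allocation = {q: min(demand[q], guaranteed[q]) for q in guaranteed}
    idle = sum(guaranteed.values()) - sum(allocation.values())
    # Step 2: lend idle slots one by one to queues that still want more.
    while idle > 0:
        hungry = [q for q in allocation if allocation[q] < demand[q]]
        if not hungry:
            break  # nobody needs the spare slots; they stay free
        # Prefer the queue using the smallest fraction of its own guarantee.
        q = min(hungry, key=lambda name: allocation[name] / guaranteed[name])
        allocation[q] += 1
        idle -= 1
    return allocation


guaranteed = {"default": 20, "research": 12, "adhoc": 8}  # map slots, hypothetical
demand = {"default": 5, "research": 30, "adhoc": 10}      # pending map tasks
print(lend_idle_capacity(guaranteed, demand))
# {'default': 5, 'research': 25, 'adhoc': 10}
```

The default queue only needs 5 of its 20 slots, so 15 are lent out. adhoc tops up to its demand of 10, and research absorbs the rest.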

Capacity Scheduler Configuration:

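1. In hadoop-1.2.1/conf/mapred-site.xml, set mapred.jobtracker.taskScheduler to org.apache.hadoop.mapred.CapacityTaskScheduler and list the queue names in mapred.queue.names.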

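The mapred-site.xml entries for the capacity scheduler can also be generated with a short script. The queue names are hypothetical. Jobs submitted without a queue go to the `default` queue, so the sketch keeps `default` in the list. The capacity scheduler ships as a contrib jar in Hadoop 1.x and, as far as we know, must be on the JobTracker's classpath.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render a Hadoop <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


queues = ["default", "research", "adhoc"]  # hypothetical queue names
capacity_props = {
    # Swap the default FIFO scheduler for the capacity scheduler.
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.CapacityTaskScheduler",
    # Queues that capacity-scheduler.xml will configure, comma-separated.
    "mapred.queue.names": ",".join(queues),
}

print(hadoop_config_xml(capacity_props))
```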

2. Configure the properties of each queue in hadoop-1.2.1/conf/capacity-scheduler.xml.

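In capacity-scheduler.xml, every per-queue setting is a property named `mapred.capacity-scheduler.queue.<queue-name>.<setting>`. The sketch below renders `capacity`, `maximum-capacity`, and `minimum-user-limit-percent` for three hypothetical queues. It refuses capacities that add up to more than 100 percent. For the full list of settings and their defaults, check the capacity-scheduler.xml template shipped with Hadoop 1.2.1.

```python
import xml.etree.ElementTree as ET

PREFIX = "mapred.capacity-scheduler.queue"


def capacity_scheduler_xml(queues):
    """Render capacity-scheduler.xml properties for each queue.

    queues maps a queue name to per-queue settings such as
    {"capacity": 50, "maximum-capacity": 80}.
    """
    total = sum(settings["capacity"] for settings in queues.values())
    if total > 100:
        raise ValueError(f"capacities sum to {total}%, more than 100%")
    root = ET.Element("configuration")
    for queue, settings in queues.items():
        for key, value in settings.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = f"{PREFIX}.{queue}.{key}"
            ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


queues = {
    "default": {"capacity": 50, "maximum-capacity": 100},
    "research": {"capacity": 30, "minimum-user-limit-percent": 50},
    "adhoc": {"capacity": 20, "maximum-capacity": 40},
}
print(capacity_scheduler_xml(queues))
```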

———————————-

Article written by DataDotz Team

DataDotz is a Chennai-based big data team primarily focused on consulting and training in technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), search, and cloud computing.

Note: DataDotz also provides classroom-based Apache Kafka training in Chennai. The course includes Cassandra, MongoDB, Scala, and Apache Spark training. For more details related to Apache Spark training in Chennai, please visit http://datadotz.com/training/