Advances, Systems and Applications

Table 6 Resource-aware schedulers and their properties

From: MapReduce scheduling algorithms in Hadoop: a systematic study

| Reference | Year | Key ideas | Advantages | Disadvantages | Comparison algorithms | Evaluation techniques |
| --- | --- | --- | --- | --- | --- | --- |
| Aarthee et al. [67] | 2023 | Heuristic scheduling using bin packing MapReduce scheduler for heterogeneous workloads performance in big data | Increasing resource utilization; improving the makespan; few non-data-local executions | Not considering data distribution; not considering node failures | RWS, HMJS | Implementation: Hadoop cluster with twelve nodes |
| Jeyaraj et al. [68] | 2022 | Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization | Improving resource utilization; minimizing the makespan | No implementation | FS, RWS, HMJS | Simulation |
| Zhang et al. [69] | 2015 | PRISM: a phase-level resource-aware MapReduce scheduler | Improving resource utilization; reducing job running time; avoiding resource contention | Worse data locality; not considering heterogeneity | Fair, YARN | Implementation: in Hadoop 0.20.2 |
| Rasooli et al. [70] | 2014 | Using an algorithm that classifies jobs based on their requirements and finds an appropriate matching of resources and jobs in the system | Reducing average completion time; improving fairness; satisfying the required minimum shares; increasing locality; no starvation of small jobs; avoiding the sticky slot problem | Increasing scheduling overhead; not separating data-intensive and computation-intensive jobs during classification | FIFO, Fair | Implementation: Hadoop cluster with four nodes. Simulation: MRSIM, a MapReduce simulator |
| Polo et al. [71] | 2011 | Using job profiling information to dynamically adjust the number of job slots and their placement | Improving resource utilization; meeting completion time goals; reducing makespan | Not considering heterogeneity; the task’s resource consumption is assumed to be stable during its lifetime; no support for job priority | Fair | Implementation: on a Hadoop cluster |
| Sharma et al. [72] | 2012 | MROrchestrator: using fine-grained, dynamic, and coordinated resource allocation instead of slot-based resource allocation | Increasing resource utilization; reducing job completion time; dynamically identifying and resolving resource bottlenecks | Not considering other resources like disk and network bandwidth; no evaluation on a large cloud; neglecting the workload imbalance among tasks; not considering heterogeneity | Mesos, NGM | Implementation: on two 24-node physical and virtualized Hadoop clusters |
| Pastorelli et al. [73] | 2015 | Using size-based scheduling with aging | Decreasing system response times; guaranteeing fairness; avoiding job starvation; considering data locality | No implementation in large clusters | Fair | Implementation: on a cluster composed of 20 TaskTracker worker machines |
| Tian et al. [74] | 2011 | Using a cost model to find the optimal resource provisioning for different decision problems | Cost model fits well on four tested programs; cost model has low error rates | Not conducting experiments in the public cloud; not considering heterogeneity | No comparison; different input datasets in different possible scenarios | Implementation: in their in-house 16-node Hadoop cluster |
| Ghoneem et al. [75] | 2017 | Providing the classifier with information about job requirements and node capabilities | Increasing resource utilization; minimizing average completion time; minimizing master node overhead; no starvation of small jobs; avoiding the sticky slot problem; considering heterogeneity | Finding content information is computationally expensive and time consuming | HP model, Starfish | Implementation: on a three-node cluster |
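
The bin-packing idea listed for Aarthee et al. [67] can be illustrated with the classic first-fit decreasing heuristic. The sketch below is a generic textbook version, not the scheduler described in [67]. The function name `first_fit_decreasing`, the single integer resource demand per task, and the uniform `node_capacity` are all assumptions made for illustration.

```python
def first_fit_decreasing(task_demands, node_capacity):
    """Pack task demands onto as few equal-capacity nodes as the heuristic finds.

    Hypothetical helper: each task has one integer resource demand
    (for example, memory in GB) and every node has the same capacity.
    """
    nodes = []  # each entry is [remaining_capacity, [demands placed on this node]]
    # Place the largest tasks first; this is what makes it "decreasing".
    for demand in sorted(task_demands, reverse=True):
        if demand > node_capacity:
            raise ValueError(f"demand {demand} exceeds node capacity {node_capacity}")
        for node in nodes:
            if node[0] >= demand:
                # First node with enough room wins ("first fit").
                node[0] -= demand
                node[1].append(demand)
                break
        else:
            # No existing node can host the task, so open a new one.
            nodes.append([node_capacity - demand, [demand]])
    return [assigned for _, assigned in nodes]
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` returns `[[8, 2], [4, 4, 1, 1]]`, so six tasks fit on two nodes. A real resource-aware scheduler would also need to handle several resource dimensions, heterogeneous node capacities, and data locality, which this sketch ignores.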