
Advances, Systems and Applications

Table 2 Deadline-aware schedulers in homogeneous clusters and their properties

From: MapReduce scheduling algorithms in Hadoop: a systematic study

Columns: Reference | Year | Key Ideas | Advantages | Disadvantages | Comparison Algorithms | Evaluation Techniques

Gao et al. [33], 2022
Key ideas: Deadline-aware preemptive job scheduling in Hadoop YARN clusters
Advantages: Minimizing job deadline misses; improving resource utilization
Disadvantages: Not considering heterogeneous clusters; not ensuring that all jobs meet their deadlines
Comparison algorithms: Capacity Scheduler, EDF, PDSonQueue (PDQ)
Evaluation techniques: Implementation on a five-node Hadoop YARN cluster

Cheng et al. [34], 2018
Key ideas: RDS: deadline-aware MapReduce job scheduling with dynamic resource availability
Advantages: Reducing job deadline misses; reducing job completion time; predicting future resource availability; using flexible deadline time
Disadvantages: Not considering heterogeneity; not considering data locality
Comparison algorithms: Fair, Earliest Deadline First (EDF)
Evaluation techniques: Implementation on a Hadoop cluster

Kao et al. [35], 2016
Key ideas: DamRT: using a dispatcher and a schedulability test to provide deadline guarantees
Advantages: Increasing the number of jobs that meet their deadlines; increasing data locality; minimizing task response time
Disadvantages: Dividing the job deadline into task deadlines statically; not considering heterogeneity
Comparison algorithms: FIFO, EDF, MTSD
Evaluation techniques: Simulation with the CloudSimRT simulator
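The "static" division of a job deadline into task deadlines, listed as a drawback of DamRT (and of Dong et al. [41]), can be pictured with a small sketch. It assumes a proportional split of the job's time window between the Map and Reduce phases based on estimated phase durations; the function and field names are hypothetical, and the actual division rules in [35] and [41] may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseDeadlines:
    map_deadline: float     # absolute time by which all map tasks should finish
    reduce_deadline: float  # absolute time by which all reduce tasks should finish


def split_deadline_statically(submit_time: float, job_deadline: float,
                              est_map_time: float, est_reduce_time: float) -> PhaseDeadlines:
    """Hypothetical static split: computed once at submission and never revised.

    The window [submit_time, job_deadline] is divided in proportion to the
    estimated Map and Reduce phase durations.
    """
    window = job_deadline - submit_time
    if window <= 0:
        raise ValueError("job deadline must be later than the submission time")
    total = est_map_time + est_reduce_time
    if total <= 0:
        raise ValueError("phase estimates must sum to a positive value")
    map_share = window * est_map_time / total
    return PhaseDeadlines(map_deadline=submit_time + map_share,
                          reduce_deadline=job_deadline)


if __name__ == "__main__":
    d = split_deadline_statically(submit_time=0, job_deadline=100,
                                  est_map_time=60, est_reduce_time=20)
    print(d)  # PhaseDeadlines(map_deadline=75.0, reduce_deadline=100)
```

Because the split is fixed at submission, a Map phase that overruns its estimate leaves the Reduce phase with less time than planned, and nothing in this scheme re-derives the sub-deadlines; that rigidity is what the table flags.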

Verma et al. [36], 2012
Key ideas: Using an EDF policy to order jobs in the processing queue; using the MinEDF and MinEDF-WC mechanisms to allocate a tailored number of Map and Reduce slots to each job
Advantages: Increasing resource utilization; reducing job completion time; minimizing the number of jobs that miss their deadlines
Disadvantages: Not considering heterogeneity
Comparison algorithms: EDF, MinEDF, MinEDF-WC
Evaluation techniques: Implementation on a 66-node Hadoop cluster; simulation in the SimMR simulation environment
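The EDF job ordering used by Verma et al. [36] (and by several other entries in this table) can be sketched as a priority queue keyed on each job's deadline. This is a generic EDF queue, not the MinEDF or MinEDF-WC slot-allocation logic; the class and method names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class QueuedJob:
    deadline: float                   # primary key: earliest deadline first
    seq: int                          # tie-breaker: earlier submission first
    name: str = field(compare=False)  # excluded from ordering


class EDFQueue:
    """Illustrative earliest-deadline-first job queue backed by a binary heap."""

    def __init__(self) -> None:
        self._heap: List[QueuedJob] = []
        self._counter = itertools.count()

    def submit(self, name: str, deadline: float) -> None:
        heapq.heappush(self._heap, QueuedJob(deadline, next(self._counter), name))

    def next_job(self) -> Optional[str]:
        """Remove and return the job with the earliest deadline, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = EDFQueue()
    q.submit("sort", deadline=300)
    q.submit("grep", deadline=120)
    q.submit("wordcount", deadline=120)
    print([q.next_job() for _ in range(len(q))])  # ['grep', 'wordcount', 'sort']
```

The submission counter keeps ordering deterministic when two jobs share a deadline; push and pop each cost O(log n).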

Phan et al. [37], 2011
Key ideas: Using the EDF/MR policy to minimize the deadline miss rate; using the EDF/TD policy to minimize total tardiness
Advantages: Minimizing the deadline miss ratio and tardiness; considering data locality
Disadvantages: Not considering heterogeneity
Comparison algorithms: FIFO, Fair, Capacity
Evaluation techniques: Implementation on a 21-node Hadoop cluster on Amazon EC2
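The two metrics targeted by Phan et al.'s EDF/MR and EDF/TD policies, deadline miss ratio and total tardiness, can be made concrete with a sketch that runs jobs back to back on a single resource. It only computes the metrics for a given order and compares submission order with a plain EDF order; it does not reproduce the EDF/MR or EDF/TD policies themselves, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Job = Tuple[str, float, float]  # (name, processing_time, deadline)


def evaluate_sequence(jobs: Sequence[Job]) -> Tuple[float, float]:
    """Run jobs back to back on one resource, starting at t = 0, in the given order.

    Returns (deadline miss ratio, total tardiness), where a job's tardiness is
    max(0, completion_time - deadline).
    """
    if not jobs:
        return 0.0, 0.0
    clock = 0.0
    misses = 0
    total_tardiness = 0.0
    for _name, processing_time, deadline in jobs:
        clock += processing_time
        tardiness = max(0.0, clock - deadline)
        if tardiness > 0:
            misses += 1
        total_tardiness += tardiness
    return misses / len(jobs), total_tardiness


def edf_order(jobs: Sequence[Job]) -> List[Job]:
    """Plain earliest-deadline-first ordering (stable for equal deadlines)."""
    return sorted(jobs, key=lambda job: job[2])


if __name__ == "__main__":
    jobs = [("a", 4, 10), ("b", 2, 3), ("c", 3, 6)]
    print(evaluate_sequence(jobs))             # (0.6666666666666666, 6.0)
    print(evaluate_sequence(edf_order(jobs)))  # (0.0, 0.0)
```

On this toy input, submission order misses two of three deadlines with a total tardiness of 6, while EDF order meets all three. On one machine EDF minimizes maximum lateness but does not in general minimize the number of late jobs or total tardiness, which is why separate policies for the two metrics can make sense.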

Kc et al. [38], 2014
Key ideas: Using a schedulability test to determine whether a job can be completed within its specified deadline
Advantages: Maximizing the number of jobs that can run while satisfying their deadlines; increasing resource utilization
Disadvantages: Assuming homogeneous nodes and a uniform data distribution
Comparison algorithms: None; evaluated with different input datasets across different scenarios
Evaluation techniques: Implementation on a virtualized cluster and on a physical cluster
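A schedulability test of the kind Kc et al. [38] use answers one question: can this job finish by its deadline with the slots currently available? The sketch below assumes a simplified wave-based model (tasks run in ceil(n/slots) waves, and the Reduce phase starts only after the Map phase ends); it is not the cost model from [38], and all names and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEstimate:
    n_map: int        # number of map tasks
    n_reduce: int     # number of reduce tasks
    t_map: float      # estimated duration of one map task (seconds)
    t_reduce: float   # estimated duration of one reduce task (seconds)


def estimated_duration(job: JobEstimate, map_slots: int, reduce_slots: int) -> float:
    """Wave-based estimate; every slot is assumed to be equally fast."""
    if (job.n_map and map_slots <= 0) or (job.n_reduce and reduce_slots <= 0):
        return math.inf  # tasks exist but there are no slots to run them
    map_time = math.ceil(job.n_map / map_slots) * job.t_map if job.n_map else 0.0
    reduce_time = math.ceil(job.n_reduce / reduce_slots) * job.t_reduce if job.n_reduce else 0.0
    return map_time + reduce_time


def is_schedulable(job: JobEstimate, now: float, deadline: float,
                   free_map_slots: int, free_reduce_slots: int) -> bool:
    """Admit the job only if its estimated finish time does not exceed the deadline."""
    return now + estimated_duration(job, free_map_slots, free_reduce_slots) <= deadline


if __name__ == "__main__":
    job = JobEstimate(n_map=20, n_reduce=4, t_map=30.0, t_reduce=60.0)
    print(estimated_duration(job, 10, 4))  # 120.0
    print(is_schedulable(job, 0, 150, free_map_slots=10, free_reduce_slots=4))  # True
    print(is_schedulable(job, 0, 100, free_map_slots=10, free_reduce_slots=4))  # False
```

Treating every slot as equally fast mirrors the homogeneous-node assumption listed as this entry's disadvantage; on a heterogeneous cluster the per-task estimates would no longer hold.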

Teng et al. [39], 2014
Key ideas: Pausing between the Map and Reduce stages; using a schedulability bound to test the schedulability of real-time tasks
Advantages: Meeting job deadlines; increasing cluster utilization; providing deadline guarantees for jobs
Disadvantages: Not considering heterogeneity; not considering dependent tasks, periodic tasks, or non-preemptive execution
Comparison algorithms: FIFO, RM
Evaluation techniques: Implementation on a Hadoop testbed and on Hadoop 1.1.0; simulation in the SimMapReduce simulation environment
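Teng et al. [39] test real-time tasks against a schedulability bound and compare with RM (rate-monotonic) scheduling. As a reference point, the sketch below implements the classical Liu and Layland utilization bound for RM on a single processor, n(2^(1/n) - 1), for independent preemptible periodic tasks whose deadlines equal their periods. This is a swapped-in textbook bound, not the MapReduce-specific bound derived in [39], and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def liu_layland_bound(n: int) -> float:
    """Classical RM utilization bound n(2^(1/n) - 1) for n independent periodic tasks."""
    if n <= 0:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def rm_utilization_test(tasks: Sequence[Tuple[float, float]]) -> bool:
    """Sufficient (not necessary) RM test; tasks are (execution_time, period) pairs."""
    utilization = sum(c / t for c, t in tasks)
    return utilization <= liu_layland_bound(len(tasks))


if __name__ == "__main__":
    print(rm_utilization_test([(1, 4), (1, 5), (2, 10)]))  # True  (U = 0.65, bound ~ 0.780)
    print(rm_utilization_test([(2, 4), (2, 5)]))           # False (U = 0.9, bound ~ 0.828)
```

The test is sufficient but not necessary: the second task set exceeds the bound, yet exact response-time analysis gives the lower-priority task a worst-case response time of 4, within its period of 5, so it still meets every deadline under RM.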

Wang et al. [40], 2015
Key ideas: SAMES: a scheduling algorithm based on the most effective sequence for deadline-constrained jobs
Advantages: Completing jobs before their deadlines; increasing system utilization
Disadvantages: Not considering heterogeneity; not considering the differences between Map and Reduce tasks
Comparison algorithms: DC, MinEDF-WC
Evaluation techniques: Implementation on Hadoop 1.0.4

Dong et al. [41], 2011
Key ideas: Using a sampling-based approach and a resource allocation model to design a deadline scheduler for real-time jobs; using a two-level scheduler to schedule a mix of real-time and non-real-time jobs
Advantages: Increasing system utilization; assigning the minimum number of resources to real-time jobs; providing deadline guarantees for real-time jobs
Disadvantages: Not discussing hardware heterogeneity; dividing the job deadline into task deadlines statically
Comparison algorithms: Polo's model, Kc's model
Evaluation techniques: Implementation on an eleven-node cluster configured with the hadoop-0.20.2 release

Verma et al. [42], 2011b
Key ideas: Using a scaling technique and performance models to estimate the resources required to meet SLOs
Advantages: Estimating the resources required to meet job deadlines; increasing system utilization
Disadvantages: Not considering heterogeneity
Comparison algorithms: None; evaluated with different input datasets across different scenarios
Evaluation techniques: Implementation on a Hadoop cluster
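The performance-model idea behind Verma et al. [42] can be illustrated with the classical makespan bounds for greedily assigning n tasks to k slots: at least n*avg/k and at most (n - 1)*avg/k + max, where avg and max are the mean and longest task durations. The sketch searches for the smallest slot count whose upper bound meets a deadline for a single phase; treating one phase in isolation, and the function names, are simplifying assumptions rather than the paper's full Map-plus-Reduce allocation.

```python
from typing import Optional, Sequence, Tuple


def makespan_bounds(durations: Sequence[float], slots: int) -> Tuple[float, float]:
    """Lower and upper bounds on the makespan of greedily running tasks on `slots` slots."""
    if not durations or slots <= 0:
        raise ValueError("need at least one task and one slot")
    n = len(durations)
    avg = sum(durations) / n
    longest = max(durations)
    lower = n * avg / slots
    upper = (n - 1) * avg / slots + longest
    return lower, upper


def min_slots_for_deadline(durations: Sequence[float], deadline: float,
                           max_slots: int) -> Optional[int]:
    """Smallest slot count whose upper bound meets the deadline, or None.

    The upper bound never increases as slots grow, so a linear scan returns the minimum.
    """
    for k in range(1, max_slots + 1):
        if makespan_bounds(durations, k)[1] <= deadline:
            return k
    return None


if __name__ == "__main__":
    tasks = [10.0] * 9 + [20.0]  # nine 10 s tasks and one 20 s straggler
    print(makespan_bounds(tasks, 1))              # (110.0, 119.0)
    print(min_slots_for_deadline(tasks, 50, 20))  # 4
    print(min_slots_for_deadline(tasks, 15, 20))  # None: the 20 s task alone exceeds 15 s
```

Planning against the upper bound is conservative: it may reserve more slots than strictly needed, but any allocation it returns is guaranteed to meet the deadline under this model. Because the bound is monotone in k, a binary search would also work for large slot ranges.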