MapReduce scheduling algorithms in Hadoop: a systematic study

Hedayati, Soudabeh; Maleki, Neda; Olsson, Tobias; Ahlgren, Fredrik; Seyednezhad, Mahdi; Berahmand, Kamal

doi:10.1186/s13677-023-00520-9

Journal of Cloud Computing: Advances, Systems and Applications

Table 2 Deadline-aware schedulers in homogeneous clusters and their properties

Reference	Year	Key Ideas	Advantages	Disadvantages	Comparison Algorithms	Evaluation Techniques
Gao et al. [33]	2022	Deadline-aware preemptive job scheduling in Hadoop YARN clusters	Minimizing job deadline misses; improving resource utilization	Not considering heterogeneous clusters; not ensuring that all job deadlines are met	Capacity Scheduler, EDF, PDSonQueue (PDQ)	Implementation: Hadoop YARN cluster with five nodes
Cheng et al. [34]	2018	RDS: deadline-aware MapReduce job scheduling with dynamic resource availability	Reducing job deadline misses; reducing job completion time; predicting future resource availability; using flexible deadline times	Not considering heterogeneity; not considering data locality	Fair, Earliest Deadline First (EDF)	Implementation: on a Hadoop cluster
Kao et al. [35]	2016	DamRT: using a dispatcher and a schedulability test for deadline guarantees	Increasing the number of jobs that meet their deadlines; increasing data locality; minimizing task response time	Statically dividing the job deadline into task deadlines; not considering heterogeneity	FIFO, EDF, MTSD	Simulation: CloudSimRT simulator
Verma et al. [36]	2012	Using the EDF policy for job ordering in the processing queue; using the MinEDF and MinEDF-WC mechanisms to allocate a tailored number of Map and Reduce slots to each job	Increasing resource utilization; reducing job completion time; minimizing the number of jobs that miss their deadlines	Not considering heterogeneity	EDF, MinEDF, MinEDF-WC	Implementation: Hadoop cluster with 66 nodes; Simulation: SimMR simulation environment
Phan et al. [37]	2011	Using the EDF/MR policy to minimize the miss rate; using the EDF/TD policy to minimize total tardiness	Minimizing deadline miss ratio and tardiness; considering data locality	Not considering heterogeneity	FIFO, Fair, Capacity	Implementation: Hadoop cluster (21 nodes) on Amazon EC2
Kc et al. [38]	2014	Using a schedulability test to determine whether a job can be completed within its specified deadline	Maximizing the number of jobs that can run while satisfying their deadlines; increasing resource utilization	Assuming homogeneous nodes and uniformly distributed data	No comparison; different input datasets in different possible scenarios	Implementation: in a virtualized cluster and a physical cluster
Teng et al. [39]	2014	Pausing between the Map and Reduce stages; using a schedulability bound to test the schedulability of real-time tasks	Meeting job deadlines; increasing cluster utilization; providing deadline guarantees for jobs	Not considering heterogeneity; not considering dependent tasks, periodic tasks, or non-preemptive execution	FIFO, RM	Implementation: on the Hadoop testbed; Implementation: on Hadoop 1.1.0; Simulation: SimMapReduce simulation environment
Wang et al. [40]	2015	SAMES: a scheduling algorithm based on the most effective sequence for deadline-constrained jobs	Completing jobs before their deadlines; increasing system utilization	Not considering heterogeneity; not considering the differences between Map and Reduce tasks	DC, MinEDF-WC	Implementation: on Hadoop (1.0.4)
Dong et al. [41]	2011	Using a sampling-based approach and a resource allocation model to design a deadline scheduler for real-time jobs; using a two-level scheduler to schedule mixed real-time and non-real-time jobs	Increasing system utilization; assigning the minimum number of resources to real-time jobs; providing deadline guarantees for real-time jobs	Not discussing hardware heterogeneity; statically dividing the job deadline into task deadlines	Polo’s model, Kc’s model	Implementation: on an eleven-node cluster configured with the hadoop-0.20.2 release
Verma et al. [42]	2011 b	Using a scaling technique and performance models to estimate the resources required to meet SLOs	Estimating the resources required to meet job deadlines; increasing system utilization	Not considering heterogeneity	No comparison; different input datasets in different possible scenarios	Implementation: on a Hadoop cluster
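
Earliest Deadline First (EDF) appears in the table both as a building block and as a comparison baseline. The following is a minimal Python sketch of EDF job ordering on a single sequential resource, offered only as an illustration of the general policy. It is not the implementation of any scheduler listed above, and the names `Job` and `edf_schedule`, the job names, and the assumption that each job's duration is known in advance are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    deadline: float                          # absolute deadline; the only field used for ordering
    name: str = field(compare=False)         # hypothetical job label
    duration: float = field(compare=False)   # assumed to be known in advance


def edf_schedule(jobs, start=0.0):
    """Run jobs one at a time in earliest-deadline-first order.

    Returns (order, missed): job names in execution order, and the names
    of jobs that finish after their deadline.
    """
    heap = list(jobs)
    heapq.heapify(heap)                      # min-heap keyed on deadline
    clock, order, missed = start, [], []
    while heap:
        job = heapq.heappop(heap)            # job with the earliest deadline
        clock += job.duration                # job runs to completion (no preemption)
        order.append(job.name)
        if clock > job.deadline:
            missed.append(job.name)
    return order, missed


# Illustrative workload (made-up numbers).
jobs = [Job(20, "sort", 8), Job(10, "grep", 4), Job(15, "join", 7)]
order, missed = edf_schedule(jobs)
print(order, missed)
```

The sketch deliberately ignores slots, data locality, and preemption, which are the aspects the surveyed schedulers add on top of plain EDF.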
