From: MapReduce scheduling algorithms in Hadoop: a systematic study (Journal of Cloud Computing: Advances, Systems and Applications)
| Reference | Year | Key Ideas | Advantages | Disadvantages | Comparison Algorithms | Evaluation Techniques |
|---|---|---|---|---|---|---|
| Jabbari et al. [43] | 2021 | A cost-efficient resource provisioning and scheduling approach for deadline-sensitive MapReduce computations in the cloud | Guarantees the deadline; reduces the total hiring cost | Does not consider data locality; does not consider data distribution | No comparison; different input datasets in different possible scenarios | Simulation |
| Shao et al. [44] | 2018 | EFS: an efficient job-scheduling approach for big-data applications | Meets job deadlines | Does not consider data distribution or replication dependencies | AlwaysOn, OPT, AutoScale | Simulation, using the Scheduler Load Simulator (SLS) |
| Lin et al. [45] | 2019 | DGIA: a deadline-constrained and influence-aware design for allocating MapReduce jobs in cloud computing systems | Meets job deadlines; accounts for the performance influence on existing tasks | Does not consider data locality | O-Hadoop, OR-Hadoop, BGMRS, SDHP, EDFLWF | Simulation with 1,000 nodes |
| Chen et al. [46] | 2015 | DCMRS: bipartite-graph modelling to obtain the optimal scheduling solution | Reduces job execution time; dynamically adjusts task deadlines; a low proportion of jobs miss their deadlines | Does not improve data locality; does not guarantee that every job meets its deadline; high computational time | ORP, AUMD, ADAPT, MDF, MLF | Implementation: Hadoop cluster with 24 nodes (4 PMs in the Hadoop cluster, 20 VMs in the cloud); Simulation: MATLAB |
| Tang et al. [47] | 2013 | MTSD: a node-classification algorithm that groups nodes by processing capacity | Meets deadline constraints; increases map-task data locality; reduces completion time; improves the precision of a task's remaining-time estimate | Does not address reduce-task scheduling; provides no deadline guarantees for jobs; divides the job deadline into task deadlines statically | Fair, FIFO | Implementation on Hadoop 0.21 |
| Verma et al. [48] | 2011a | ARIA: uses a job profile to estimate job completion time and the resources required to finish within the deadline | Automatically allocates resources so jobs meet their deadlines; increases resource utilization; handles heterogeneous environments | Does not consider separate map and reduce deadlines; does not consider node failures | No comparison; different input datasets in different possible scenarios | Implementation on a Hadoop 0.20.2 cluster; Simulation: a discrete-event simulator |
| Polo et al. [49] | 2013 | Estimates job completion time from the average length of already-completed tasks | Dynamically allocates resources so jobs meet their deadlines; increases system throughput; maximizes data locality; handles hardware heterogeneity | Does not preempt tasks that are already executing; does not distinguish between map and reduce tasks | Fair, Basic Adaptive Scheduler | Implementation on Hadoop 0.21 |
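The profile-based reasoning behind ARIA [48] can be sketched as a small calculation: if a job profile gives the average task duration, then running n tasks on s slots takes roughly n × avg / s, so the minimum slot allocation that still meets a deadline follows directly. The function below is a hedged, single-phase illustration of that idea only; the name `min_slots_for_deadline` and the simplified model are assumptions, not ARIA's actual implementation.

```python
# Hedged sketch (not ARIA's code): estimate the minimum number of slots
# needed to finish n_tasks within a deadline, given the average task
# duration from a job profile. Uses the lower-bound makespan model
# n_tasks * avg_task_s / n_slots, ignoring the separate reduce phase.
import math

def min_slots_for_deadline(n_tasks: int, avg_task_s: float, deadline_s: float) -> int:
    """Smallest s such that n_tasks * avg_task_s / s <= deadline_s."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return max(1, math.ceil(n_tasks * avg_task_s / deadline_s))

# Example: 120 map tasks averaging 40 s each, 10-minute deadline.
print(min_slots_for_deadline(120, 40.0, 600.0))  # 8
```

A scheduler using such an estimate can then reserve exactly that many slots instead of a fixed share, which is how ARIA raises resource utilization while still meeting deadlines.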
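Similarly, the estimation step of Polo et al. [49] relies on the average length of tasks that have already completed. A minimal sketch under a simple wave-based model follows; the helper name `estimate_remaining_s` and the wave approximation are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: estimate remaining job time as (waves of pending tasks)
# x (mean duration of the tasks observed so far), in the spirit of
# completion-time estimation from average completed-task length.
import math

def estimate_remaining_s(completed_durations, n_pending, n_slots):
    """Remaining-time estimate; completed_durations must be non-empty."""
    avg = sum(completed_durations) / len(completed_durations)
    waves = math.ceil(n_pending / n_slots)
    return waves * avg

# Example: three finished tasks averaging 32 s, 10 pending tasks, 4 slots.
print(estimate_remaining_s([30.0, 34.0, 32.0], 10, 4))  # 96.0
```

Comparing such an estimate against the job's deadline is what lets the scheduler grow or shrink a job's allocation dynamically as tasks complete.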