MapReduce scheduling algorithms in Hadoop: a systematic study

Hedayati, Soudabeh; Maleki, Neda; Olsson, Tobias; Ahlgren, Fredrik; Seyednezhad, Mahdi; Berahmand, Kamal

doi:10.1186/s13677-023-00520-9

Journal of Cloud Computing: Advances, Systems and Applications

Table 6 Resource-aware schedulers and their properties

Reference	Year	Key Ideas	Advantages	Disadvantages	Comparison Algorithms	Evaluation Techniques
Aarthee et al. [67]	2023	Heuristic bin-packing MapReduce scheduler for improving the performance of heterogeneous big-data workloads	Increasing resource utilization. Improving the makespan. Keeping the number of non-data-local executions low	Not considering data distribution. Not considering node failures	RWS, HMJS	Implementation: Hadoop cluster with twelve nodes
Jeyaraj et al. [68]	2022	Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization	Improving resource utilization. Minimizing the makespan	No implementation	FS, RWS, HMJS	Simulation
Zhang et al. [69]	2015	PRISM: a phase-level resource-aware MapReduce scheduler	Improving resource utilization. Reducing job running time. Avoiding resource contention	Worse data locality. Not considering heterogeneity	Fair, Yarn	Implementation: in Hadoop 0.20.2
Rasooli et al. [70]	2014	Using an algorithm that classifies jobs based on their requirements and finds an appropriate matching of resources and jobs in the system	Reducing average completion time. Improving fairness. Satisfying the required minimum shares. Increasing locality. No starvation for small jobs. Avoiding the sticky slot problem	Increasing scheduling overhead. Not separating data-intensive and computation-intensive jobs when performing the classification	FIFO, Fair	Implementation: Hadoop cluster with four nodes. Simulation: MRSIM, a MapReduce simulator
Polo et al. [71]	2011	Using job profiling information to dynamically adjust the number of job slots and their placement	Improving resource utilization. Meeting completion time goals. Reducing makespan	Not considering heterogeneity. Assuming that a task’s resource consumption is stable during its lifetime. No support for job priority	Fair	Implementation: on a Hadoop cluster
Sharma et al. [72]	2012	MROrchestrator: Using fine-grained, dynamic, and coordinated resource allocation instead of slot-based resource allocation	Increasing resource utilization. Reducing job completion time. Dynamically identifying and resolving resource bottlenecks	Not considering other resources like disk and network bandwidth. Not evaluated on a large cloud. Neglecting the workload imbalance among tasks. Not considering heterogeneity	Mesos, NGM	Implementation: on two 24-node physical and virtualized Hadoop clusters
Pastorelli et al. [73]	2015	Using size-based scheduling with aging	Decreasing system response times. Guaranteeing fairness. Avoiding job starvation. Considering data locality	No implementation in large clusters	Fair	Implementation: on a cluster composed of 20 Task Tracker worker machines
Tian et al. [74]	2011	Using a cost model to find the optimal resource provisioning for different decision problems	Cost model fits well on four tested programs. Cost model has low error rates	Not conducting experiments in the public cloud. Not considering heterogeneity	No comparison; different input datasets in different possible scenarios	Implementation: in their in-house 16-node Hadoop cluster
Ghoneem et al. [75]	2017	Providing the classifier with information about job requirements and node capabilities	Increasing resource utilization. Minimizing average completion time. Minimizing master node overhead. No starvation for small jobs. Avoiding the sticky slot problem. Considering heterogeneity	Finding content information is computationally expensive and time consuming	HP model, Starfish	Implementation: on a cluster consisting of three nodes
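
To make the bin-packing idea listed for [67] more concrete, the following is a minimal sketch of a generic first-fit-decreasing heuristic that places tasks on nodes by their CPU and memory demands. It is not the algorithm of Aarthee et al.; the `Node` class, the `first_fit_decreasing` function, the task dictionaries, and all resource figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with free resources (illustrative only)."""
    name: str
    cpu: int   # free virtual cores
    mem: int   # free memory in MB
    tasks: list = field(default_factory=list)

    def fits(self, task):
        return task["cpu"] <= self.cpu and task["mem"] <= self.mem

    def place(self, task):
        self.cpu -= task["cpu"]
        self.mem -= task["mem"]
        self.tasks.append(task["id"])


def first_fit_decreasing(tasks, nodes):
    """Place each task on the first node with enough free CPU and memory.

    Tasks are visited from largest to smallest (memory, cpu) demand.
    Returns the ids of tasks that fit on no node.
    """
    pending = []
    ordered = sorted(tasks, key=lambda t: (t["mem"], t["cpu"]), reverse=True)
    for task in ordered:
        for node in nodes:
            if node.fits(task):
                node.place(task)
                break
        else:
            # No node had room; leave the task for a later scheduling round.
            pending.append(task["id"])
    return pending


# Assumed example workload and cluster (made-up numbers).
nodes = [Node("n1", cpu=4, mem=8192), Node("n2", cpu=2, mem=4096)]
tasks = [
    {"id": "t1", "cpu": 1, "mem": 2048},
    {"id": "t2", "cpu": 2, "mem": 6144},
    {"id": "t3", "cpu": 2, "mem": 4096},
    {"id": "t4", "cpu": 1, "mem": 2048},
]
pending = first_fit_decreasing(tasks, nodes)
for node in nodes:
    print(node.name, node.tasks, "free cpu:", node.cpu, "free mem:", node.mem)
print("pending:", pending)
```

Sorting by decreasing demand tends to pack large tasks first and fill the remaining gaps with small ones, which is one common way to raise utilization. A real resource-aware scheduler would also need to account for concerns the table lists, such as data locality, heterogeneity, and node failures.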
