MapReduce scheduling algorithms in Hadoop: a systematic study

Hedayati, Soudabeh; Maleki, Neda; Olsson, Tobias; Ahlgren, Fredrik; Seyednezhad, Mahdi; Berahmand, Kamal

doi:10.1186/s13677-023-00520-9

Journal of Cloud Computing: Advances, Systems and Applications

Table 4 Data locality-aware schedulers and their properties

From: MapReduce scheduling algorithms in Hadoop: a systematic study

Reference	Year	Key Ideas	Advantages	Disadvantages	Comparison Algorithms	Evaluation Techniques
Kalia et al. [50]	2022	Improving MapReduce heterogeneous performance using KNN fair share scheduling	Improving the data locality. Reducing the execution time. Improving the throughput	No pipelining between Map and Reduce phases	Default Hadoop scheduler	Implementation: Hadoop cluster with 20 nodes
Li et al. [51]	2022	Performance optimization of computing task scheduling based on the Hadoop big data platform	Improving the data locality. Reducing the execution time. Improving the performance	Not considering node failures	Default Hadoop scheduler	Implementation: Hadoop cluster with six nodes
Fu et al. [52]	2020	OptLTS: optimal locality-aware task scheduling algorithm based on bipartite graph modelling for Spark applications	Improving the data locality. Decreasing job execution time. Reducing the network traffic and access latency	Not considering heterogeneity	DefMapTS, MaxNLT, DefRedTS, MinCRT	Implementation: cluster with 8 nodes
Gandomi et al. [53]	2019	HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework	Increasing data locality. Decreasing job completion time. Avoiding wasting resources	Not considering heterogeneity	Fair, FIFO	Implementation: Hadoop cluster with 21 nodes
He et al. [54]	2011	Each node is given an equal chance of getting its local tasks before any non-local tasks are allocated to any node	Achieving high data locality. Reducing response time for Map tasks. High cluster utilization	None in particular	FIFO, Delay	Implementation: Hadoop cluster (private cluster) with 31 nodes
Ibrahim et al. [55]	2012	Maestro: replica-aware Map scheduling for MapReduce	Improving locality of Map tasks. Reducing execution time of Map tasks. Reducing speculative Map tasks	Lacks the ability to work in a dynamic environment on the public cloud	Native Hadoop	Implementation: on a local virtualized testbed and the Grid’5000 testbed (100 nodes)
Zhang et al. [56]	2011a	Using a data locality-aware scheduling method	Improving data locality. Improving job execution time. Reducing job response time	Task reservation incurs runtime overhead, e.g., waiting time. Not considering homogeneous environments	Default Hadoop	Implementation: in Hadoop-0.20.2
Zhang et al. [57]	2011b	Using the next-k-node scheduling (NKS) method to calculate the probability of each Map task, and scheduling the one with the highest probability	Improving data locality of Map tasks. Reducing execution time. Reducing network load	Assumes that all nodes process Map tasks at the same rate, which holds in a homogeneous environment but cannot be assumed in heterogeneous environments	Default Hadoop	Implementation: on Hadoop-0.20.2, in a cluster of ten nodes
Zaharia et al. [58]	2010	Delay: an effective data locality-aware task scheduling method for MapReduce framework in heterogeneous environments	Achieving high data locality. Simplicity of scheduling. Improving response times for small jobs. Increasing throughput. Avoiding job starvation	Ineffective if many tasks are longer than average jobs or if nodes have few slots. Allowing a node to obtain multiple non-local Map tasks in a heartbeat interval if the node has more than one free slot	FIFO, Fair, Fair with delay scheduling	Implementation: in two environments: Amazon EC2 and a 100-node private cluster
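
As a rough illustration of the Delay entry in the table, the following is a minimal sketch of the skip-count idea behind delay scheduling: a job without a local task on the offering node is passed over a bounded number of times before it may take a non-local slot. The names Job, assign_task, and max_skips are hypothetical and chosen for this sketch only; this is not Hadoop's scheduler API, and it ignores rack locality, heartbeats, and time-based waits.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Maps each pending task id to the set of nodes holding a replica of its input.
    pending: dict[str, set[str]]
    # Number of times this job was passed over because it had no local task.
    skip_count: int = 0


def assign_task(jobs, node, max_skips):
    """Choose a task for one free slot on `node`.

    `jobs` is assumed to be already ordered by the base policy (e.g. FIFO or fair share).
    Returns (job name, task id, is_local), or None if every job is skipped.
    """
    for job in jobs:
        if not job.pending:
            continue
        local = [t for t, replicas in job.pending.items() if node in replicas]
        if local:
            # A local task exists: launch it and reset the job's skip counter.
            task = local[0]
            del job.pending[task]
            job.skip_count = 0
            return job.name, task, True
        if job.skip_count >= max_skips:
            # The job has waited long enough: allow a non-local launch.
            task = next(iter(job.pending))
            del job.pending[task]
            return job.name, task, False
        # Otherwise skip this job for now and give later jobs a chance.
        job.skip_count += 1
    return None


if __name__ == "__main__":
    jobs = [Job("A", {"a1": {"n2"}}), Job("B", {"b1": {"n1"}})]
    for _ in range(3):
        print(assign_task(jobs, "n1", max_skips=2))
```

In this sketch a larger max_skips trades waiting time for locality, which mirrors the disadvantage noted in the table: when tasks are long or nodes have few slots, free local slots appear rarely and the wait is less likely to pay off.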
