Advances, Systems and Applications
From: MapReduce scheduling algorithms in Hadoop: a systematic study
Reference | Year | Key Ideas | Advantages | Disadvantages | Comparison Algorithms | Evaluation Techniques |
---|---|---|---|---|---|---|
Kalia et al. [50] | 2022 | Improving MapReduce heterogeneous performance using KNN fair share scheduling | Improving the data locality. Reducing the execution time. Improving the throughput | No pipelining between Map and Reduce phases | Default Hadoop scheduler | Implementation: Hadoop cluster with 20 nodes |
Li et al. [51] | 2022 | Performance optimization of computing task scheduling based on the Hadoop big data platform | Improving the data locality. Reducing the execution time. Improving the performance | Not considering node failures | Default Hadoop scheduler | Implementation: Hadoop cluster with six nodes |
Fu et al. [52] | 2020 | OptLTS: optimal locality-aware task scheduling algorithm based on bipartite graph modelling for Spark applications | Improving the data locality. Decreasing job execution time. Reducing the network traffic and access latency | Not considering heterogeneity | DefMapTS, MaxNLT, DefRedTS, MinCRT | Implementation: cluster with 8 nodes |
Gandomi et al. [53] | 2019 | HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework | Increasing data locality. Decreasing job completion time. Avoiding wasting resources | Not considering heterogeneity | Fair, FIFO | Implementation: Hadoop cluster with 21 nodes |
He et al. [54] | 2011 | Each node is given an equal chance of getting its local tasks before any non-local tasks are allocated to any node | Achieving high data locality. Reducing response time for Map tasks. High cluster utilization | None in particular | FIFO, Delay | Implementation: private Hadoop cluster with 31 nodes |
Ibrahim et al. [55] | 2012 | Maestro: replica-aware Map scheduling for MapReduce | Improving locality of Map tasks. Reducing execution time of Map tasks. Reducing speculative Map tasks | Lacks the ability to work in a dynamic environment on the public cloud | Native Hadoop | Implementation: on a local virtualized testbed and the Grid’5000 testbed (100 nodes) |
Zhang et al. [56] | 2011a | Using a data locality-aware scheduling method | Improving data locality. Improving job execution time. Reducing job response time | Task reservation incurs runtime overhead, e.g., waiting time. Not considering homogeneous environment | Default Hadoop | Implementation: in Hadoop-0.20.2 |
Zhang et al. [57] | 2011b | Using the next-k-node scheduling (NKS) method to calculate the probability of each Map task, and schedule the one with the highest probability | Improving data locality of Map tasks. Reducing execution time. Reducing network load | Assumes that all nodes process Map tasks at the same rate, as in a homogeneous environment; this assumption does not hold in heterogeneous environments | Default Hadoop | Implementation: on Hadoop-0.20.2 in a cluster of ten nodes |
Zaharia et al. [58] | 2010 | Delay: an effective data locality-aware task scheduling method for MapReduce framework in heterogeneous environments | Achieving high data locality. Simplicity of scheduling. Improving response times for small jobs. Increasing throughput. Avoiding job starvation | Ineffective if many tasks are longer than average jobs or if nodes have few slots. Allows a node to obtain multiple non-local Map tasks in a heartbeat interval if the node has more than one free slot | FIFO, Fair, Fair with delay scheduling | Implementation: in two environments: Amazon EC2 and a 100-node private cluster |
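
To make the locality idea behind the Delay entry more concrete, the following is a minimal Python sketch of the general delay-scheduling principle: a job with no local task for the requesting node is skipped a bounded number of times before it is allowed to run a non-local task. It is an illustration under stated assumptions, not the implementation from [58]. The names `Job`, `assign_task`, and `max_skips` are hypothetical, and the job list is assumed to be already ordered by the fairness policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Job:
    """Hypothetical job record used only for this sketch."""
    name: str
    # Maps each pending task id to the set of nodes holding its input block.
    pending: Dict[str, Set[str]] = field(default_factory=dict)
    # Number of scheduling opportunities this job has passed up so far.
    skips: int = 0


def assign_task(jobs: List[Job], node: str,
                max_skips: int) -> Optional[Tuple[str, str, bool]]:
    """Pick a task for one free slot on `node`.

    Returns (job name, task id, is_local), or None if every job either
    has no pending task or chooses to wait for a local slot.
    """
    for job in jobs:  # assumed to be sorted by the fairness policy
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            # A local task exists: launch it and reset the skip counter.
            task = local[0]
            del job.pending[task]
            job.skips = 0
            return job.name, task, True
        if job.skips >= max_skips:
            # The job has waited long enough: accept a non-local task.
            task = next(iter(job.pending))
            del job.pending[task]
            return job.name, task, False
        # Otherwise skip this job for now and try the next one.
        job.skips += 1
    return None
```

In this sketch, `max_skips` is the tuning knob: a larger value trades longer waiting for a higher chance of a local launch, which is consistent with the table's note that the approach is less effective when nodes have few slots.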