Advances, Systems and Applications
From: MapReduce scheduling algorithms in Hadoop: a systematic study
Reference | Year | Key Ideas | Advantages | Disadvantages | Comparison Algorithms | Evaluation Techniques |
---|---|---|---|---|---|---|
Aarthee et al. [67] | 2023 | Heuristic scheduling with a bin-packing MapReduce scheduler to improve performance for heterogeneous big data workloads | Increasing resource utilization. Improving the makespan. Keeping the number of non-data-local executions low | Not considering data distribution. Not considering node failures | RWS, HMJS | Implementation: Hadoop cluster with twelve nodes |
Jeyaraj et al. [68] | 2022 | Optimizing MapReduce Task Scheduling on Virtualized Heterogeneous Environments Using Ant Colony Optimization | Improving resource utilization. Minimizing the makespan | No implementation | FS, RWS, HMJS | Simulation |
Zhang et al. [69] | 2015 | PRISM: a phase-level resource-aware MapReduce scheduler | Improving resource utilization. Reducing job running time. Avoiding resource contention | Worse data locality. Not considering heterogeneity | Fair, Yarn | Implementation: in Hadoop 0.20.2 |
Rasooli et al. [70] | 2014 | Using an algorithm that classifies jobs based on their requirements and finds an appropriate matching of resources and jobs in the system | Reducing average completion time. Improving fairness. Satisfying the required minimum shares. Increasing locality. No starvation for small jobs. Avoiding the sticky slot problem | Increasing scheduling overhead. Not separating data-intensive and computation-intensive jobs when performing the classification | FIFO, Fair | Implementation: Hadoop cluster with four nodes. Simulation: MRSIM, a MapReduce simulator |
Polo et al. [71] | 2011 | Using job profiling information to dynamically adjust the number of job slots and their placement | Improving resource utilization. Meeting completion time goals. Reducing makespan | Not considering heterogeneity. Assuming that a task’s resource consumption is stable during its lifetime. No support for job priority | Fair | Implementation: on a Hadoop cluster |
Sharma et al. [72] | 2012 | MROrchestrator: Using fine-grained, dynamic, and coordinated resource allocation instead of slot-based resource allocation | Increasing resource utilization. Reducing job completion time. Dynamically identifying and resolving resource bottlenecks | Not considering other resources such as disk and network bandwidth. Not evaluated on a large cloud. Neglecting the workload imbalance among tasks. Not considering heterogeneity | Mesos, NGM | Implementation: on two 24-node physical and virtualized Hadoop clusters |
Pastorelli et al. [73] | 2015 | Using size-based scheduling with aging | Decreasing system response times. Guaranteeing fairness. Avoiding job starvation. Considering data locality | No implementation in large clusters | Fair | Implementation: on a cluster composed of 20 Task Tracker worker machines |
Tian et al. [74] | 2011 | Using a cost model to find the optimal resource provisioning for different decision problems | Cost model fits well on four tested programs. Cost model has low error rates | Not conducting experiments in the public cloud. Not considering heterogeneity | No comparison; different input datasets in different possible scenarios | Implementation: in their in-house 16-node Hadoop cluster |
Ghoneem et al. [75] | 2017 | Providing the classifier with information about job requirements and node capabilities | Increasing resource utilization. Minimizing average completion time. Minimizing master node overhead. No starvation for small jobs. Avoiding the sticky slot problem. Considering heterogeneity | Finding content information is computationally expensive and time-consuming | HP model, Starfish | Implementation: on a cluster consisting of three nodes |
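
Several schedulers in the table place tasks according to their resource requirements rather than fixed slots. As a rough illustration of the bin-packing idea mentioned for [67], the sketch below uses a generic first-fit-decreasing heuristic over CPU and memory. It is not the algorithm of any surveyed paper; the `Node` class, the `first_fit_decreasing` function, and the task tuples are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with its remaining free resources."""
    name: str
    cpu: int                      # free virtual cores
    mem: int                      # free memory in MB
    tasks: list = field(default_factory=list)


def first_fit_decreasing(tasks, nodes):
    """Place (task_id, cpu, mem) tuples on nodes.

    Tasks are considered from largest to smallest demand, and each one
    goes to the first node with enough free CPU and memory.
    Returns the ids of tasks that did not fit anywhere.
    """
    unplaced = []
    ordered = sorted(tasks, key=lambda t: (t[1], t[2]), reverse=True)
    for task_id, cpu, mem in ordered:
        for node in nodes:
            if node.cpu >= cpu and node.mem >= mem:
                node.cpu -= cpu
                node.mem -= mem
                node.tasks.append(task_id)
                break
        else:
            # No node had enough free resources for this task.
            unplaced.append(task_id)
    return unplaced


if __name__ == "__main__":
    nodes = [Node("n1", cpu=4, mem=8192), Node("n2", cpu=2, mem=4096)]
    tasks = [("t1", 1, 1024), ("t2", 4, 4096), ("t3", 2, 2048), ("t4", 2, 2048)]
    leftover = first_fit_decreasing(tasks, nodes)
    for node in nodes:
        print(node.name, node.tasks)
    print("unplaced:", leftover)
```

A production scheduler would also weigh data locality, node heterogeneity, and failures, which are the concerns listed in the Disadvantages column for several entries; this sketch ignores all of them.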