Advances, Systems and Applications
From: MapReduce scheduling algorithms in Hadoop: a systematic study
Reference | Year | Key Ideas | Advantages | Disadvantages | Comparison Algorithms | Evaluation Techniques |
---|---|---|---|---|---|---|
Gao et al. [33] | 2022 | Deadline-aware preemptive job scheduling in Hadoop YARN clusters | Minimizing job deadline misses; improving resource utilization | Not considering heterogeneous clusters; not ensuring that every job meets its deadline | Capacity Scheduler, EDF, PDSonQueue (PDQ) | Implementation: Hadoop YARN cluster with five nodes |
Cheng et al. [34] | 2018 | RDS: deadline-aware MapReduce job scheduling with dynamic resource availability | Reducing job deadline misses; reducing job completion time; predicting future resource availability; using flexible deadlines | Not considering heterogeneity; not considering data locality | Fair, Earliest Deadline First (EDF) | Implementation: on a Hadoop cluster |
Kao et al. [35] | 2016 | DamRT: using a dispatcher and a schedulability test for deadline guarantees | Increasing the number of jobs that meet their deadlines; increasing data locality; minimizing task response time | Dividing the job deadline into task deadlines statically; not considering heterogeneity | FIFO, EDF, MTSD | Simulation: CloudSimRT simulator |
Verma et al. [36] | 2012 | Using an EDF policy to order jobs in the processing queue; using the MinEDF and MinEDF-WC mechanisms to allocate a tailored number of Map and Reduce slots to each job | Increasing resource utilization; reducing job completion time; minimizing the number of jobs that miss their deadlines | Not considering heterogeneity | EDF, MinEDF, MinEDF-WC | Implementation: Hadoop cluster with 66 nodes; Simulation: SimMR simulation environment |
Phan et al. [37] | 2011 | Using an EDF/MR policy to minimize the deadline miss rate; using an EDF/TD policy to minimize total tardiness | Minimizing the deadline miss ratio and tardiness; considering data locality | Not considering heterogeneity | FIFO, Fair, Capacity | Implementation: Hadoop cluster (21 nodes) on Amazon EC2 |
Kc et al. [38] | 2014 | Using a schedulability test to determine whether a job can be completed within its specified deadline | Maximizing the number of jobs that can run while satisfying their deadlines; increasing resource utilization | Assuming homogeneous nodes; assuming a uniform data distribution | No comparison; evaluated with different input datasets in different scenarios | Implementation: on a virtualized cluster and a physical cluster |
Teng et al. [39] | 2014 | Pausing between the Map and Reduce stages; using a schedulability bound to test the schedulability of real-time tasks | Meeting job deadlines; increasing cluster utilization; providing deadline guarantees for jobs | Not considering heterogeneity; not considering dependent tasks, periodic tasks, or non-preemptive execution | FIFO, RM | Implementation: on the Hadoop testbed and on Hadoop 1.1.0; Simulation: SimMapReduce simulation environment |
Wang et al. [40] | 2015 | SAMES: using a scheduling algorithm based on the most effective sequence for deadline-constrained jobs | Completing jobs before their deadlines; increasing system utilization | Not considering heterogeneity; not considering the differences between Map and Reduce tasks | DC, MinEDF-WC | Implementation: on Hadoop 1.0.4 |
Dong et al. [41] | 2011 | Using a sampling-based approach and a resource allocation model to design a deadline scheduler for real-time jobs; using a two-level scheduler to schedule mixed real-time and non-real-time jobs | Increasing system utilization; assigning the minimum number of resources to real-time jobs; providing deadline guarantees for real-time jobs | Not discussing hardware heterogeneity; dividing the job deadline into task deadlines statically | Polo’s model, Kc’s model | Implementation: on an eleven-node cluster running the hadoop-0.20.2 release |
Verma et al. [42] | 2011b | Using a scaling technique and performance models to estimate the resources required to meet SLOs | Estimating the resources required to meet job deadlines; increasing system utilization | Not considering heterogeneity | No comparison; evaluated with different input datasets in different scenarios | Implementation: on a Hadoop cluster |
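Several of the surveyed schedulers ([33], [34], [36], [37]) rely on Earliest Deadline First (EDF) ordering. The sketch below is a deliberately simplified model, not any of these schedulers: it assumes each job runs non-preemptively on a single abstract resource with a known total duration, and it counts deadline misses when jobs are picked in FIFO order versus EDF order. The `Job` type and the `simulate` and `misses` functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float   # time the job is submitted
    duration: float  # total processing time on one abstract resource (assumption)
    deadline: float  # absolute deadline


def simulate(jobs, key):
    """Run jobs one at a time, non-preemptively; whenever the resource is free,
    pick the ready job with the smallest `key`. Returns (name, finish, missed)."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    ready, results, t = [], [], 0.0
    while pending or ready:
        # Move every job that has arrived by time t into the ready set.
        while pending and pending[0].arrival <= t:
            ready.append(pending.pop(0))
        if not ready:
            t = pending[0].arrival  # idle until the next arrival
            continue
        job = min(ready, key=key)
        ready.remove(job)
        t += job.duration
        results.append((job.name, t, t > job.deadline))
    return results


def misses(results):
    return sum(1 for _, _, missed in results if missed)


if __name__ == "__main__":
    jobs = [Job("A", 0, 4, 10), Job("B", 0, 2, 3), Job("C", 1, 3, 7)]
    print("FIFO misses:", misses(simulate(jobs, key=lambda j: j.arrival)))
    print("EDF misses:", misses(simulate(jobs, key=lambda j: j.deadline)))
```

With these three jobs, FIFO runs A first and both B and C finish late, while EDF runs B, C, A and meets every deadline. Real Hadoop schedulers allocate many slots or containers in parallel, so this single-resource model only illustrates the ordering effect.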
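Performance-model approaches such as [36] and [42] estimate how many Map and Reduce slots a job needs in order to finish by its deadline. As an illustration only, assume a job's completion time is modelled as T(m, r) = a/m + b/r + c, where m and r are the allocated Map and Reduce slots and a, b, c are constants obtained from a job profile. This model is a hypothetical simplification, not the exact one used in those papers. Minimizing m + r subject to T(m, r) = D with a Lagrange multiplier gives m = √a(√a + √b)/(D − c) and r = √b(√a + √b)/(D − c), which the sketch rounds up to whole slots.

```python
import math


def predicted_time(a: float, b: float, c: float, m: int, r: int) -> float:
    """Completion time under the assumed model T(m, r) = a/m + b/r + c."""
    return a / m + b / r + c


def min_slots(a: float, b: float, c: float, deadline: float) -> tuple[int, int]:
    """Return (map_slots, reduce_slots) minimising m + r in the continuous
    relaxation of a/m + b/r + c = deadline, then rounded up to whole slots.
    Assumes a > 0 and b > 0."""
    budget = deadline - c
    if budget <= 0:
        raise ValueError("deadline is not reachable under this model")
    s = math.sqrt(a) + math.sqrt(b)
    m = math.sqrt(a) * s / budget
    r = math.sqrt(b) * s / budget
    # Rounding up can only shorten the modelled completion time.
    return math.ceil(m), math.ceil(r)


if __name__ == "__main__":
    # Hypothetical profile: 400 s of Map work, 100 s of Reduce work, 5 s overhead.
    slots = min_slots(400, 100, 5, 35)
    print(slots, predicted_time(400, 100, 5, *slots))
```

For the example profile this yields 20 Map slots and 10 Reduce slots, for a modelled completion time of exactly 35 s. Giving more slots to the phase with more work (here the Map phase) is what keeps the total allocation minimal.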
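The schedulability tests in [35], [38] and [41] decide whether a job can be admitted without missing its deadline, and the table lists the static division of a job deadline into task deadlines as a weakness of [35] and [41]. The following is a hypothetical admission check of that general kind, not the test from any of these papers. It assumes equal-length tasks executed in waves, one task per slot at a time, and a fixed fraction of the deadline reserved for the Map phase; `DeadlineJob`, `required_slots` and `admit` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class DeadlineJob:
    map_tasks: int
    map_task_time: float     # average seconds per Map task (assumed known from profiling)
    reduce_tasks: int
    reduce_task_time: float  # average seconds per Reduce task (assumed, > 0)
    deadline: float          # relative deadline in seconds


def required_slots(tasks: int, task_time: float, window: float):
    """Slots needed to finish `tasks` equal tasks within `window` seconds,
    or None if not even one wave of tasks fits in the window."""
    if window < task_time:
        return None
    waves = math.floor(window / task_time)
    return math.ceil(tasks / waves)


def admit(job: DeadlineJob, free_map: int, free_reduce: int, map_fraction: float = 0.5):
    """Static deadline split: the Map phase gets `map_fraction` of the deadline,
    the Reduce phase gets the rest. Returns (admitted, map_slots, reduce_slots)."""
    map_window = job.deadline * map_fraction
    reduce_window = job.deadline - map_window
    m = required_slots(job.map_tasks, job.map_task_time, map_window)
    r = required_slots(job.reduce_tasks, job.reduce_task_time, reduce_window)
    if m is None or r is None:
        return False, m, r
    return m <= free_map and r <= free_reduce, m, r


if __name__ == "__main__":
    job = DeadlineJob(100, 10.0, 20, 30.0, 120.0)
    print(admit(job, free_map=20, free_reduce=10))  # enough free slots
    print(admit(job, free_map=20, free_reduce=8))   # too few Reduce slots
```

Because the split is fixed, a job whose Map phase needs far less than its reserved window still holds that window, while its Reduce phase may be rejected for lack of time; this illustrates why a static split is counted as a disadvantage.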