Advances, Systems and Applications

Table 4 Data locality-aware schedulers and their properties

From: MapReduce scheduling algorithms in Hadoop: a systematic study

Reference

Year

Key Ideas

Advantages

Disadvantages

Comparison Algorithms

Evaluation Techniques

Kalia et al. [50]

2022

Improving MapReduce heterogeneous performance using KNN fair share scheduling

Improving the data locality. Reducing the execution time. Improving the throughput

No pipelining between the Map and Reduce phases

Default Hadoop scheduler

Implementation: Hadoop cluster with 20 nodes

Li et al. [51]

2022

Performance optimization of computing task scheduling based on the Hadoop big data platform

Improving the data locality. Reducing the execution time. Improving the performance

Not considering node failures

Default Hadoop scheduler

Implementation: Hadoop cluster with six nodes

Fu et al. [52]

2020

OptLTS: optimal locality-aware task scheduling algorithm based on bipartite graph modelling for Spark applications

Improving the data locality. Decreasing job execution time. Reducing the network traffic and access latency

Not considering heterogeneity

DefMapTS, MaxNLT, DefRedTS, MinCRT

Implementation: cluster with 8 nodes

Gandomi et al. [53]

2019

HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework

Increasing data locality. Decreasing job completion time. Avoiding wasting resources

Not considering heterogeneity

Fair, FIFO

Implementation: Hadoop cluster with 21 nodes

He et al. [54]

2011

Each node is given an equal chance of getting its local tasks before any non-local tasks are allocated to any node

Achieving high data locality. Reducing response time for Map tasks. High cluster utilization

None in particular

FIFO, Delay

Implementation: private Hadoop cluster with 31 nodes

Ibrahim et al. [55]

2012

Maestro: replica-aware Map scheduling for MapReduce

Improving locality of Map tasks. Reducing execution time of Map tasks. Reducing speculative Map tasks

Lacks the ability to work in a dynamic environment on the public cloud

Native Hadoop

Implementation: on a local virtualized testbed and the Grid’5000 testbed (100 nodes)

Zhang et al. [56]

2011a

Using a data locality-aware scheduling method

Improving data locality. Improving job execution time. Reducing job response time

Task reservation incurs runtime overhead, e.g., waiting time. Not considering homogeneous environments

Default Hadoop

Implementation: in Hadoop-0.20.2

Zhang et al. [57]

2011b

Using the next-k-node scheduling (NKS) method to calculate a probability for each Map task and scheduling the task with the highest probability

Improving data locality of Map tasks. Reducing execution time. Reducing network load

Assumes that all nodes process Map tasks at the same rate, which holds only in a homogeneous environment; in heterogeneous environments, no such assumption can be made

Default Hadoop

Implementation: on Hadoop-0.20.2, in a cluster of ten nodes

Zaharia et al. [58]

2010

Delay scheduling: a job whose next task cannot run locally waits briefly while other jobs launch tasks, trading a small delay for data locality and fairness

Achieving high data locality. Simplicity of scheduling. Improving response times for small jobs. Increasing throughput. Avoiding job starvation

Ineffective if many tasks are longer than the average job or if nodes have few slots. Allows a node to obtain multiple non-local Map tasks in one heartbeat interval if the node has more than one free slot

FIFO, Fair, Fair with delay scheduling

Implementation: in two environments: Amazon EC2 and a 100-node private cluster
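
The delay scheduling idea in the last row (Zaharia et al. [58]) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Job`, `assign_task`, and `max_skips` are hypothetical, fairness is approximated by serving the job with the fewest running tasks first, and heartbeats, slots, and rack-level locality are left out. It only shows the core rule: a job without a local task on the requesting node is skipped until it has been skipped `max_skips` times, after which it may launch a non-local task.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Hypothetical layout: pending task id -> nodes holding a replica of its input.
    pending: dict
    running: int = 0
    skip_count: int = 0


def assign_task(jobs, node, max_skips):
    """Pick a task for one free slot on `node`.

    Returns (job name, task id, is_local), or None if nothing can be launched.
    """
    # Assumed fairness order: the job with the fewest running tasks goes first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            # A local task exists: launch it and reset the skip counter.
            task, is_local = local[0], True
            job.skip_count = 0
        elif job.skip_count >= max_skips:
            # The job has waited long enough: accept a non-local task.
            task, is_local = next(iter(job.pending)), False
        else:
            # Skip this job for now and let the next job try this slot.
            job.skip_count += 1
            continue
        del job.pending[task]
        job.running += 1
        return job.name, task, is_local
    return None


if __name__ == "__main__":
    jobs = [Job("A", {"a1": {"n2"}}), Job("B", {"b1": {"n1"}})]
    print(assign_task(jobs, "n1", max_skips=1))  # job A is skipped once, job B runs locally
    print(assign_task(jobs, "n1", max_skips=1))  # job A has waited, so it runs non-locally
```

Because the skip counter is not reset after a non-local launch, a job that has exhausted its wait can take several non-local tasks in a row, which mirrors the last disadvantage listed for this scheduler.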