EDQWS: an enhanced divide and conquer algorithm for workflow scheduling in cloud

A workflow is an effective way for modeling complex applications and serves as a means for scientists and researchers to better understand the details of applications. Cloud computing enables the running of workflow applications on many types of computational resources which become available on-demand. As one of the most important aspects of cloud computing, workflow scheduling needs to be performed efficiently to optimize resources. Due to the existence of various resource types at different prices, workflow scheduling has evolved into an even more challenging problem on cloud computing. The present paper proposes a workflow scheduling algorithm in the cloud to minimize the execution cost of the deadline-constrained workflow. The proposed method, EDQWS, extends the current authors’ previous study (DQWS) and is a two-step scheduler based on divide and conquer. In the first step, the workflow is divided into sub-workflows by defining, scheduling, and removing a critical path from the workflow, similar to DQWS. The process continues until only chain-structured sub-workflows, called linear graphs, remain. In the second step which is linear graph scheduling, a new merging algorithm is proposed that combines the resulting linear graphs so as to reduce the number of used instances and minimize the overall execution cost. In addition, the current work introduces a scoring function to select the most efficient instances for scheduling the linear graphs. Experiments show that EDQWS outperforms its competitors, both in terms of minimizing the monetary costs of executing scheduled workflows and meeting user-defined deadlines. Furthermore, in more than 50% of the examined workflow samples, EDQWS succeeds in reducing the number of resource instances compared to the previously introduced DQWS method.


Introduction
Nowadays, cloud computing offers an opportunity to execute scientific applications composed of hundreds or thousands of interrelated tasks [1]. In this context, the workflow model is an effective way to construct such complex applications. It consists of application tasks specified by nodes and the connection lines among nodes which create dependencies among these tasks in a directed acyclic graph (DAG) [2]. The workflow scheduling problem in the cloud aims to assign the tasks to computing resources in order to preserve task precedence while meeting some performance criteria [3].
Besides the total execution time of workflows, most research on workflow scheduling in the cloud has focused on optimizing the total usage cost of computing resources offered by cloud providers [2]. Moreover, faster and more powerful computing resources in the cloud are usually more expensive than slower ones. Therefore, the execution cost can be affected by employing powerful computing resources as this decreases the workflow execution time. Thus, the trade-off between cost and time is a major challenge of workflow scheduling in the cloud [4]. To address this trade-off, two common methods are employed: either minimizing the total execution time under a budget constraint or minimizing the monetary cost under a deadline constraint.
Depending on user requirements and the workflow application, both cost and time can be considered as constraints or optimization objectives. Some workflows are structured in such a way that the results can be used whenever the execution of the workflow is completed before the deadline. The workflows of medical simulations and weather forecasting are examples of deadline-constrained applications [4]. To address such deadline-constrained applications, the current study proposes a new workflow scheduling algorithm focused on minimizing the total execution cost while respecting the user-defined deadline.
The proposed algorithm, named enhanced divide and conquer workflow scheduling (EDQWS), is an extension of the present authors' earlier study [5] which introduced a type of a divide and conquer-based approach. Similar to this previous work, in EDQWS, a large workflow is broken into a number of sub-workflows by identifying the critical path of a workflow and then scheduling and removing it. The same series of steps are repeated in all sub-workflows. Creating sub-workflow with a chain structure, called linear graph, is the stop condition of the algorithm. The present authors extend their previous method of scheduling linear graphs. Since the scheduling of linear graphs as solvable problems is performed in the final step, the linear graphs resulting from different stages of the division process are independent of each other and can therefore be separately scheduled. However, the specific order or combination of their scheduling can affect the selected resource types and the total execution cost. To this end, the current study proposes a scoring function for determining the linear graph combination score on different resources. At each stage, the combination with the highest score is selected and the new combined graph is replaced by the original ones. This combination is binary and will continue until the scoring function reaches a negative score for all new combinations. Owing to the large number of linear graphs, the score calculation is performed in parallel. The present study also introduces a new merge list algorithm that combines the tasks of linear graphs with respect to the laxity of the tasks. To evaluate the performance of the proposed method, it is compared with several state-of-the-art scheduling algorithms. The experimental results determine that the presented method outperforms others in total execution cost and the success rate of meeting deadlines. Moreover, the introduced approach can reduce the number of instances required for scheduling when compared to the previous work of the current authors. The rest of the present article is organized as follows: Section 2 reviews related workflow scheduling. Section 3 formulates the scheduling model and describes its workflow application and resource model. Details of the proposed algorithm are provided in Section 4. Section 5 discusses the experimental results. Finally, the conclusion and future works are presented in Section 6.

Related work
In recent decades, workflow scheduling has been extensively investigated by academia and industrial researchers. For a traditional distributed system, such as a grid and cluster, most existing research in workflow scheduling focuses on how to minimize the workflow execution time. However, workflow scheduling in the cloud environment is mainly a multi-objective problem. Thus, aside from the execution time, various criteria, such as monetary cost, energy usage, reliability, and security, are considered as QoS requirements [6][7][8][9][10]. Among these, monetary cost and execution time are substantial requirements for workflow scheduling algorithms [2,[11][12][13][14][15]. Normally, for multi-objective cases in which some objectives must be optimal, it is difficult to solve the workflow scheduling problem in the cloud due to its NP-completeness [16]. Therefore, various meta-heuristic and heuristic techniques have been adopted to obtain near-optimal solutions. This section briefly reviews several well-known heuristics and meta-heuristic workflow scheduling algorithms related to the present study's proposed method. As the main category, deadline-aware workflow scheduling algorithms are first reviewed. This is followed by a discussion on budget-aware and then multi-objective schedulers.

Deadline-aware workflow scheduling
Malawski et al. [17] present a mathematical model to optimize the workflow scheduling cost under a deadline constraint. Their method considers a multi-cloud environment and formulates the scheduling problem as a mixed-integer program (MIP). Abrishami et al. [18] utilize the Partial Critical Path (PCP) concept to develop a deadline-constrained workflow scheduler, named IC-PCP, in a cloud environment. IC-PCP [18] aims to minimize the overall execution cost of the workflow by determining a sequence of tasks as the partial critical paths (PCPs) and mapping all of these tasks to the same VM instance. The preference of the algorithm is to utilize the already leased instances which are able to meet the deadline. IC-PCP distributes the overall deadline to the PCPs. The Enhanced IC-PCP with Replication (EIPR) algorithm [19] attempts to further reduce costs by replicating tasks during the idle times of instances and eliminating some communications. Its experimental results show that the probability of meeting deadlines increases via task replications. The Deadline Constrained Critical Path (DCCP) [20] is a list-based scheduling algorithm on the cloud that aims to meet the user-defined deadline while minimizing the overall workflow execution cost. In the preprocessing step, tasks are partitioned into different levels, to each of which a sub-deadline is assigned. These deadlines are distributed non-uniformly among all levels so that the levels with a longer task execution time receive a longer sub-deadline. In the task prioritization step, DCCP utilizes a concept called the Constrained Critical Path (CCP) to assign all tasks on a path to one resource in order to reduce the communication time of the whole workflow. DCCP finds all CCPs and creates a list based on their modified rank. In each step, the ready tasks of each CCP are mapped to an appropriate resource and other tasks remain for the next steps. Rodriguez and Buyya [21] propose a metaheuristic scheduler in the cloud that intends to minimize the execution cost for deadline-constrained workflows. In this algorithm, resource provisioning and task assignment are integrated as a particle swarm optimization problem. The algorithm produces a near-optimal schedule that determines the number and types of VMs with their leasing period and task assignment. Guo et al. [1] also introduce a PSO 1 -based algorithm for scheduling a deadline-constrained workflow across multiple clouds. Their algorithm minimizes the execution cost of the workflow while meeting the user-defined deadline. Furthermore, the algorithm optimizes the performance for both computation cost and data transfer cost across multiple clouds. Proportional Deadline Constrained (PDC) [12] is a workflow scheduling algorithm on the cloud that attempts to meet the user-defined deadline while minimizing the execution cost. In the preprocessing step, PDC partitions tasks into different levels and assigns a sub-deadline to each level. The user-defined deadline is distributed non-uniformly among all levels so that levels with a longer task execution time receive a longer sub-deadline. PDC creates a list of ready tasks and prioritizes them according to a downward rank. In [2], two schedulers, namely L-ACO 2 and ProLiS, 3 are presented to schedule the deadline-constrained workflow application. ProLiS is a list scheduling algorithm that performs deadline distribution based on the new definition of the probabilistic upward rank. L-ACO is a meta-heuristic algorithm that accomplishes the cost optimization of a deadline-constrained workflow based on ant colony optimization. Deadline distribution and service selection in L-ACO is the same as in ProLiS.

Budget-aware workflow scheduling
A budget indicates the maximum amount of money that users are willing to pay for the execution of a workflow application in cloud resources [3]. In [22], the authors propose a Heterogeneous Budget Constrained Scheduling (HBCS) algorithm that minimizes the total workflow while meeting the user's specified budget. The HBCS defines an attribute called worthiness which combines the time and cost factors for the current task resource selection. Faragardi et al. [4] introduce Greedy Resource Provisioning and a modified HEFT (GRP-HEFT) for minimizing the workflow execution time subject to a budget constraint. They propose a greedy algorithm to list the instance types according to their efficiency rate and modify the HEFT [23] algorithm so that it considers a budget constraint. In [24], Wu et al. present a heuristic algorithm, called PCP-B, to schedule a workflow with a budget constraint. PCP-B implements the idea of balancing a budget among the partial critical paths according to their parallel or sequential structural nature. Budget distribution is performed based on the binary search method.

Multi-objective workflow scheduling
In this category, most strategies try to find a suitable mapping of workflow tasks to cloud resources that respects deadline and budget constraints at the same time. Budget and Deadline Constraint Heterogeneous Earliest Finish Time (BDHEFT) [15] is a multi-objective algorithm proposed to schedule workflow applications on a cloud. BDHEFT leverages the upward ranks to assign a priority to each task. In addition, a set of best possible resources is constructed for each selected task via the following six variables: Spare Workflow Budget (SWB), Spare Workflow Deadline (SWD), Current Task Deadline (CTD), Current Task Budget (CTB), Adjustment Factor (BAF), and Deadline Adjustment Factor (DAF). By considering the spare deadline and spare budget of each task, the resource is selected from the best possible resource set for each task, in which the overall execution time and execution cost of the workflow execution are simultaneously minimized. Durillo and Prodan propose the multiobjective heterogeneous earliest finish time (MOHEFT) algorithm [25] as an extension of HEFT [23]. MOHEFT computes a set of Pareto-based solutions from which users can select the best one. As noted by the authors, most of the solutions computing the Pareto-front are based on genetic algorithms. The algorithm is generic in terms of the number and types of objectives and so the 1 Particle swarm optimization (PSO) 2 List scheduling ant colony optimization (L-ACO) 3 Deadline-constrained probabilistic list scheduling (ProLiS) makespan and overall cost of the workflow applications can be optimized. Wu et al. [26] present a PSO-based strategy for workflow scheduling in the clouds. Their aim is to reduce either the makespan or cost while satisfying either the budget or deadline constraints. The elasticity of resource provision is ignored and it is assumed that several initialized VMs are available in advance. In [27], a heuristic-based scheduling algorithm is proposed to schedule the workflow under deadline and budget constraints. The algorithm utilizes a novel trade-off factor between time and cost to determine the most viable scheduling and the most appropriate instance type for provisioning.

Problem statement
The present study addresses the problem of the cost optimization of deadline constrained workflow scheduling in the cloud. This section first explains the workflow model, resource model, and definitions related to this problem. Subsequently, the problem formulation is presented.

Workflow model
A workflow application can be modeled as a directed acyclic graph (DAG), is a set of all workflow tasks illustrated by graph vertices represents the dependencies among the tasks. Each e i, j indicates the precedence between t i and t j , meaning that t j can be performed when t i is completed. Besides the dependencies, the data transmission among tasks is represented by the weight attached to e i, j . Furthermore, data i, j shows the amount of data transferred to t j after t i is completed; hence, the execution of t j can only start after data i, j has already been made available. On the other hand, a task can be executed if all its predecessors are terminated. Note that, on each edge, e i, j , t i is a predecessor of t j and t j is a successor of t i . Each task may have one or more predecessors and successors except for t entry and t exit . t entry is a task with no predecessor and t exit is a task with no successor. To generalize the workflow with one entry and one exit, two dummy tasks, t entry and t exit , with zero execution time and without data transmission, are added to the beginning and the end of the workflow, respectively. Fig. 1 illustrates a sample workflow represented by a DAG.

Resource model
The present article considers IaaS as a cloud service provider. IaaS provides a variety of computational resources with different costs via virtual machines (VMs) that feature different processing capabilities, memory, and storage. VMs with higher processing capabilities are assumed to have higher costs. A running virtual machine is called an instance and users can request infinite instances from the cloud service provider. In the current study, instances are provisioned on-demand and the pricing model is considered to be pay-as-you-go hourlybased, which is widely used by large public cloud providers [4]. In this model, a user must pay for the whole hour even though use of an instance is for less than an hour. The current work denotes VM = {vm 1 , vm 2 , …, vm n } as a set of heterogeneous computational resources offered by the cloud service provider via VM and Cost = {C 1 , C 2 , …, C n } as the cost of using each VM for an hour. It is also assumed that all computational resources are in the same region. Thus, the average bandwidth between instances is roughly identical and internal data transfer is free of charge [2].

Task execution time
Given that the VMs offered by the service provider are heterogeneous, each task has a different execution time based on the type of VM running on it. Therefore, the execution time of task t i on vm j is denoted by w i, j . It should be noted that w i, j is the worst-case execution time of t i on vm j and it is assumed that only one task can be executed on each VM instance at any time.

Communication time
The communication time between two tasks is the amount of time necessary to transfer data from t i to t j , as shown in (1). When both tasks are executed on the same instance, the communication time becomes zero.
where Inst(t i ) represents the instance on which t i is mapped, data i, j shows the amount of data transferred to t j after t i 's completion, and BW is the bandwidth between the two instances.

Earliest start time (EST) and earliest finish time (EFT)
For each unscheduled task, t i , EST(t i ) is defined as the earliest time when t i can start its execution after all its predecessors have finished and after having received the associated data. EST is calculated as follows: where CM j, i is the communication time and W(t j ) is the shortest execution time of t j , which is defined as follows: According to the earliest start time definition, the earliest finish time of each unscheduled task, t i , can be calculated as follows:

Actual start time (AST) and actual finish time (AFT)
After assigning a task to the desired instance, AST and AFT are obtained for each task. These values can be different from the EST and EFT that are determined before scheduling. Equation (5) shows the relation between these parameters. (1) where j is the type of instance to which t i is assigned.

Critical path and critical tasks
The critical path (CP) in a workflow is the path from the entry task to the exit task which has the maximum summation of task execution times and inter-task communication times among all paths [23]. All the tasks on a critical path are called critical tasks.

Linear graph and nonlinear graph
The present paper divides the workflow graphs into two categories: linear graphs and nonlinear graphs. A linear graph is a connected graph with a chain structure. Each vertex in this graph, which corresponds to a task in the workflow, has exactly one parent and one child, except for the entry task, which has no parent, and the exit task, which has no child. If a graph contains other structures, it is called a nonlinear graph. Figure 2 provides a graphic representation of linear and nonlinear graphs.

Problem formulation
The issue presented in the current article is the deadlineconstrained cost optimization of workflow scheduling in an IaaS cloud environment. Considering wf as the workflow and D as the workflow deadline, this subsection first defines the overall workflow execution cost and then formulates the problem. The total execution cost of a workflow, Cost total (wf), is the summation of all the costs of the used instances during scheduling, as follows: where Inst i is the i th leased instance and n is the total number of leased instances. Cost(Inst i ) depends on two parameters: the total time of utilizing Inst i and the price of Inst i for an hour. As mentioned earlier, there is an hourly-based price model and the user must pay for the whole hour even though the instance is used for less than an hour. As a result, the instance cost is calculated as follows: where C(Inst i ) is the specified cost for utilizing Inst i for 1 hour. According to the previous definitions, the present study's optimization scheduling problem is applied to achieve a mapping of tasks to suitable instances so as to minimize the total monetary cost while not exceeding the deadline constraint. Relation (8) shows our problem formulation.

Proposed method
The proposed solution for the cloud workflow scheduling problem is a static method that aims to minimize the execution cost of the workflow while meeting its deadline. For minimizing the execution cost, the introduced method focuses on the repeated use of the critical path concept. Dividing the initial workflow into several sub-workflows makes it possible to define the critical path in each resulting sub-workflow. According to the workflow structure and the dependencies among tasks, the original divide and conquer method is modified to be consistent with the goals of the current study. The proposed algorithm consists of two main phases: the division phase and the linear graph scheduling phase. In the division phase, the workflow division algorithm is presented to perform the workflow dividing operations according to the current authors' previous research (DQWS) [5]. The division of the workflow is performed by determining, scheduling, and removing the workflow critical path. By removing the critical path, the workflow leftovers are divided into one or more smaller sub-workflows and this process is repeated for each sub-workflow. The creation of sub-workflows as the structure of the linear graphs is the stop condition of the divide and conquer process and the sub-workflow scheduling is considered as a small solvable problem. By scheduling linear graphs with the scoring method, the proposed linear graph scheduling phase takes on a completely different approach from the current authors' previous work. As the difference between the proposed method and the present authors' past work (DQWS) lies in this second phase of the introduced algorithm, Section 4-1 shall first present the generalities of the proposed workflow division algorithm while Section 4-2 will go on to explain the details of the proposed linear graph scheduling algorithm.

Workflow division algorithm
The workflow division algorithm follows different steps to divide the initial workflow into sub-workflows. Similar to DQWS [5], the proposed algorithm (Fig. 3) first determines the workflow's critical path. This is achieved by the Find _ CriticalPath function shown in Fig. 4. In determining the critical path, it is important to identify the critical parent of each task. The critical parent of t i is one of its parents, such as t p , which maximizes the expression, EFT(t p ) + CM p, i . CM p, i is the communication time between t p and t i .
After determining a critical path, a set of possible resources is defined to schedule it by the Find _ Possi-bleResources function. This function searches among a variety of resource types offered by the service provider that are capable of scheduling the critical path tasks before the user-defined deadline and creates a set of these resource types as a possible resource set. The cheapest resource type from this set is then selected and the critical path tasks are pre-assigned on an instance of the selected resource. Because the critical path is eliminated from the workflow in the next steps, it is necessary to finalize the scheduling of the critical path task. The Check _ Subpaths function is responsible for finalizing critical path scheduling. To accomplish this, the Check _ Subpaths function checks all sub-paths leading to the critical path and examines the possibility that each sub-path can be scheduled by at least one type of resource. The Find _ Subpaths function determines the sub-paths leading to each critical task, as shown in Fig. 5. The successful output of the Check _ Subpaths function indicates that the critical path scheduling on the selected resource is finalized. Fig. 6 presents the details of this function. After finalization of the critical path schedule, the tasks in this path are removed from the  workflow and the workflow is divided into one or more connected graphs. Depending on their structure, each of the resulting graphs is added to one of the linear graph or nonlinear graph sets. This operation is performed for all graphs in the nonlinear graph set.  Linear graph scheduling algorithm Since the linear graphs obtained from different stages of the workflow division are independent of each other, there is no requirement to observe a specific order in their execution. However, a specific arrangement or combination in their scheduling can be effective in choosing the resource type, which, in turn, will affect the workflow execution cost. In contrast to the current authors' previous study (DQWS) [5], which employs a greedy method to schedule linear graphs, the present paper utilizes a scoring function for its linear graph scheduling. As shown in Fig. 7, the proposed algorithm consists of three different phases: the initialization phase, the internal combination phase, and the external combination phase.
The following presents the details of these different algorithm phases.

Initialization phase
The objective of the algorithm initialization phase (Lines 2-9) is to determine the candidate resources for each linear graph and distribute the linear graphs in the sets associated with their candidate resources. A candidate resource is a resource type whose linear graph execution cost is the lowest among the other resource types. Each linear graph may have more than one candidate resource. At the beginning of this phase, a dependent set , S R i , is defined for each resource type, R i , offered by the service provider and, for each linear graph, candidate resources are determined. Then, each linear graph is pre-scheduled on a new instance of each candidate resource type and then added to the sets that belong to its candidate resources.

Internal combination phase
The objective of the proposed algorithm's second phase (Lines 10-19) is to reduce the workflow execution cost, as performed by pairwise combinations of linear graphs. During the process of combination, the preference of the algorithm is lowering the cost and, as a result, reducing the number of instances required for scheduling. In this phase, in each of the defined sets, S R i , all pairwise combinations of linear graphs are determined and the score of each combination is calculated by the Score _ Combination function. As shown in (9), the combined score is calculated as the difference in execution costs when both linear graphs are scheduled on one sample and when each is scheduled on a separate sample.
Due to the large number of linear graphs in each set, determining all pairwise combinations of the linear graphs as well as calculating the score of each combination is quite time consuming. For this reason, a parallel algorithm performs this function. Subsection 4-2-2-1 provides the details of the Score _ Combination function. After the score of all combinations is calculated, the combination with the highest score and its set are determined and the combined graph is added to the selected set. Then the initial linear graphs making up the combined graph are removed from all sets. Since the sets are not disjoint, the initial linear graphs must be removed from all the sets. Determining the pairwise combinations of linear graphs and selecting the best combination is repeated until the score of all combinations in all sets is negative.
Details of the scoring function The score of each binary combination is calculated by the Score _ Combination function. Figure 8 presents the pseudo-code of this function. At first, the possibility of combining two linear (9) Score = cost linear_graph 1 + cost linear_graph 2 − Combination Cost graphs is explored. If this is not possible, the combination cost is considered infinite.
Definition: G 1 and G 2 are linear graphs and D 1 and D 2 are their deadlines, respectively. These linear graphs are uncombinable if the total execution time of their tasks is greater than both D 1 and D 2 .
To determine the execution cost of the combinable graphs, two linear graphs are first pre-scheduled on one instance. Pre-scheduling is performed in two different ways: pre-scheduling for graphs with a time overlap and pre-scheduling for graphs without a time overlap.
Definition: Linear graphs, G1 and G2, are not timeoverlapped graphs if: In this case, by pre-scheduling the tasks of both graphs on one instance, the order, start time, and finish time of the tasks does not change. To pre-schedule linear graphs with a time overlap, the present study provides a new merge list algorithm. The proposed merge list algorithm receives two lists as input. Each list includes the tasks of one linear graph that are arranged from entry task to exit task. The output of this algorithm will also be a list that specifies the final sequence of the tasks in the combined graph. The current work's contribution to the merge list algorithm is the header selection method. To select a header, the laxity of all tasks in both graphs is first determined and the following conditions are considered: a. Selecting each header creates negative laxity for one or more tasks of the other list. b. Selecting one of the headers creates negative laxity for one or more tasks of the other list. c. Neither of the choices leads to negative laxity.
In case (a), it is not possible to combine two linear graphs; hence, the algorithm terminates. In case (b), a header is selected that does not create a negative laxity. In case (c), a header is chosen with an earlier start time in its initial linear graph and, if the start time of both headers is the same, then a header with lower laxity is selected. Fig. 9 shows an illustrative example for combining two linear graphs, G 1 and G 2 , with the proposed modified merge list algorithm.
After examining the possibility of combining linear graphs, the execution cost of the combined graphs is calculated by (7).

External combination phase
In this phase (Lines 20-27), the proposed algorithm aims to transfer the scheduled tasks of the less powerful instances to the idle times of the more powerful ones. This transfer can reduce the number of used instances and consequently the overall execution cost. For this purpose, in each stage, the idle-time periods of the most powerful utilized instance are determined. For each idle time, the present study looks for an appropriate linear graph that can be scheduled on the idle time before its deadline. After the desired linear graph is determined, its tasks are transferred and the original instance is removed from the required instances.

Evaluation
This section first describes the current work's experimental setting and evaluation criteria, and then compares the proposed method with state-of-the-art approaches that are similar in terms of objectives. The methods used for comparison are IC-PCP [18], BDHEFT [15], PDC [12], and the present authors' previous study, DQWS [5]. All of these algorithms are reviewed in the Related Work section.

Experimental setting and evaluation criteria
To evaluate the performance of the proposed algorithm, different simulation scenarios are run. Simulation is a well-accepted approach for evaluating workflow scheduling algorithms. With simulation, it is possible to test the performance of the algorithms under a controlled setting [27] . In the present study's simulation, which is performed by MATLAB, the service provider offers five different VM types with different costs and processing powers. The characteristics of the VMs are based on the US-east Amazon region and are presented in Table 1. Unlimited instances of each VM type are assumed. VMs are in a single data center and the average bandwidth between instances is fixed at 20 MBps [28]. The current study conducts its experiments using four different scientific workflows: CyberShake, Epigenomics, LIGO, and Motif. These workflows are diverse in terms of structure, computational characteristics, and communication data. In addition, the workflows are employed in different scientific areas, such as earthquake science, biology, gravitational physics, and genetics. Full descriptions of these workflows are provided in [29,30]. Fig. 10 depicts the structure of each workflow in a relatively small size. These workflows are provided by the Pegasus workflow generator 4 as a DAX (Directed Acyclic Graph in XML) format and, for each workflow, the details, including the DAG, the sizes of data transfer, and the tasks execution time, are published. These workflows have been widely used for evaluating the performance of scheduling algorithms, and thus they are included in the current study's experiments. In order to determine the execution time of each workflow task by each VM type, we assume the published execution time for each task is calculated for the VM with ECU = 1. With this assumption, the execution time of each task on the other VM types is calculated by dividing its published execution time by the ECU value of the VM which is shown in Table 1. As shown in Table 2, the present work chooses four different sizes for each workflow type and generates 50 random samples from each type and size of workflow.
To assess the performance of the compared algorithms, it is necessary to determine the acceptable values for each workflow's deadline. The concept of the fastest schedule is utilized to determine deadlines proportionate to each workflow structure. The fastest schedule is the scheduling of each workflow task on a distinct instance of the fastest VM type without considering the communication time between tasks [18]. The makespan obtained by the fastest schedule for each workflow, makespan LB , is the lower bound of the overall workflow execution time. Using makespan LB , the present study calculates a variation for a deadline, from tight to relaxed, as follows: α starts from 2 and is increased by 2 up to a value of 10. In this way, the impact of different deadlines (from tight to relaxed) is evaluated on the performance of each algorithm.
To evaluate the performance of the algorithm, the present work analyzes the following metrics in its experiments: normalized cost, success rate, and instance reduction. Each metric is explained as follows: Normalized cost: The overall workflow execution cost obtained by each of the algorithms is a suitable criterion for evaluating the compared algo-  rithms. Because of the difference in the structure and characteristics of the benchmark workflows, the current paper employs a normalized cost metric for comparison. The normalized cost of executing each workflow with any of the benchmark algorithms is obtained by dividing its overall execution cost, cost(wf), by its scheduling cost based on the cheapest scheduling strategy(cost lowest (wf)) [18], as follows: Success rate: To evaluate the capability of each algorithm to meet the deadline constraints, the present study defines the metric of success rate as the ratio between the number of successful schedules and the total number of schedules. Instance reduction: To compare the performance of the proposed algorithm against that of the current authors' previous research (DQWS), an instance reduction parameter is introduced. This parameter shows the reduction factor in the percentage of the number of instances needed for scheduling each workflow type by EDQWS in comparison to DQWS. Since the present study's focus in the linear graph scheduling phase is the combination of linear graphs, reducing the number of instances, in addition to minimizing the cost, can serve as a metric for comparing the two methods. A reduction in the number of instances required to schedule workflows can lower the probability of instance failure. This is also helpful for cloud providers which have a limitation in the number of requested instances.

Experimental result
The performance of the proposed algorithm is evaluated in three separate subsections. In the first subsection, the execution cost of each workflow type is examined by all compared algorithms. In the second subsection, the success rate of all compared algorithms is evaluated. In the third subsection, the two methods, DQWS and EDQWS, are compared in more detail. For this purpose, a comparison is made between the average cost reduction in EDQWS and in DQWS. Additionally, the number of instances required for each algorithm to schedule each workflow type is studied.

Normalized cost analysis
In this subsection, the execution cost of each workflow is examined by all the compared algorithms. In these experiments, 50 random samples are generated from each type and size of the workflows, and each random sample is scheduled by all algorithms for five different deadline factors. Given that the results of running each algorithm on different workflow sizes are relatively similar, only the results of implementing the largest size of each workflow type are reported. Fig. 11 presents the normalized cost of all of the compared algorithms when executing the four workflow types (CyberShake, Epigenomics, LIGO, and Motif ). The normalized cost values are the mean value of the 50 random samples generated for each workflow type. Besides the mean values, the standard deviation is also provided in Fig. 11. The reporting of 'Fail' in each graph indicates that the specified algorithm was unable to schedule the workflow within the given deadline factor. Figure 11.a presents the results of the CyberShake workflow execution by the five compared algorithms. As observed in Fig. 11.a, none of the compared algorithms are able to schedule this type of workflow within the deadline factor of 2. The lowest execution cost for this type of workflow is obtained by the EDQWS algorithm.   11:13 In this type of workflow, DQWS and EDQWS succeed in creating multiple linear graphs with a small number of tasks and short execution times. As a result, utilizing the idle time of instances for scheduling linear graphs can reduce the execution cost of this workflow type. Although the two algorithms utilize the same linear graphs, a comparison of their normalized cost reduction indicates that EDQWS' consideration of different linear graph combinations in instance type selection leads to more appropriate solutions than DQWS' usage of the greedy method.
As seen in Fig. 11.a, the PDC algorithm failed to schedule any random samples of the CyberShake workflow within the specified deadlines. The failure of PDC in scheduling this type of workflow is due to the way in which it sets the initial deadline. PDC determines the initial deadline for each workflow by considering the communication time between tasks. If the user-defined deadline is sooner than the initial deadline, the workflow will not be scheduled. As mentioned earlier, for determining the deadline, the proposed algorithm employs the concept of the fastest schedule which eliminates the communication time between tasks during the deadline calculation. Because the execution time of CyberShake tasks are short and the communication time between them is long, the deadlines calculated by the fastest schedule are shorter than the PDC initial deadline. Therefore, the results of the PDC algorithm are not included in Fig. 11.a.
In Epigenomics, the possibility that DQWS and EDQWS will utilize the resource idle time is less than that of the other workflows because of the high execution time of tasks. As seen in Fig. 11.b, the normalized cost obtained by DQWS, EDQWS, and PDC is almost the same, while BDHEFT reports a higher execution cost than the other algorithms. For the deadline factors that IC-PCP is successful in meeting, its normalized execution cost nears that of DQWS, EDQWS, and PDC. Figure 11.c shows the results of LIGO scheduling by all of the compared algorithms. In this type of workflow, the normalized cost obtained by EDQWS is the lowest among the other algorithms. Due to the existence of several parallel tasks in the structure of this workflow, the creation of multiple linear graphs has increased the possibility of utilizing instance idle times. Furthermore, the usage of the scoring method and examination of linear graph combinations may be the reason why the normalized cost reduction in EDQWS is greater than that of the greedy method. Figure 11.d provides the results obtained from the present study's simulations for the Motif workflow. As seen, IC-PCP, DQWS, and EDQWS outperform the BDHEFT and PDC algorithms. Based on the results obtained from IC-PCP, DQWS, and EDQWS, it is observed that, in tight deadlines, the DQWS and EDQWS algorithms perform better while, in relaxed deadlines, ICPCP's performance is superior. For motif scheduling, EDQWS performs much like DQWS and the normalized cost of these two algorithms is the same. A possible reason for this similarity may be that the motif structure creates single-task linear graphs during the division phase. Subsection 5-2-3 shall compare these two methods in more detail.

Success rate analysis
By considering different deadline factors, this section examines the success rate of the compared algorithms for each workflow type. All random samples created from any type of workflow are included in the success rate calculation. Fig. 12 provides the success rates of the five algorithms (EDQWS, DQWS, PDC, BDHEFT, and IC-PCP). As the results show, the success rates of EDQWS and DQWS are exactly the same for all deadline factors and the different workflows. Given that these two algorithms employ the same method to create linear graphs, it can be concluded that the method of scheduling linear graphs does not impact the success or failure of the schedule but only affects the workflow execution cost. As shown in Fig. 12, for all workflow types, the proposed method has a higher success rate than that of the PDC, IC-PCP, and BDHEFT algorithms. PDC's low success rate may be explained by the fact that this algorithm does not accept most of the deadline factors and so is unable to schedule the workflow.

The comparison of DQWS and EDQWS
This section compares the current authors' newly introduced EDQWS algorithm and their previous algorithm, DQWS. As mentioned earlier, the steps for creating linear graphs in these two algorithms are similar, but their methods differ in the linear graph scheduling phase. For scheduling linear graphs, DQWS utilizes a greedy method while the present paper's EDQWS algorithm employs a scoring method. To evaluate the performance of these two algorithms, two comparisons are made. First, the number of instances required by EDQWS and DQWS to schedule each workflow type is compared. Second, the difference between these two algorithms' normalized cost and the number of instances are examined. Figure 13 shows the number of instances required by DQWS and EDQWS to schedule the CyberShake, Epigenomics, LIGO, and Motif workflows. It should be noted that the number of instance values is the mean value of 50 random samples generated for each workflow type. As seen in Fig. 13, in comparison to DQWS, the number of instances required for scheduling CyberShake and Epigenomics is greatly reduced by the EDQWS algorithm. However, this reduction is not observed in the LIGO and Motif workflows. Table 3 summarizes the results obtained from the above experiments. This table presents three different parameters for each algorithm: the average percentage of cost reduction in EDQWS compared to DQWS, the average percentage of the reduction in the number of instances in EDQWS versus DQWS, and the percentage of experiments in which EDQWS succeeds in reducing the number of instances in comparison to DQWS.
As seen in Table 3, the average normalized cost for the three types of workflow (CyberShake, Epigenomics, and LIGO) is reduced by the proposed algorithm when compared to the DQWS algorithm. Also, the current work's algorithm succeeds in reducing the average number of instances required to schedule CyberShake and Epigenomics. In LIGO, although the average number of instances in the proposed algorithm increases slightly, the number of instances in more than half of the experiments decreases in comparison to DQWS. In Motif, despite the same normalized cost in both algorithms, the number of instances decreases in a small percentage of the proposed method's experiments.

Conclusion
The present paper proposes a static scheduling algorithm called EDQWS, which is an extension of a previous study by the current authors [5]. EDQWS is a two-phase workflow scheduler based on divide and conquer and aims to minimize the overall workflow execution cost by considering a user-defined deadline. In the first phase, similar to the present authors' previous research, the division of workflow into sub-workflows is achieved by determining and scheduling the critical path and removing it from the workflow. By eliminating the critical path, the workflow is divided into several sub-workflows, each of which undergoes this same division. The stop condition is to attain a sub-workflow with a chain structure called a linear graph. For scheduling linear graphs in the second phase, the current work proposes a new merging algorithm to combine the resulting linear graphs, reduce the number of used instances, and minimize the overall execution cost. Also introduced is a scoring function to select the most efficient instances for scheduling the linear graphs. The experiments are conducted with four well-known workflows that determine whether EDQWS has an overall better performance than the state-of-the-art algorithms, IC-PCP, PDC, BDHEFT, and DQWS. In terms of the normalized cost parameter, EDQWS shows acceptable results when compared to the other methods. As for the success rate parameter, EDQWS and DQWS are completely the same in all deadline factors and for different workflows. Given that these two algorithms use the same method in creating linear graphs, the method of scheduling linear graphs has no effect on the success or failure of the schedule and only affects the workflow execution cost. However, the success rate of both algorithms is higher than that of the other methods, especially under tight deadlines. In comparing the performance of EDQWS with that of the present authors' previous research (DQWS), the results show that, in more than 50% of the examined workflow samples, the number of resource instances decreases in EDQWS in comparison to DQWS. Reducing the number of resource instances in addition to decreasing the probability of instance failure also leads to a reduction in the overall execution cost of the examined workflows. According to the above description, it can be concluded that the definition of the scoring function and relying on it to combine linear graphs and select virtual machine types has led to a more appropriate selection of VM types than the other baseline methods. The use of our new merge list algorithm as well as the policy for transferring the task from the less powerful instances to the idle times of the more powerful ones has also had a significant impact on improving the results. The new merge list algorithm combines the tasks of linear graphs with respect to the laxity of tasks and improves the scheduling of tasks by changing the mapping of pre-scheduled tasks on the instances. By transferring the pre-scheduled tasks from the less powerful instances to the idle times of the more powerful ones, the algorithm can use these idle times for which there is no need to pay extra, and remove less powerful instances from the list of required instances. For future work, the current authors intend to extend their merging algorithm for more than two linear graphs. Furthermore, a nongreedy algorithm shall be proposed for selecting instance idle-time to transfer scheduled tasks in the external combination phase.