Multi-objective workflow optimization strategy (MOWOS) for cloud computing

Workflow scheduling involves mapping large tasks onto cloud resources to improve scheduling efficiency. This has attracted the interest of many researchers, who have devoted their time and resources to improving the performance of scheduling in cloud computing. However, scientific workflows are big data applications, so their execution is expensive and time-consuming. To address this issue, we have extended our previous work, the Cost Optimised Heuristic Algorithm (COHA), and present a novel workflow scheduling algorithm named Multi-Objective Workflow Optimization Strategy (MOWOS) to jointly reduce execution cost and execution makespan. MOWOS employs a task splitting mechanism to split large tasks into sub-tasks and so reduce their scheduling length. Moreover, two new algorithms called MaxVM selection and MinVM selection are presented in MOWOS for task allocation. The design purpose of MOWOS is to enable all tasks to meet their deadlines at a reduced time and budget. We have tested the performance of MOWOS with a list of workflow inputs. The simulation results demonstrate that MOWOS can effectively perform VM allocation and deployment, and can handle incoming streaming tasks with a random arrival rate. Compared with the state-of-the-art work, the performance gain of the proposed algorithm is significantly larger for large and extra-large workflow tasks than for small and medium ones. It reduces cost by 8%, reduces makespan by 10%, and improves resource utilization by 53%, while also allowing all tasks to meet their deadlines.


Introduction
Cloud computing, a multipurpose and high-performance internet-based computing paradigm, can model and transform a wide range of application requirements into a set of workflow tasks. It allows users to represent their computational needs conveniently for data retrieval, reformatting, and analysis [1]. Over the past decades, researchers from different scientific domains such as astronomy, physics, earth science, and bioinformatics have used cloud platforms to model scientific applications for many real-world problems. These applications are modeled as workflows [2], which allow complex and large scientific data to be analyzed and simulated in a cloud computing environment. This is because cloud computing has lower the upfront dependencies between the workflows. In other words, it indicates that a successor workflow task cannot start until the predecessor workflow task is completed [2,8-10].
Scheduling workflows in the cloud computing environment is gaining ground and remains an attractive research area for many scientists. This is attributed to the rapid growth of the cloud industry and the opportunities it presents for cloud users. The cloud can deploy resources virtually or remotely, which allows scientific discoveries to be carried out on a large scale [11]. However, generating an effective schedule with the current heuristic algorithms remains a challenge. Scientific workflows are big data applications and, because of their nature and data size, often require a large budget and more time to execute. This problem becomes more pronounced when the workflow tasks to be scheduled have deadlines. Much work has been done by other researchers to find an optimal solution to this problem through heuristic algorithms. Nevertheless, the problem still exists, because most of these heuristics rely heavily on job priority without considering the scheduling length of the job. Hence, it is very difficult to achieve an optimal solution with the current heuristic algorithms.
Moreover, workflow scheduling in particular is complicated and has been defined by many researchers as an NP-complete problem, which makes the orchestration of workflow task execution challenging [12,13]. This is due to the complexity in the structure of scientific workflows, as one workflow application can produce many discrete tasks for scheduling [14]. As a result, generating a schedule that optimizes the two most important, yet conflicting, scheduling objectives, i.e., execution cost and execution makespan, becomes a complicated problem. For example, optimizing execution cost increases the execution makespan, because the two objectives are interlinked. The makespan and cost optimization problem persists because VM selection, which is key to managing resource utilization and improving system throughput, is usually ignored by researchers. The conflict between execution cost and makespan is an acknowledged problem that needs to be addressed appropriately [15,16].
In this paper, we address the workflow scheduling problem by extending our work originally presented at the IEEE 6th International Conference on Big Data Security on Cloud (BigDataSecurity) [17]. In our previous work, a task splitting algorithm known as the Cost Optimised Heuristic Algorithm (COHA) [17] was used to split large tasks with longer execution lengths to allow them to meet their deadlines at a lower cost. However, in the previous work, we only applied execution cost as the performance evaluation metric, which is not adequate to measure the efficiency of the algorithm. Moreover, there exist some research gaps, such as VM selection and task mapping criteria, that are worthy of further investigation. We extended our previous work to consider task execution makespan and resource utilization as metrics for performance evaluation and as optimization goals.
The main contributions of the paper are summarized as follows:
1. We introduce a triple-stage layer workflow execution model and a cloud resource model to support the aim of the proposed algorithm.
2. We present a Multi-Objective Workflow Optimization Strategy (MOWOS) to jointly minimize the execution cost and execution makespan of workflow tasks.
3. A novel method called MaxVM selection is introduced. This method is responsible for selecting workflow tasks with maximum (longer) execution time and mapping them onto VMs with maximum (higher) execution capacities. This is done to help reduce the waiting times of workflow tasks with longer execution times.
4. An efficient scheme known as the MinVM selection method is introduced to select and map workflow tasks with minimum (shorter) execution time. This is done to avoid mapping smaller workflow tasks onto VMs with higher execution capacities, which come at a higher cost and may increase the execution cost.
5. We re-evaluate the variants of the extended algorithm through four real scientific workflows.
The remainder of the paper is structured as follows. Related work is introduced in the "Related work" section. The "System models" section describes our system models. A detailed description of the proposed Multi-Objective Workflow Optimization Strategy (MOWOS) is presented in the "Proposed algorithm: multi-objective workflow optimization strategy (MOWOS)" section. The performance evaluation, experimental setting, workflow structure, and results and analysis are presented in the "Performance evaluation" section. Finally, the paper is concluded in the "Conclusion and future work" section.

Related work
Workflow scheduling is one of the most difficult tasks that needs to be addressed in the cloud computing environment. This is due to the complexity in its structure, as one workflow application can produce many discrete tasks for scheduling [14]. It has been defined by many researchers as an NP-complete problem [13,18-24]. Considerable research efforts have been made by previous researchers to solve the workflow scheduling problem; nevertheless, the problem persists. For example, in [25], the problem was proven to be NP-complete. The researchers transform a non-convex constraint into many linear constraints using linearization and reformulation based on heuristic techniques. On the other hand, the researchers in [26] presented GA-ETI, which considers the relationship between jobs and their required data to enhance the efficiency of running workflow tasks on cloud resources. GA-ETI is capable of optimizing both makespan and cost. However, it is restricted by prior knowledge from identifying overloaded and under-loaded VMs for workload redistribution. Several studies based on the Min-Min scheduling algorithm have been conducted to reduce makespan and execution cost and to improve the utilization rate of cloud resources. For example, Liu et al. [27] took into consideration three task scheduling constraints, namely quality of service, the dynamic priority model, and the cost of service, and proposed an improved Min-Min algorithm for task scheduling in a cloud computing environment to enhance makespan and resource utilization rate. The results show that the improved approach is efficient and can increase the utilization rate of cloud resources. It can also schedule large tasks in a timely manner to meet the requirements of cloud users. However, it is less effective when there are more large tasks than short tasks.
Also, the researchers in [28-31] have acknowledged the impressive performance of Min-Min in reducing makespan and have compared their methods with Min-Min and other existing algorithms to ascertain the performance of their methods with respect to execution makespan and execution cost.
Many researchers have also used Max-Min in different capacities to enhance task scheduling in the cloud computing environment. For example, Li et al. [32] presented an improved Max-Min based technique called the MIN-Max-Min algorithm. It reduces the average makespan of jobs. However, the proposed method cannot efficiently exploit parallel tasks from multiple sources and hence is not able to reduce idle time slots. Also, the method is not scalable and does not consider the dynamic nature of cloud resources. Ghumman and Kaur [33] presented a hybrid method called the improved Max-Min Ant Colony Algorithm. The method combines the concepts of Max-Min and the ant colony algorithm to schedule workflows. In simulations, the proposed method provides better results in makespan and execution cost. However, the approach does not consider the length of workflows or VM selection methods, and therefore cannot fully utilize the available resources.
Also, in [8], a Fuzzy Dominance sort based Heterogeneous Earliest-Finish-Time (FDHEFT) algorithm was presented. The approach has two phases: a task prioritizing phase and an instance selection phase. In the task prioritizing phase, the algorithm calculates the priority of every task and queues the tasks in non-increasing order, with the highest values ranked first. The instance selection phase sorts and selects all tasks based on their fuzzy dominance (priority) values to minimize cost and makespan. However, the limitation of FDHEFT is that tasks are selected based on their fuzzy dominance values without considering the size of the task and the corresponding VM speed. Matching each workflow task to the appropriate resource for execution helps tasks avoid missing their deadlines. A general-framework heuristic algorithm for multi-objective static scheduling of scientific workflows in heterogeneous computing environments, called the Multi-objective list scheduling algorithm (MOLS), was proposed in [34]. The algorithm tries to find a suitable Pareto solution by deploying two strategies: maximizing the distance to the user constraint vector for dominant solutions and minimizing it otherwise. Although the proposed algorithm is capable of producing better results in cost and makespan, the approach mainly focuses on reducing makespan and cost without considering the workload and its impact on resources.
Besides, a Cost-Effective Deadline Constrained Dynamic scheduling algorithm for scientific workflow scheduling in the cloud, known as Just-In-Time (JIT-C), was proposed in [35]. JIT-C relies on the many advantages presented by the cloud, while taking care of the performance differences among VMs and the instance acquisition delay, to schedule effectively and meet the deadlines of all workflow tasks at a reduced makespan and cost. The algorithm addresses three major issues: VM performance variation, resource acquisition delays, and the heterogeneous nature of cloud resources. The runtime overhead of the algorithm is also considered. Other methods were deployed as well: (i) a pre-processing approach that combines pipeline tasks into a single task to save data transfer time and reduce the runtime overhead, (ii) a monitor control loop technique that monitors the progress of all running workflow tasks and makes scheduling decisions in terms of performance variation, and (iii) a Plan and Schedule method that coordinates with the 'cheapesttaskVMmap' method to obtain a low-cost schedule. Although the proposed method is proven to be effective and efficient in meeting deadlines and producing low makespan and cost, it is very expensive in generating schedules when the deadline factor is low. At a reduced deadline factor, the slack time is likely to be zero or low, and when this happens it can increase the cost of executing a workflow task.
Other studies consider reducing execution cost and energy consumption under deadline constraints. In this regard, Li et al. (2015) suggested a cost-effective energy-aware scheduling algorithm for scientific workflows in heterogeneous cloud computing environments. The proposed method is intended to minimize the execution cost of the workflow and reduce the energy consumption while meeting the deadlines of all workflow tasks. To achieve this, four different methods were deployed: (i) VM selection algorithms that use the concept of cost-utility to map workflow tasks onto the best VMs, (ii) task merging methods to minimize execution cost and energy consumption, (iii) a VM reuse method to reuse unused VM instances, and (iv) a task slacking algorithm based on DVFS techniques to save the energy of leased VMs. Cost-effective energy-aware algorithms can minimize the execution cost of workflows and save considerable energy. However, the proposed method consumes more time to identify VM types. This can affect the execution cost of workflow tasks, since time is a major determinant of cost in the cloud computing environment. Moreover, Haidri et al. [22] identified VM acquisition delay as one main challenge for workflow task scheduling in a cloud computing environment. They proposed a Cost-Effective Deadline Aware (CEDA) scheduling strategy to optimize total workflow task execution time and economic cost while meeting deadlines. The method selects the workflow task with the highest upward rank value at each step and dispatches it to the cheapest instance for a reduced makespan and cost. Also, slack time was used to schedule other tasks to further reduce the price overhead. However, CEDA is not effective for large workflows. In [36], the Customer Facilitated Cost-based Scheduling (CFCSC) algorithm was presented. The method schedules tasks on the available cloud resources so as to reduce cost and execution makespan. CFCSC is efficient only with small workflow tasks and performs poorly in makespan when large numbers of tasks are scheduled. This can be attributed to the fact that CFCSC assigns workflow tasks on the critical path to cloud resources and leaves the non-critical-path workflow tasks in the queue for a long time.
From the reviewed literature, it is observed that most of the methodologies focus on resource efficiency to optimize workflow scheduling. This can cause load imbalance and inefficient resource utilization [37]. Different from the aforementioned work, our study presents a task splitting management system that considers both resource efficiency and the workload to be scheduled. Considering the complexity of workflows, we provide MaxVM and MinVM allocation strategies to reduce the system execution cost and time and to fully utilize the cloud resources, while ensuring that tasks meet their deadlines.

Workflow application model
A scientific workflow is a set of workflow tasks modeled as a directed acyclic graph (DAG) [38] and defined by a tuple $G = (W, E)$, where $W = (wt_1, wt_2, wt_3, wt_4, \ldots, wt_n)$ is the set of $n$ workflow tasks in a scientific workflow application and $E$ denotes the set of edges, which represent the data dependency constraints between workflow tasks. The edge between workflow task $wt_i$ and workflow task $wt_j$ is denoted by $E_{i,j} = (wt_i, wt_j)$. Every edge $E_{i,j}$ represents a precedence constraint, which indicates that workflow task $wt_j$ cannot start until $wt_i$ completes. In this scenario, workflow task $wt_i$ is a predecessor of $wt_j$, while workflow task $wt_j$ is called an immediate successor of $wt_i$. All predecessor workflow tasks of $wt_i$ are represented as $pre(wt_i)$, while all its successor workflow tasks are represented as $succ(wt_i)$. Predecessor and successor workflow tasks are therefore denoted by Eqs. 1 and 2:

$$pre(wt_i) = \{\, wt_j \mid (wt_j, wt_i) \in E \,\} \quad (1)$$

$$succ(wt_i) = \{\, wt_j \mid (wt_i, wt_j) \in E \,\} \quad (2)$$

Every workflow DAG has an entry task and an exit task. Figure 1 shows a sample workflow DAG of 12 tasks with entry and exit tasks. An entry workflow task is a workflow task without a predecessor, denoted by $wt_{entry}$ as in Eq. 3.

$$pre(wt_{entry}) = \emptyset \quad (3)$$

An exit workflow task is a workflow task without a successor, denoted by $wt_{exit}$ as in Eq. 4.

$$succ(wt_{exit}) = \emptyset \quad (4)$$
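The predecessor, successor, entry, and exit definitions in this section can be illustrated with a small adjacency structure. The following Python sketch is illustrative only: the function names (`build_dag`, `entry_tasks`, `exit_tasks`) and the four-task example graph are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def build_dag(edges):
    """Build predecessor and successor maps from (wt_i, wt_j) edges."""
    tasks = set()
    pre = defaultdict(set)   # pre[wt_j]: tasks that must finish before wt_j starts
    succ = defaultdict(set)  # succ[wt_i]: tasks that wait for wt_i to finish
    for wt_i, wt_j in edges:
        tasks.update((wt_i, wt_j))
        succ[wt_i].add(wt_j)
        pre[wt_j].add(wt_i)
    return tasks, pre, succ


def entry_tasks(tasks, pre):
    """Tasks with an empty predecessor set."""
    return sorted(t for t in tasks if not pre[t])


def exit_tasks(tasks, succ):
    """Tasks with an empty successor set."""
    return sorted(t for t in tasks if not succ[t])


# Hypothetical workflow: wt1 fans out to wt2 and wt3, which join at wt4.
EDGES = [("wt1", "wt2"), ("wt1", "wt3"), ("wt2", "wt4"), ("wt3", "wt4")]
TASKS, PRE, SUCC = build_dag(EDGES)
```

In this hypothetical graph, `wt1` is the only entry task and `wt4` the only exit task; `wt4` cannot start until both `wt2` and `wt3` have completed.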

Resource model
Resource allocation involves the management of cloud resources in the cloud datacenter to increase system efficiency. The resource model consists of several cloud users and different cloud service providers, as in Fig. 2. Let $CSP = (csp_1, csp_2, csp_3, \ldots, csp_n)$ be the list of cloud service providers offering cloud resources (VMs), and let $r = \bigcup_{s=1}^{\infty} \{r_s\}$ represent the available VMs in the cloud data center, which are unlimited for the cloud user. Let $K = \bigcup_{k=1}^{n} \{R_k\}$ denote the types of VM, where $n$ is the number of VMs in type $k$ [8], and let $R = (vm_1, vm_2, vm_3, \ldots, vm_k)$ be the list of cloud resources available to a list of cloud users represented by $cu = (cu_1, cu_2, cu_3, \ldots, cu_n)$ for workflow task execution. These VMs have different configurations and different prices and are modelled by a tuple $vm(pc; c)$ [39], where $pc$ represents the processing capacity of the VM and $c$ denotes the monetary cost of the VM, which is payable on an hourly basis. Each resource in the resource list has a unique configuration, and the billing is based on the processing capacity of the VM. In other words, a VM with higher processing capacity costs more than a VM with lower processing capacity [40]. Let $pc = (pc_{min}, pc_{max}, \ldots, pc_{xx})$ be the processing Where ECT is the expected completing time of $wt_1$ on $VM_k$, Dl is the deadline of $wt_1$. The two categories of VMs used in this research are defined below: Category 2

Workflow execution model
Scientific workflows are mostly used to manage data flow. A workflow is modeled as a directed acyclic graph (DAG) that represents a sequence of tasks processing a set of data [2,8]. The execution process of a workflow has two major phases: the resource provisioning phase and the task generating and mapping phase [2]. The resource provisioning phase discovers all the available cloud computing instances (both hardware and software) and deploys them to guarantee a successful execution of every incoming task. The task mapping phase, on the other hand, is a process in which all the unmapped tasks in the metadata are mapped onto the various Virtual Machines (VMs) for execution. The aim of executing a workflow is to ensure an Efficient, Effective and Just-in-time Scheduling Plan (EEJSP) that increases throughput and minimizes the makespan and total execution cost [42]. Our proposed workflow execution model, presented in Fig. 3, is a triple-stage layer model that relies on the opportunities and challenges of cloud computing while taking into account workflow deadlines and QoS constraints. The first layer is the application layer, followed by the execution layer and the cloud infrastructure layer. The layers of the workflow execution model used in this work are highlighted below:

1. Application layer: The application layer provides the types of workflow applications used as the data set for this research. For simulation purposes, four real-world workflows provided by the Pegasus workflow management system [43-45] are used. These workflows have been used in practice by different researchers to model and evaluate the performance of workflow scheduling algorithms in the cloud computing domain. These applications are modeled as a DAG with edges $e(wt_i, wt_j)$ between the workflow tasks. The edges of the workflows represent precedence constraints. The edge $e(wt_i, wt_j)$ indicates that $wt_i$ is a direct predecessor of $wt_j$ and should finish execution before workflow task $wt_j$, which is an immediate successor of $wt_i$ [5]. These workflows are applied in different scientific domains such as bioinformatics, astronomy, and astrophysics [42,45].

2. Execution layer: In scheduling, a range of workflow tasks of different sizes, with or without scheduling constraints, arrive at the execution layer for scheduling decisions to be made. The execution layer comprises the proposed scheduler (MOWOS); the constraints, such as deadlines, budget, and quality of service; and the optimization objectives. It is responsible for ensuring that every arrived workflow task is given the opportunity to be mapped onto a VM by the scheduler. The job of the scheduler is to ensure that these workflow tasks are successfully scheduled within a given budget and time. For example, suppose a workflow task $wt_i$ arrives at the execution layer with a deadline of 2 min. In this scenario, workflow task $wt_i$ needs to be mapped onto a VM before its deadline as specified by the user, so the scheduler has to select a VM with the capacity to execute $wt_i$ without violating the deadline constraint. The primary aim of the execution layer is to manage all incoming workflow tasks and to find an optimal solution to two important, yet conflicting, scheduling objectives: execution makespan and execution cost. Makespan is defined as the maximum finishing time among all received workflows per time. It shows the quality of the assignment of workflows to VMs from the execution time point of view. The cost of executing workflow task $wt_i$ on $vm_k$ is defined as

$$Cost(wt_i, vm_k) = ET(wt_i, vm_k) \times cost(vm_k)$$

where $ET(wt_i, vm_k)$ is the execution time of workflow task $wt_i$ on $vm_k$ and $cost(vm_k)$ is the cost of executing workflow task $wt_i$ on $vm_k$.
At each step of the execution phase, the expected completion time (ECT) of each task is generated and compared with the deadline of the task as specified by the user to determine whether the task can meet its deadline. The expected completion time of workflow task $wt_i$ on $vm_k$ is denoted as $ECT(wt_i, vm_k)$ and can be computed using Eq. 7 [46].

$$ECT(wt_i, vm_k) = ET(wt_i, vm_k) + load_{vm_k} \quad (7)$$

where $ET$ is the execution time of workflow task $wt_i$ on $vm_k$ and $load_{vm_k}$ is the workload of $vm_k$ at a given time.

The execution time of workflow task $wt_i$ on $vm_k$ is denoted as $ET(wt_i, vm_k)$ and can be calculated using Eq. 8.

$$ET(wt_i, vm_k) = \frac{TL}{MIPS_{vm_k}} \quad (8)$$

where $TL$ is the task length and $MIPS_{vm_k}$ is the processing speed of $vm_k$ in million instructions per second.

3. Infrastructure layer: The cloud provides software made available by a third party over the internet, referred to as software as a service (SaaS); services such as storage, networking, and virtualization, known as infrastructure as a service (IaaS); and hardware and software tools available over the internet, commonly referred to as platform as a service (PaaS) [47]. The infrastructure layer, which corresponds to the IaaS cloud, provides the storage, networking, and virtualization services that cloud computing needs in order to function. Our proposed method makes use of the services provided by the IaaS cloud: storage to store workflow applications, memory to process the applications, and Physical Machines (PMs) on which VMs are configured to execute cloud users' requests.
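The execution-layer quantities (execution time, expected completion time, execution cost, and makespan) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the `VM` class, the per-second pricing unit, and the numeric values are hypothetical, ECT is assumed to be the execution time plus the VM's current load, and cost is assumed to be execution time multiplied by the VM price.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    mips: float            # processing capacity, million instructions per second
    price_per_sec: float   # assumed pricing unit; the paper bills VMs hourly
    load: float = 0.0      # current workload of the VM in seconds (assumed unit)


def execution_time(task_length_mi: float, vm: VM) -> float:
    """Eq. 8: task length (MI) divided by the MIPS rating of the VM."""
    return task_length_mi / vm.mips


def expected_completion_time(task_length_mi: float, vm: VM) -> float:
    """Assumed form of Eq. 7: execution time plus the VM's current load."""
    return execution_time(task_length_mi, vm) + vm.load


def execution_cost(task_length_mi: float, vm: VM) -> float:
    """Assumed cost model: execution time multiplied by the VM price."""
    return execution_time(task_length_mi, vm) * vm.price_per_sec


def makespan(finish_times) -> float:
    """Maximum finishing time among the scheduled workflow tasks."""
    return max(finish_times)


# Hypothetical VM and task; the 2-minute deadline mirrors the example in the text.
vm_a = VM(name="vm_a", mips=1000.0, price_per_sec=0.5, load=20.0)
et = execution_time(50_000, vm_a)
ect = expected_completion_time(50_000, vm_a)
meets_deadline = ect <= 120.0
```

With these hypothetical numbers the task needs 50 s of execution on top of 20 s of existing load, so it completes well inside the 2-minute deadline.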

Proposed algorithm: multi-objective workflow optimization strategy (MOWOS)
We began by making the following assumptions:
1. All workflow tasks submitted can be split into sub-tasks.
2. Deadlines of workflows are known on arrival.
3. Every large workflow task, when split, can meet its deadline.
4. Provisioned resources are available from the start of the workflow execution to its end.
5. The VM workload has no effect on the task splitting process.

In this section, we present the proposed MOWOS algorithm for multi-objective optimization. Multi-objective workflow optimization arises when two conflicting, yet important, scheduling objectives such as execution cost and execution makespan are optimized concurrently in the cloud computing environment. An extensive literature review revealed that most of the current heuristic algorithms are not robust in optimizing conflicting objectives such as execution cost and execution makespan simultaneously. Taking this into consideration, we extended our previous work (COHA) and present a novel scheduling heuristic called Multi-Objective Workflow Optimization Strategy (MOWOS). When tasks arrive at the queue, MOWOS first calculates the expected completion time (ECT) of each workflow task in the $wt_{queue}$. Then, a new queue called the deadline queue ($Dl_{queue}$) is created, and all the arrived tasks in the $wt_{queue}$ are re-queued in the $Dl_{queue}$ based on their user-specified deadlines. In the next step, the ECT of each workflow task is compared with its user-specified deadline. If, for example, the ECT of $wt_i$ on $VM_k$ is greater than its deadline, the algorithm applies the split method in Algorithm 2 to split the task into sub-tasks and employs the minimum VM selection method (MinVM) to map the split tasks. The MinVM selection method is introduced to map small tasks onto lower-cost VMs. This avoids scheduling small tasks on VMs with higher MIPS (high-cost VMs) and thereby reduces the cost of execution. On the other hand, if the ECT of $wt_i$ on $VM_k$ is equal to its deadline, the MaxVM selection method (a VM with a higher MIPS) is deployed to execute the task faster. The MaxVM selection method is intended to map all workflow tasks whose ECT equals their deadline onto higher-capacity VMs to reduce makespan. The sub-algorithms are described in detail below.
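The top-level decision flow described in this section can be sketched as a small dispatcher. The sketch below is illustrative and makes several assumptions that are not in the paper: tasks are plain dictionaries, ECT is estimated against a single reference VM speed (`mips`) plus an optional `load`, and equality between ECT and deadline is tested within a tolerance `tol`. Tasks whose ECT is below the deadline are sent to MinVM, following the description of the MinVM selection algorithm later in the paper.

```python
def mowos_dispatch(wt_queue, mips, load=0.0, tol=1e-9):
    """Return (task name, action) pairs in deadline order.

    Each task is a dict with hypothetical keys: name, length (MI), deadline (s).
    """
    # Re-queue the arrived tasks by their user-specified deadlines (Dl queue).
    dl_queue = sorted(wt_queue, key=lambda task: task["deadline"])
    decisions = []
    for task in dl_queue:
        ect = task["length"] / mips + load
        if ect > task["deadline"] + tol:
            action = "split+MinVM"   # too long: split, then map the sub-tasks
        elif abs(ect - task["deadline"]) <= tol:
            action = "MaxVM"         # just fits: run on a high-capacity VM
        else:
            action = "MinVM"         # comfortable slack: use a cheaper VM
        decisions.append((task["name"], action))
    return decisions


# Hypothetical queue evaluated against a 1000-MIPS reference VM.
QUEUE = [
    {"name": "a", "length": 4000, "deadline": 2.0},
    {"name": "b", "length": 1000, "deadline": 1.0},
    {"name": "c", "length": 500, "deadline": 3.0},
]
DECISIONS = mowos_dispatch(QUEUE, mips=1000)
```

Keeping the decision separate from the split and selection routines makes each of the three branches easy to test on its own.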

Tasks splitting algorithm:
Reducing execution cost and execution makespan in the cloud computing environment is an important issue for cloud service providers. If cloud service providers cannot reduce the cost and time of workflow scheduling, customers may become dissatisfied, which will in turn affect the profit margin of the service providers. A straightforward way to reduce the execution cost and time of workflow tasks is to ensure that workflows do not miss their deadlines. Scheduling large workflow tasks onto cloud resources delays the scheduling process and causes some of the workflows to miss their deadlines. In contrast, splitting large workflow tasks into sub-tasks reduces the scheduling length of workflow tasks, thereby allowing every workflow task to meet its deadline. Researchers such as [48,49] have shown that scheduling small workflow tasks on cloud resources provides better execution makespan and cost than mapping large workflow tasks onto cloud resources. For this reason, we present a task splitting algorithm that splits tasks that are unlikely to meet their user-specified deadlines into sub-tasks to reduce their scheduling lengths. This helps in (i) giving effective estimates of execution cost and time, (ii) identifying and fixing bottlenecks easily, and (iii) saving data transfer time. The pseudo-code of the task splitting algorithm is given in Algorithm 2. The algorithm starts by identifying all the arrived workflow tasks in the workflow task queue, with a set of workflow tasks $Wt = (wt_1, wt_2, wt_3, wt_4, \ldots, wt_n)$. Afterward, the deadline of each task is compared with its ECT, and if the ECT of $wt_i$ on $vm_k$ is greater than its deadline, the task is split into sub-tasks and sent to the ready queue for a mapping decision. The algorithm terminates when all the large workflow tasks have been split.
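The splitting step can be illustrated with a short Python sketch. The text does not state how many sub-tasks a large task is split into, so the rule used here is an assumption: a task is cut into the smallest number of equal-length pieces whose individual execution time fits within the deadline, and each sub-task inherits the parent deadline. The function names and the tuple layout are likewise hypothetical, and VM load is ignored in line with assumption 5.

```python
import math


def split_task(length_mi, deadline_s, mips):
    """Split one task into equal pieces that each fit the deadline.

    Assumed rule (not from the paper): use ceil(ET / deadline) pieces.
    """
    et = length_mi / mips
    if et <= deadline_s:
        return [length_mi]                # small enough, leave untouched
    parts = math.ceil(et / deadline_s)
    return [length_mi / parts] * parts


def split_queue(tasks, mips):
    """Apply split_task to (name, length, deadline) tuples; return the ready queue."""
    ready = []
    for name, length_mi, deadline_s in tasks:
        pieces = split_task(length_mi, deadline_s, mips)
        for idx, piece in enumerate(pieces, start=1):
            sub_name = name if len(pieces) == 1 else f"{name}.{idx}"
            ready.append((sub_name, piece, deadline_s))
    return ready


# Hypothetical queue: wt1 needs 5 s on a 1000-MIPS VM but has a 2 s deadline.
READY = split_queue([("wt1", 5000, 2.0), ("wt2", 1000, 2.0)], mips=1000)
```

Here `wt1` is split into three sub-tasks, each short enough to run within 2 s on the reference VM, while `wt2` passes through unchanged.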

MaxVM selection algorithm
The focus of most scheduling policies is to reduce the waiting time of tasks so that they meet their user-specified deadlines, thereby minimizing the makespan. Given this rationale, we introduce a method called maximum VM selection (MaxVM selection) to map large tasks effectively onto VMs, further reducing makespan and increasing scheduling efficiency. Given a set of VMs $(VM_l, VM_m, VM_n, \ldots, VM_k)$ with corresponding sizes $(VM_{Max}, VM_{Min}, \ldots, VM_{xx})$, the VMs are sorted in descending order and queued based on their execution speeds or sizes. The MaxVM selection method then identifies the VM in the VM queue with the higher execution capacity to execute the tasks whose ECTs are equal to their deadlines. The objective of this method is to reduce the waiting time of workflow tasks with maximum execution time and thereby reduce the total execution makespan.
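A minimal Python sketch of the MaxVM selection step follows. It is illustrative only: the dictionary keys, the VM names, and the use of `math.isclose` to test whether an ECT equals its deadline are assumptions made for the example.

```python
import math


def maxvm_select(tasks, vms):
    """Map tasks whose ECT equals their deadline onto the fastest VM."""
    # Sort the VM queue in descending order of execution speed (MIPS).
    vm_queue = sorted(vms, key=lambda vm: vm["mips"], reverse=True)
    fastest = vm_queue[0]
    mapping = {}
    for task in tasks:
        if math.isclose(task["ect"], task["deadline"]):
            mapping[task["name"]] = fastest["name"]
    return mapping


# Hypothetical VM pool and task list.
VMS = [
    {"name": "vm_small", "mips": 500},
    {"name": "vm_large", "mips": 2000},
    {"name": "vm_medium", "mips": 1000},
]
TASKS = [
    {"name": "wt1", "ect": 10.0, "deadline": 10.0},  # ECT equals deadline
    {"name": "wt2", "ect": 4.0, "deadline": 10.0},   # has slack, left for MinVM
]
MAPPING = maxvm_select(TASKS, VMS)
```

Only `wt1` is mapped here, and it goes to `vm_large`; tasks with slack are left for the MinVM path.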

MinVM selection algorithm:
After the large workflow tasks are split in Algorithm 2, the tasks have to be mapped onto VMs for execution. When mapping tasks onto VMs for execution, certain decisions have to be made based on the objectives of the algorithm. In this research, the objective of the algorithm is to schedule tasks for a reduced execution makespan and cost. For this reason, we introduce the MinVM selection method to map all the tasks whose ECTs are less than their user-specified deadlines. This is done to avoid mapping smaller workflow tasks onto VMs with higher execution capacities, which come at a higher cost and may increase the execution cost. The algorithm begins by identifying the MIPS of the VMs (line 6), which is used to determine the VMs with lower MIPS (cheaper VMs) in the VM list (line 7), and then migrates all the tasks with short execution length onto the cheaper VMs to maximize profit. The rationale is that, because the tasks have been split in Algorithm 2 and their ECTs are below their deadlines, low-cost VMs can execute them without delays. Table 1 presents the abbreviations and their definitions used throughout the paper.
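The MinVM selection step can be sketched in Python as follows. This is a hedged illustration rather than the paper's Algorithm: the dictionary keys and VM names are hypothetical, and the check that the chosen low-MIPS VM still meets the task deadline (falling back to the fastest VM otherwise) is an added safeguard assumed for this example and not described in the text.

```python
def minvm_select(tasks, vms):
    """Map each task onto the lowest-MIPS (cheapest) VM that meets its deadline."""
    # Ascending MIPS order: lower-capacity, cheaper VMs come first.
    by_mips = sorted(vms, key=lambda vm: vm["mips"])
    mapping = {}
    for task in tasks:
        chosen = by_mips[-1]  # assumed fallback: the fastest VM
        for vm in by_mips:
            if task["length"] / vm["mips"] <= task["deadline"]:
                chosen = vm
                break
        mapping[task["name"]] = chosen["name"]
    return mapping


# Hypothetical VM pool and sub-tasks produced by the splitting step.
VM_POOL = [
    {"name": "vm_large", "mips": 2000},
    {"name": "vm_small", "mips": 500},
]
SUBTASKS = [
    {"name": "wt1.1", "length": 1000, "deadline": 3.0},  # 2.0 s on vm_small
    {"name": "wt2", "length": 3000, "deadline": 2.0},    # 6.0 s on vm_small
]
ASSIGNMENT = minvm_select(SUBTASKS, VM_POOL)
```

In this example `wt1.1` stays on the cheaper `vm_small`, while `wt2` would miss its deadline there and is therefore placed on `vm_large`.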

Time complexity analysis
The overall time complexity of the proposed MOWOS algorithm is based on execution makespan and execution cost complexity, which depends on the number of workflow tasks and the size of the VMs. It starts with Algorithm 1, which calculates the ECT of all workflow tasks. Its loop at line 7 then generates a schedule by comparing the ECT of workflow tasks with their deadlines. In total, it takes $O(n + l)$, where $n$ and $l$ are the number of tasks and the task length, respectively. Algorithm 2 iterates through the tasks to determine the size and deadline of each one in order to identify the large tasks for splitting. The time complexity of task splitting depends on the size of the task and its deadline and is given as $O(s + d)$, where $s$ is the size of the workflow task and $d$ is the deadline of the workflow task in a workflow queue. After splitting a task, we search for maxVMs

Performance evaluation
In this section, we present a set of extensive simulation experiments using different workflow inputs, aimed at evaluating the performance and contributions of the proposed MOWOS algorithm. Three different experiments are conducted. In the first experiment, we evaluate the performance of the proposed algorithm in terms of execution cost. The second experiment evaluates scheduling efficiency (makespan) in the cloud computing environment, and the third evaluates the resource utilization rate of the algorithm. The results are compared with existing state-of-the-art workflow scheduling algorithms such as HSLJF [46] and SECURE [50]. This research adopted a simulation-based approach because it offers cloud users the opportunity to pre-test cloud services and determine their performance before they are made available to users in real clouds [51]. This section consists of three sub-sections: Experimental Setup, Workflow Structure, and Results and Analysis.

Experimental setup:
The focus of the experiment is to optimize execution cost, execution makespan and resource utilization while meeting deadlines in an Infrastructure as a Service (IaaS) cloud. Cloud computing is viewed as a dynamic environment, which makes it challenging to run large-scale virtualized applications directly on the cloud data center [52]. We implemented the proposed method in the WorkflowSim simulator [53], a Java-based simulation environment able to model and simulate cloud scheduling algorithms [10]. WorkflowSim was extended from CloudSim [54] to support the modeling of workflow DAGs in the cloud computing environment. We considered four real-world benchmark workflows, Montage, CyberShake, SIPHT, and LIGO Inspiral, as used in [55][56][57][58]. We grouped the workload of each workflow into four sizes (small, medium, large, and extra-large), as shown in Table 2. The groupings are based on the standard benchmark settings of real scientific workflow applications, which have been used by different researchers to model and evaluate the performance of workflow scheduling algorithms, similar to the work in [1,13,[59][60][61]. Choosing different task sizes makes it possible to measure the performance of the proposed method under different workloads. In the experiment, we consider only four VMs with different configurations. The simulation environment is set as follows: the average bandwidth between resources is fixed at 20 MBps according to [62,63], which is the approximate average bandwidth offered in Amazon Web Services [2,64]; the processing capacity of the vCPU of each VM is measured in Million Instructions Per Second (MIPS) as in [65]; and the task lengths are set in Million Instructions (MI) according to [55]. The VM configurations, cost, and processing capacities are modeled on the Amazon EC2 IaaS cloud offering as proposed by Ostermann et al. [41,66]. Detailed descriptions of the four VMs deployed are given in Table 3. The simulation experiments were conducted on a PC with an Intel(R) Core i5 1.6 GHz processor and 8 GB RAM, running Windows 10.
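To make the units in this setup concrete, the following Python sketch estimates the time a single task spends on a VM from its length in MI, the VM capacity in MIPS, and the 20 MBps bandwidth stated above. The function names and the example task are hypothetical, and the simple model (input transfer, then execution, then output transfer) is an assumption of this sketch rather than the exact model used by WorkflowSim.

```python
BANDWIDTH_MB_PER_S = 20.0  # average bandwidth between resources, from the setup


def execution_time_s(length_mi: float, vm_mips: float) -> float:
    """Seconds needed to execute length_mi million instructions at vm_mips MIPS."""
    return length_mi / vm_mips


def transfer_time_s(data_mb: float, bandwidth: float = BANDWIDTH_MB_PER_S) -> float:
    """Seconds needed to move data_mb megabytes over the given bandwidth."""
    return data_mb / bandwidth


def task_runtime_s(length_mi: float, vm_mips: float,
                   input_mb: float, output_mb: float) -> float:
    """Input transfer + execution + output transfer (assumed sequential)."""
    return (transfer_time_s(input_mb)
            + execution_time_s(length_mi, vm_mips)
            + transfer_time_s(output_mb))


# Hypothetical task: 10,000 MI on a 1,000 MIPS VM with 40 MB in and 20 MB out.
print(task_runtime_s(10_000.0, 1_000.0, 40.0, 20.0))  # 13.0 seconds
```

Under this model, a data-intensive task is dominated by the transfer terms and a CPU-intensive task by the execution term, which is why the two kinds of workflow respond differently to faster VMs.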

Workflow structure
The simulation was conducted using four scientific workflows generated by the Pegasus workflow generator [67]: Montage, CyberShake, SIPHT, and LIGO Inspiral. These workflows come from different scientific domains and involve large data sets that are structured differently [5]. Fig. 4 shows the topological structures of these scientific workflows. Detailed descriptions of the workflows are presented in [67].
• Montage is an astronomy application created by the National Aeronautics and Space Administration/Infrared Processing and Analysis Center. It is used to construct large mosaics of the sky. Montage re-projects the input images to the correct orientation while keeping the background emission level constant in all images [68]. Montage tasks are data-intensive and do not require a large processing capacity [2].
• CyberShake is used in earthquake science to characterize earthquake hazards by generating synthetic seismograms [23]. This is done for easy identification of earth radiance and the production of accurate and reliable environmental estimates. CyberShake is a data-intensive workflow application that requires a high-performance processing system with large memory to execute.
• LIGO Inspiral comes from the Laser Interferometer Gravitational-Wave Observatory (LIGO), a large-scale gravitational-wave observatory. It is used in gravitational physics, where the physical properties of light and space are exploited to detect gravitational waves. LIGO Inspiral is a CPU-intensive workflow and needs a large memory to process.
• SIPHT is a workflow application from a bioinformatics project at Harvard. It is used to automate the identification of small RNA (sRNA) encoding genes for bacterial samples in the National Center for Biotechnology Information database [60].

Results and analysis:
This section compares the results obtained from the proposed MOWOS algorithm with those of two existing state-of-the-art scheduling algorithms, HSLJF and SECURE. To examine the performance of the proposed algorithm against the other two, the following three performance metrics were used: (1) execution cost, (2) execution makespan, and (3) resource utilization.

Execution Cost: This is the total budgeted cost of running the workflow schedule on the cloud resources. It includes the processing cost, the cost of transferring input and output files, etc. The execution cost results for the Montage workflow with workloads of 25, 50, 100, and 1000 tasks are compared in Fig. 5, where the three algorithms are compared to determine the best cost optimizer. The MOWOS algorithm achieves a lower execution cost for every Montage task size. In other words, it is cheaper to generate a schedule with the proposed MOWOS algorithm than with the HSLJF and SECURE algorithms. Across the four Montage workloads, the SECURE algorithm produced the highest execution cost because it is not able to distribute workflows evenly over all the deployed resources.
Fig. 6 illustrates the performance of the algorithms in terms of execution cost, using the CyberShake workflow with different workloads. The results show that the execution cost increases with the size of the workload for each of the evaluated algorithms. As Fig. 6 shows, the proposed MOWOS algorithm reduces execution cost more than the HSLJF and SECURE algorithms. This is because the proposed method assigns workflow tasks to resources by considering both the workload and the capacity of the resources to handle that workload.
The SIPHT and Inspiral workflows (Figs. 7 and 8) were analyzed next to compare the execution cost of the MOWOS, HSLJF and SECURE algorithms. When MOWOS is used to generate a schedule, the rate of decrease in execution cost grows as the workload increases, relative to the HSLJF and SECURE algorithms. Essentially, a larger workload increases the number of split tasks. When more tasks are split into sub-tasks, scheduling efficiency increases, thereby reducing the execution cost of the workflows. Similarly, the SECURE algorithm improves its performance significantly, narrowing the execution cost gap in the SIPHT workflow and outperforming the HSLJF algorithm in the Inspiral workflow. The results for the four workflows across all workloads indicate that it costs less to generate a schedule using the proposed MOWOS algorithm than using the benchmarked algorithms. As the workload increases, the overall performance of MOWOS improves further relative to the HSLJF and SECURE algorithms. The improvement comes from the VM selection mechanisms employed by the proposed algorithm, which make it easy to allocate tasks properly to VMs. Since HSLJF and SECURE do not use VM selection strategies such as MinVM and MaxVM selection, they map workflow tasks onto VMs without considering the capabilities of the VMs. The figures also show that the cost of execution increases steadily when large and extra-large workflows are used. This is because the cost is determined by the amount of computation, so an increase in the number of workflow tasks (from 100 to 1000) leads to a corresponding increase in the cost of execution.
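As a rough illustration of how an execution cost of this kind can be accumulated, the Python sketch below sums a processing cost and a data-transfer cost over all scheduled tasks. The per-hour and per-GB prices, the dictionary layout, and the proportional (non-rounded) billing are assumptions of this sketch; they are not the prices in Table 3 or the exact cost model of the paper.

```python
from typing import Dict, List


def execution_cost(schedule: List[dict],
                   price_per_hour: Dict[str, float],
                   transfer_price_per_gb: float = 0.0) -> float:
    """Sum processing and transfer costs over all scheduled tasks.

    Each schedule entry is a dict with the (hypothetical) keys:
      "vm"        - name of the VM the task runs on
      "runtime_s" - task runtime on that VM, in seconds
      "data_gb"   - input plus output data moved for the task, in GB
    """
    total = 0.0
    for entry in schedule:
        processing = (entry["runtime_s"] / 3600.0) * price_per_hour[entry["vm"]]
        transfer = entry["data_gb"] * transfer_price_per_gb
        total += processing + transfer
    return total


# Hypothetical prices and schedule, for illustration only.
prices = {"small": 0.10, "large": 0.40}
schedule = [
    {"vm": "small", "runtime_s": 1800.0, "data_gb": 1.0},
    {"vm": "large", "runtime_s": 3600.0, "data_gb": 2.0},
]
print(round(execution_cost(schedule, prices, transfer_price_per_gb=0.01), 2))  # 0.48
```

Because the processing term scales with runtime multiplied by the VM price, sending a short task to an expensive VM raises the total without any benefit, which is the effect the VM selection strategies are meant to avoid.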
Execution Makespan: The makespan results for scheduling different workflow tasks onto a range of VMs are presented in Fig. 9, which shows the makespan for the Montage workflow with different workloads. The makespan is the total running time of a resource during workflow scheduling. Comparing the three algorithms, the proposed MOWOS algorithm distributes workflow tasks over the VMs efficiently in all four workloads (25, 50, 100, and 1000). The MOWOS algorithm therefore achieves a significant reduction in makespan compared to the HSLJF and SECURE algorithms. This is because MOWOS identifies the large workflow tasks and maps them onto MaxVMs, and maps the small workflow tasks onto MinVMs, to avoid overloading, since an overloaded VM can slow down the execution of workflow tasks. For example, for a workload of 25 workflow tasks, the proposed MOWOS took less than a minute to execute the tasks on the VMs, while the HSLJF and SECURE algorithms took more than a minute to execute the same workload.
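The makespan metric can be sketched in Python for a small workflow DAG. The sketch below uses the common definition of makespan as the finish time of the last task, respects task dependencies, and runs tasks on the same VM one after another. The function signature, the example DAG, and the runtimes are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict, List


def makespan(tasks: List[str],
             deps: Dict[str, List[str]],
             assignment: Dict[str, str],
             runtime: Dict[str, float]) -> float:
    """Finish time of the last task.

    tasks      - task names in topological order (predecessors first)
    deps       - task -> list of predecessor tasks
    assignment - task -> VM name
    runtime    - task -> runtime in seconds on its assigned VM
    """
    vm_free: Dict[str, float] = {}  # time at which each VM becomes idle
    finish: Dict[str, float] = {}   # finish time of each task
    for t in tasks:
        # A task can start only after all predecessors finish and its VM is idle.
        ready = max((finish[p] for p in deps.get(t, [])), default=0.0)
        start = max(ready, vm_free.get(assignment[t], 0.0))
        finish[t] = start + runtime[t]
        vm_free[assignment[t]] = finish[t]
    return max(finish.values(), default=0.0)


# Hypothetical diamond-shaped workflow: A -> (B, C) -> D.
tasks = ["A", "B", "C", "D"]
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
assignment = {"A": "vm1", "B": "vm1", "C": "vm2", "D": "vm1"}
runtime = {"A": 2.0, "B": 3.0, "C": 5.0, "D": 1.0}
print(makespan(tasks, deps, assignment, runtime))  # 8.0
```

In the example, task C is the longest task on the critical path, so shortening it (for instance by splitting it or by placing it on a faster VM) is what would reduce the makespan.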
Similarly, Fig. 10 presents the makespan results for the CyberShake workflow, benchmarked under different workloads or task sizes (30, 50, 100, and 1000). It shows that the three algorithms perform closely in terms of execution makespan. It is noticeable, however, that in terms of task execution time the proposed MOWOS algorithm takes less time to execute workflows in all four workloads, followed by the HSLJF algorithm and lastly the SECURE algorithm.
The results generally show that the proposed algorithm consistently achieves lower makespan values in both the SIPHT and Inspiral workflows for all workloads (Figs. 11 and 12). These results demonstrate that the proposed MOWOS algorithm generally generates better workflow schedules than the HSLJF and SECURE algorithms. The HSLJF algorithm also uses less time to execute tasks on VMs than the SECURE algorithm. Workflow tasks perform better under the MOWOS algorithm because it is able to split large tasks, which allows every task to meet its deadline.
Overall, the proposed MOWOS algorithm produces a better makespan than the HSLJF and SECURE algorithms. The method becomes more effective as the workload grows: the more large tasks are split into sub-tasks, the higher the efficiency of workflow task execution, and hence the greater the reduction in execution makespan. However, increasing the number of workflow tasks (from 100 to 1000) increases the execution time (makespan), and thus reduces the performance of the algorithms [69]. It is evident from all four workflows that the three algorithms (MOWOS, HSLJF and SECURE) perform better with a smaller number of workflow tasks (from 25 to 50) than with the large and extra-large workflows (from 100 to 1000).
Resource Utilization: This refers to the practice of making the best use of cloud resources by keeping them busy at all stages of workflow execution. This is beneficial to the cloud service provider, because providers maximize profit when resources are fully utilized. Any unused time slot of a leased resource is a cost to the provider, hence the need to ensure efficient use of cloud resources to reduce cost and make a profit. The utilization rate of cloud resources achieved by each of the three algorithms is evaluated in Figs. 13, 14, 15 and 16.
The figures illustrate the rate of resource utilization obtained by the MOWOS, HSLJF, and SECURE algorithms. For small workflows, the proposed MOWOS algorithm performs better than the existing HSLJF and SECURE algorithms by 10% and 13% respectively. For medium workflows, the respective improvements of MOWOS over HSLJF and SECURE are 12% and 13%. For large workflows, the MOWOS algorithm utilizes the resources better than the HSLJF algorithm by 14% and the SECURE algorithm by 24%. Lastly, for extra-large workflows, the utilization improvement of the proposed MOWOS over the state-of-the-art algorithms is 18% for HSLJF and 26% for SECURE.
Overall, the resource utilization rate is higher when large and extra-large workflows are used, whereas the rate of increase in utilization is lower when small and medium workflows are scheduled. This is because any increase in the number of workflow tasks leads to more computation being performed by each algorithm. The analysis of the results shows that the proposed MOWOS algorithm has a lower execution cost, a better execution makespan, and better resource utilization than the existing HSLJF and SECURE algorithms.
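The notion of utilization discussed above can be sketched with one common definition: the fraction of leased VM time that is actually spent executing tasks. The Python function below assumes that every VM is leased for the whole makespan. That assumption, the function name, and the example numbers are illustrative only and are not the formula or data used in the paper.

```python
from typing import Dict


def resource_utilization(busy_time_s: Dict[str, float], makespan_s: float) -> float:
    """Fraction of leased VM time spent executing tasks.

    busy_time_s - VM name -> total seconds the VM spent running tasks
    makespan_s  - total schedule length; each VM is assumed leased for this long
    """
    if makespan_s <= 0 or not busy_time_s:
        return 0.0
    leased = makespan_s * len(busy_time_s)
    return sum(busy_time_s.values()) / leased


# Hypothetical schedule: two VMs leased for 8 s, busy for 6 s and 5 s.
print(resource_utilization({"vm1": 6.0, "vm2": 5.0}, 8.0))  # 0.6875
```

Under this definition, idle slots on a leased VM lower the ratio directly, which corresponds to the unused time slots described in the text as a cost to the provider.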

Conclusion and future work
In this research, we investigated workflow scheduling problems in the cloud computing environment and presented a novel heuristic approach called Multi-Objective Workflow Optimization Strategy (MOWOS). The proposed algorithm aims to optimize execution cost, execution makespan and resource utilization while allowing workflow tasks to complete before their deadlines. It consists of three sub-algorithms: a task splitting algorithm, a MaxVM selection algorithm, and a MinVM selection algorithm. The proposed method has been implemented and tested in the WorkflowSim simulator. We compared its performance with the HSLJF and SECURE algorithms on four well-known scientific workflows: Montage, CyberShake, SIPHT, and LIGO Inspiral. The simulation results show that the proposed MOWOS algorithm has a lower execution cost, a better execution makespan, and better resource utilization than the existing HSLJF and SECURE algorithms. The improvement over HSLJF and SECURE is significant for large and extra-large workflows and slight for small and medium workflows.
In this paper, execution cost, execution makespan and resource utilization are considered as the only three optimization objectives. In future research, we will extend our work to consider energy consumption and load balancing, and will provide more evidence to justify the assumptions made in this research.