 Research
 Open Access
Data-intensive workflow scheduling strategy based on deep reinforcement learning in multi-clouds
Journal of Cloud Computing, volume 12, Article number: 125 (2023)
Abstract
With the rapid development of Internet of Things devices, the data-intensive workflow has emerged as a new kind of representation for IoT applications. Because most IoT systems are structured in a multi-clouds environment and the data-intensive workflow has the characteristics of scattered data sources and distributed execution requirements across the cloud center and edge clouds, scheduling such workflows raises many challenges, such as data flow control management and data transmission scheduling. Aiming at the business and technical execution constraints and the data transmission optimization of data-intensive workflows, a data-intensive workflow scheduling method based on deep reinforcement learning in multi-clouds is proposed. First, the execution constraints, edge node load and data transmission volume of the IoT data workflow are modeled; then the data-intensive workflow is segmented with consideration of business constraints and data transmission minimization as the primary optimization goal; finally, taking the workflow execution time and average load balance as the secondary optimization goals, an improved DQN algorithm is used to schedule the workflow. Based on the DQN algorithm, the model reward function and action selection are redesigned and improved. Simulation results based on WorkflowSim show that, compared with MOPSO, NSGA-II, GTBGA and DQN, the algorithm proposed in this paper can effectively reduce the execution time of IoT data workflows while satisfying the execution constraints and maintaining load balance across the multi-clouds.
Introduction
With the rapid development of Internet of Things (IoT) technology, the data collected by various IoT devices are constantly generated and accumulated, giving rise to data-intensive applications. However, IoT devices have difficulty processing data-intensive applications locally due to their limited resources, and it is difficult to provide all the required resources and scheduling services for IoT users by relying on a single cloud alone.
The IoT architecture composed of the center cloud and the edge clouds can be seen as multi-clouds [1]. For the data generated by IoT edge devices, there are usually corresponding storage devices in the edge cloud to meet the rapid response required by edge computing. However, due to the limited computing resources of the edge cloud, the center cloud is also needed to meet the processing requirements of massive data. The multi-clouds, composed of the central cloud and the edge clouds, jointly complete multiple data-intensive workflow scheduling tasks. Different from traditional workflows, the data-intensive workflow in the Internet of Things environment has the characteristics of scattered data sources, large data scale and distributed execution across the multi-clouds.
When executing this kind of workflow in a multi-clouds environment, many factors such as business constraints on data privacy and long-distance data transmission must be considered, which poses many challenges for data flow control management and data transmission scheduling.
Existing methods for data-intensive workflow scheduling mainly include cloud computing scheduling strategies based on heuristics, on graph segmentation, and on reinforcement learning [2]. The heterogeneous distributed resource environment of cloud computing and the parallel task structure of data-intensive workflows together form a large state space. Reinforcement learning has powerful decision-making ability when dealing with such complex state spaces, and has therefore often been used in recent years as a powerful means to solve scheduling problems [3].
However, applying reinforcement learning to the scheduling of data-intensive scientific workflows in a cloud computing environment faces the following difficulties:
On the one hand, the time and cost of a data-intensive workflow mainly come from link transmission. The way to reduce link loss is to reduce the data dependence among data centers, which usually requires segmenting the workflow structure. On the other hand, the state set of workflow scheduling is complex and suffers from the curse of dimensionality: it is difficult to store all reward values in tabular form. This is addressed by generalizing state vectors with a neural network, and state values are extracted with deep reinforcement learning (DRL) techniques so as to achieve dimensionality reduction.
Therefore, this paper proposes a data-intensive workflow scheduling method based on deep reinforcement learning in multi-clouds. The contributions of this paper are summarized as follows:

(1)
To deal with the scheduling problem of data-intensive workflows, this paper proposes a workflow segmentation method that reduces the data transmission between partitions deployed in the cloud center and edge clouds. By dividing the original workflow into several blocks of similar scale and low data dependence, the algorithm provides an environmental state model foundation for the deep reinforcement learning scheduling algorithm in the subsequent sections.

(2)
In this paper, a deep neural network is introduced into reinforcement learning and used to train the learning process. Based on the DQN algorithm, the idea of bias correction is introduced: the variance of the current state reward is calculated to mitigate the problem of Q-value overestimation. In addition, the reward function is improved so that the workflow scheduling results converge to a stable correlated equilibrium policy.

(3)
Finally, the open-source WorkflowSim simulation environment is used to evaluate the proposed method. Compared with traditional workflow scheduling methods, the experimental results show that the proposed method can effectively reduce workflow execution time and improve load balancing.
The rest of this paper is organized as follows. The second section introduces related work. The third section gives the definitions involved in the scheduling method and the segmentation method for data-intensive workflows. The fourth section presents the detailed design of the workflow scheduling strategy. The fifth section verifies the effectiveness of the method through experiments based on the WorkflowSim simulation environment. Finally, the sixth section concludes the paper.
Related work
Algorithms based on heuristic ideas can efficiently find approximate optimal solutions to the workflow scheduling problem in cloud computing and have been the mainstream type of cloud scheduling method in recent years. Basic prototypes include ant colony optimization (ACO), particle swarm optimization (PSO) and genetic algorithms (GA) [4]. Literature [5] puts forward the heterogeneous earliest finish time (HEFT) algorithm, a constructive heuristic. The algorithm first sets the priority of each task in the workflow based on the average execution cost and the average transmission cost, and then allocates resources to tasks according to task priority and the earliest finish time of each task on a virtual machine. Literature [6] puts forward the big cuckoo algorithm, which imitates the cuckoo's sojourning behavior and aims at minimizing turnaround time and maximizing resource utilization; however, this algorithm fails to take into account the interaction of big data and is not suitable for data-intensive workflows. Literature [7] puts forward a multi-objective artificial bee colony algorithm, a swarm intelligence algorithm that can reduce energy consumption, execution time and cost and improve resource utilization, but it does not discuss mutually exclusive performance indicators, such as execution time versus overall cost, or provide a compromise solution. Literature [8] puts forward a hybrid PSO-HEFT algorithm, which focuses on the problem of high energy consumption during workflow scheduling in cloud computing systems and can obtain a scheduling solution that balances scheduling quality and energy consumption, but it is not suitable for data-flow-oriented scientific workflow scheduling problems.
This kind of algorithm can find feasible solutions under the given constraints, but because it cannot predict the deviation between a feasible solution and the optimal solution, its convergence is slow and it often falls into local optima during the search, so it is difficult to meet low-latency task requirements.
When a workflow places a heavy burden on the data links, researchers usually use graph segmentation to minimize the data traffic between blocks, reducing the data coupling within each sub-workflow and thus the link load between data centers [9]. There are two important principles for segmenting a workflow graph on a cloud computing platform: first, make the data dependency between the segmented subgraphs as small as possible, which gives full play to the advantages of distributed parallel computing; second, make the scale of each block as balanced as possible, which avoids the bottleneck effect in the workflow and improves system performance. Literature [10,11,12] offloads computing-intensive tasks to edge servers or the cloud for processing, maximizing the quality of user experience under resource constraints. Literature [13] uses cuckoo search (CS) to segment the workflow, and finally a decision tree is used to allocate resources. Although this method can accelerate iterative convergence and shorten execution time, the selected fitness function cannot describe the segmentation result well.
In recent years, reinforcement learning has often been used as a powerful means to solve scheduling problems. By using the excellent decision-making ability of reinforcement learning in complex edge environments, convergence can be accelerated by constantly correcting the deviation between feasible and better solutions. Literature [14] uses the Q-learning algorithm to match resources in online dynamic scheduling. This method is oriented to independent tasks and can noticeably shorten the average response time of tasks; however, it is not suitable for workflow problems with priorities or dependencies, and it is difficult to predict and classify upcoming tasks. Literature [15] uses a reinforcement learning method to optimize the scheduling of a memory controller, which improves the running state of applications and bandwidth utilization; finally, a cerebellar neural network is used to reduce the dimension of the state space, but this approach is not suitable for data-intensive workflow cloud scheduling. In literature [16], in order to improve task processing efficiency for the Internet of Vehicles (IoV), the paper designs a CORA algorithm and uses a Markov decision process model to formulate the dynamic optimization problem. In literature [17], the authors developed a scheduling algorithm based on a pointer network and reinforcement learning. In the state set of the algorithm, parameters such as execution time, virtual machine failure probability, communication cost and associated tasks were defined, and a state neural network based on these parameters was analyzed and constructed. Literature [18] uses a game-based method to offload computing-intensive tasks to achieve the goal of minimizing user costs and maximizing server profits.
Due to its own limitations, reinforcement learning cannot directly handle high-dimensional and continuous state spaces [19]. Deep learning focuses on the expression of perception and input, and it is good at discovering the characteristics of data. Because deep learning can make up for this shortcoming, deep reinforcement learning (DRL) combines the deep neural network's ability to capture environmental characteristics with the decision-making ability of RL to solve complex system control problems, and it can use edge nodes as intelligent agents to learn scheduling strategies without global information about the environment. Aiming at the data transmission overhead of data-intensive workflows, as well as workflow scheduling optimization objectives such as execution time and load balance, this paper studies a workflow scheduling algorithm based on deep reinforcement learning.
Data-intensive workflow segmentation method for multi-clouds scheduling and related definitions
When dealing with data-intensive workflows, because the data links need to transmit large amounts of data, the overall cost mostly comes from data transmission [20]. In addition, the deployment positions of cloud-edge nodes are scattered across the edge clouds, and the dependencies among workflow tasks are complex. To deal with the scheduling problem of data-intensive workflows, this paper proposes a workflow segmentation method that reduces data transmission between partitions. Through the constraint-based partition scheduling method, the goal of minimizing the execution time of the data-intensive workflow is achieved.
Related definitions
Data transmission between tasks
In this paper, a workflow is represented as a directed acyclic graph (DAG), in which nodes are tasks and edges represent their dependencies. The workflow before segmentation is recorded as the ODAG, as shown in Fig. 1:
ODAG = \(\left( {\text{T,E}} \right)\), where \({\text{T}}\) is the set of task vertices, and \({\text{T}} = \{ T_{1} ,T_{2} ,...,T_{n} \}\) indicates that the cloud workflow consists of n dependent tasks; E is the set of directed edges.
\(\mathrm E=\left\{\left({\mathrm T}_i,{\mathrm T}_j,{\mathrm{Data}}_{ij}\right):\left({\mathrm T}_i,{\mathrm T}_j\right)\in\mathrm T\right\}\), where \({\text{(T}}_{i} {\text{,T}}_{j} {\text{,Data}}_{ij} {)}\) indicates the dependency between tasks: \({\text{T}}_{{\text{i}}}\) is the predecessor of \({\text{T}}_{{\text{j}}}\), \({\text{T}}_{{\text{j}}}\) is the successor of \({\text{T}}_{{\text{i}}}\), and \({\text{Data}}_{ij}\) indicates the amount of data transferred from \({\text{T}}_{{\text{i}}}\) to \({\text{T}}_{{\text{j}}}\).
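As a concrete illustration, the ODAG = (T, E) model above can be sketched with a small adjacency-map structure. The class and method names are illustrative, not the paper's code:

```python
from collections import defaultdict

class WorkflowDAG:
    """Tasks are vertices; each edge (Ti, Tj) carries Data_ij, the data volume."""
    def __init__(self):
        self.tasks = set()
        self.succ = defaultdict(dict)   # succ[Ti][Tj] = Data_ij
        self.pred = defaultdict(dict)   # pred[Tj][Ti] = Data_ij

    def add_edge(self, ti, tj, data_ij):
        self.tasks.update((ti, tj))
        self.succ[ti][tj] = data_ij
        self.pred[tj][ti] = data_ij

    def ready_tasks(self, done):
        """Tasks whose predecessors have all finished (the DAG dependency rule)."""
        return [t for t in self.tasks
                if t not in done and all(p in done for p in self.pred[t])]

# usage: a four-task workflow with data volumes (e.g. in MB) on the edges
g = WorkflowDAG()
g.add_edge("T1", "T2", 120)
g.add_edge("T1", "T3", 80)
g.add_edge("T2", "T4", 60)
g.add_edge("T3", "T4", 40)
```

Initially only the entry task `T1` is ready; once it finishes, `T2` and `T3` become executable in parallel.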
Data transmission between sub-workflows
After the data-intensive workflow is divided, sub-workflows are formed, which are again described by a directed acyclic graph, called the SDAG. Each sub-workflow is regarded as a vertex, and the dependencies among sub-workflows are regarded as edges.
This simplifies the ODAG, as shown in Fig. 2. Write it as \({\text{SDAG = (P,R)}}\), where \({\text{P}}\) is the set of sub-workflows, and \(P = \{ P_{1} ,P_{2} ,...,P_{m} \}\) indicates that the original workflow consists of m sub-workflows; \({\text{R}}\) is the set of edges representing the dependencies among sub-workflows, \(R = \{ (P_{i} ,P_{j} ,Data_{ij} ):(P_{i} ,P_{j} ) \in P\}\). \((P_{i} ,P_{j} ,Data_{ij} )\) represents the dependency relationship among sub-workflows: \(P_{i}\) is the predecessor of \(P_{j}\), and \(P_{j}\) is the successor of \(P_{i}\). \(Data_{ij}\) represents the amount of data that \(P_{i}\) passes to \(P_{j}\).
Execution time
The execution time of the data-intensive workflow depends on the amount of data transmitted between its sub-workflows and the computing power of the node resources. Each sub-workflow is assigned to a virtual machine in the cloud center or an edge node, and each sub-workflow has a start time and an end time, recorded as \({\text{sched}}\left( {P_{i} } \right) = (vm_{j} ,SP_{{P_{i} }} ,EP_{{P_{i} }} )\), where \({\text{vm}}_{j}\) indicates that sub-workflow \(P_{i}\) is executed on virtual machine \(vm_{j}\), \(SP_{{P{}_{i}}}\) is the start time of \({\text{P}}_{{\text{i}}}\), and \(EP_{{P_{i} }}\) is the end time of \({\text{P}}_{{\text{i}}}\). \(SP_{{P{}_{i}}}\) is determined by the time when all predecessor sub-workflows of \({\text{P}}_{{\text{i}}}\) have finished executing and their output data has been transmitted to the virtual machine where \({\text{P}}_{{\text{i}}}\) is located. \(EP_{{P{}_{i}}}\) is the execution time of \({\text{P}}_{{\text{i}}}\) plus its start time \(SP_{{P{}_{i}}}\).
Here \(EP_{{P_{h} }}\) is the end time of the predecessor sub-workflow \({\text{P}}_{{\text{h}}}\), \(\frac{{Data(P_{hi} )}}{b}\) is the time taken to transmit data from \({\text{P}}_{{\text{h}}}\) to \({\text{P}}_{i}\), and b is the data transmission rate between nodes. \(work(P_{i} )\) is the instruction count of sub-workflow \(P_{i}\), and \(P_{{vm_{j} }}\) is the number of instructions processed per second by \(vm_{j}\). The time span from the start of the first sub-workflow to the end of the last sub-workflow is the completion time of the whole IoT data workflow.
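The timing relations described above can be written out as follows. This is a hedged reconstruction from the definitions in this section, not a reproduction of the paper's exact equations; the last expression assumes the workflow starts at time 0:

```latex
SP_{P_i} = \max_{P_h \in pred(P_i)} \left( EP_{P_h} + \frac{Data(P_{hi})}{b} \right),
\qquad
EP_{P_i} = SP_{P_i} + \frac{work(P_i)}{P_{vm_j}},
\qquad
Tw_{total} = \max_{P_i \in P} EP_{P_i}
```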
Node load balance degree
Cloud and edge nodes, as the execution carriers of data-intensive workflows, must maintain stable and normal operation, and node load balancing can effectively guarantee the normal and stable operation of the whole system [21]. Therefore, ensuring the load balance of each node during scheduling is of great significance for the stability of workflow execution. The load balancing is shown in Fig. 3, where \(n_{i}\) represents a computing node and \(P_{i}\) represents a sub-workflow.
For compute resource node j (j = 1, 2, ..., K, where K is the total number of resource nodes), let \(N_{j}\) denote the number of times the node executes sub-workflows. Assuming that each virtual machine in each compute node has the same computing power, the resource utilization of a single compute node is calculated from \(N_{j}\).
Load balancing is expressed here by a degree of equilibrium r; during optimization, the value of r should be as close to 1 as possible. In addition, the average load of the multiple computing nodes (cloud center and edge nodes), \(\frac{{\text{r}}}{N}\), is used as the optimization objective, where N is the number of nodes.
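A minimal sketch of such a balance degree, consistent with the description above (a value that approaches 1 when all nodes carry equal load). The exact formula is an assumption, since the paper's equation is not reproduced here:

```python
def load_balance_degree(executions):
    """executions[j] = N_j, the number of sub-workflows run on node j.

    Returns a degree in (0, 1]: 1.0 means all nodes are equally utilized
    (assumed definition: average utilization divided by maximum utilization).
    """
    total = sum(executions)
    if total == 0:
        return 1.0  # no work scheduled yet: trivially balanced
    utilization = [n / total for n in executions]
    avg = sum(utilization) / len(utilization)
    return avg / max(utilization)

# usage: a perfectly balanced system vs. a skewed one
print(load_balance_degree([5, 5, 5, 5]))   # 1.0
print(load_balance_degree([10, 2, 2, 2]))  # 0.4, far from balanced
```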
Data-intensive workflow segmentation algorithm
Business constraints
A business constraint refers to a data privacy protection constraint, that is, the data source or running scenario of the constrained task is fixed, and the task must be executed on one or more specified nodes [22]. In this paper, data-intensive workflow segmentation optimization and scheduling optimization are carried out based on business bundles. The locations in the scheduling range are one-hot encoded: each geographical location is a string of length N, where N is the number of locations, exactly one position in the string is 1, and all others are 0, as shown in Fig. 4, where \(n_{i}\) represents a computing node, \(P_{i}\) represents a sub-workflow, and \(D_{i}\) is the corresponding dependent data. In this representation, if the position of the required data determines the position of the computing task, the two position vectors must be equal:
where \({\text{location}}_{data}\) is the position vector of the constrained data set and \({\text{location}}_{task}\) is the position vector of the corresponding data processing task.
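The one-hot constraint check described above can be sketched as follows; function and variable names are illustrative, not the paper's code:

```python
def one_hot(index, n_locations):
    """Encode a geographical location as a length-N 0/1 vector with one 1."""
    vec = [0] * n_locations
    vec[index] = 1
    return vec

def satisfies_constraint(location_data, location_task):
    """A constrained task must run where its data lives."""
    return location_data == location_task

# usage: the constrained data set resides at node 2 out of 5 locations
location_data = one_hot(2, 5)
print(satisfies_constraint(location_data, one_hot(2, 5)))  # True: same node
print(satisfies_constraint(location_data, one_hot(4, 5)))  # False: violated
```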
Segmentation algorithm
Business constraints are the premise of segmentation position optimization. Tasks whose \({\text{location}}_{task}\) vectors are equal belong to the same constraint set. On this basis, the following segmentation optimization algorithm is given: the business constraint sets are further optimized according to the data transmission volume to obtain new task sets.
The algorithm steps are as follows:

(1)
Incorporate unconstrained tasks related to constrained tasks into the constrained task sets: take the average internal data transmission volume of each constrained task set as a data transmission threshold w, and traverse the unconstrained tasks in the ODAG starting from the tasks in the constrained task set. If the data transmission between two adjacent tasks is greater than the threshold w, merge the successor task into the constrained task set and continue the traversal; if it is less than w, the traversal ends. Then traverse from the next constrained task set and repeat the above steps.

(2)
Form new task sets from the remaining unconstrained tasks: traverse from the start node of the ODAG and group contiguous unconstrained tasks into new task sets.

(3)
Further optimize the division positions within a task set: if a task set is allowed to be scheduled to multiple service nodes, the set can be further divided into multiple sets according to the threshold w.
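Step (1) of the algorithm above can be sketched as a breadth-first traversal that absorbs unconstrained successors while the edge data volume exceeds the threshold w. This is a simplified reading of the steps, not the paper's implementation:

```python
from collections import deque

def grow_constrained_set(constrained, succ, w):
    """Merge unconstrained successors into a constrained task set.

    constrained: set of task ids forming the initial constrained set.
    succ: succ[t] = {next_task: data_volume} adjacency of the ODAG.
    w: data transmission threshold (the set's average internal volume).
    """
    block = set(constrained)
    queue = deque(constrained)
    while queue:
        t = queue.popleft()
        for nxt, data in succ.get(t, {}).items():
            if nxt not in block and data > w:  # heavy edge: keep tasks together
                block.add(nxt)
                queue.append(nxt)              # continue traversing from nxt
    return block

# usage: C1 is constrained; U1 and U3 are pulled in over heavy edges,
# while U2's light edge (10 <= 50) ends the traversal on that branch
succ = {"C1": {"U1": 90, "U2": 10}, "U1": {"U3": 80}}
print(grow_constrained_set({"C1"}, succ, w=50))  # {'C1', 'U1', 'U3'}
```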
Data-intensive workflow scheduling strategy
This paper presents a workflow scheduling method based on the DQN algorithm. On the basis of the DQN algorithm, the model reward function is redesigned and improved according to the characteristics of the research problem.
Multi-objective optimal scheduling
To enable the multi-objective optimal scheduling method based on deep reinforcement learning, this paper makes the following assumptions [23]:

① Each task can only be performed by one cloud host;

② The running time of a task is the time interval between its start and end;

③ The delay of resource provisioning or release is not considered;

④ The transmission delay between tasks is not considered.
In this paper, two QoS indexes, namely the makespan of the workflow and the load balance, are considered as the goals of cloud workflow scheduling, making it a bi-objective optimization problem. The goal of the scheduling optimization algorithm can be expressed as follows:
Here, \(Tw_{total}\) is the execution time of the IoT data workflow and \(Sl_{total}\) is the average load of the system; \(S(p_{i} )\) is the constraint condition of the algorithm, where \(p_{i}\) is a sub-workflow with business constraints, \(dc_{p}\) is the assigned service node, and \(r_{m}\) is the resources required by sub-workflow \(p_{i}\) on service node \(dc_{p}\).
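A plausible way to write this bi-objective problem, as an assumed formalization consistent with the symbols just defined rather than the paper's exact formula:

```latex
\min_{sched} \;\; \big( Tw_{total},\; Sl_{total} \big)
\quad \text{s.t.} \quad
S(p_i): \; p_i \text{ is placed on its required node } dc_p,
\qquad
r_m \le \mathrm{capacity}(dc_p)
```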
These two optimization objectives are abstracted into two agents, each a DQN-based agent that carries out adaptive learning and self-optimization through interaction with the environment and the other agent.
Workflow scheduling method based on improved DQN
DQN
DQN algorithm is a popular method in the field of deep reinforcement learning. Its main modules include: environment module, loss function, experience replay module and two neural networks with the same structure but different parameters, namely estimated value network and target value network [24].
The DQN algorithm learns the action value function Q* corresponding to the optimal strategy by minimizing a loss function, where \(y\) is the target Q value; its parameters \(\theta^{-}\) are updated periodically from the latest online parameters \(\theta\), which helps stabilize learning.
Q-learning uses a Q-table to store the Q value of each state-action pair, whereas DQN uses a neural network to extract complex features and generate Q values [25]. The estimated (online) Q-value network is used to predict the estimated Q value: its input comes from the current environment, its weights \(\theta\) are updated every iteration, and \(Q^{*} (s,a;\theta )\) represents its output. The parameters of the target Q-value network are updated only periodically, and \(\mathop {\max }\limits_{{a{\prime} }} (Q^{*} (s{\prime} ,a{\prime} ;\theta^{ - } ))\) denotes its output. The training goal of the neural network is to minimize the loss function constructed from these two Q values, updating the parameters of the estimated Q-value network by stochastic gradient descent through backpropagation. Every certain number of iterations, the parameters of the estimated Q-value network are copied to the target Q-value network. To some extent, this reduces the correlation between the estimated Q value and the target Q value, making divergence or oscillation less likely and thus improving the stability of the algorithm [26, 27].
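For reference, the standard DQN loss that this description corresponds to is

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[ \big( y - Q^{*}(s,a;\theta) \big)^{2} \right],
\qquad
y = r + \gamma \max_{a'} Q^{*}(s',a';\theta^{-})
```

where \(\gamma\) is the discount factor and \(\theta^{-}\) are the periodically copied target-network parameters.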
The neural network module in DQN overcomes the high-dimensional state-space problem of single-agent reinforcement learning, and the target value network, the experience replay pool and the \(\varepsilon\)-greedy exploration mechanism balance the contradiction between exploration and exploitation to some extent.
DQN-RL
In this paper, we propose a DQN method based on bias correction. Q values are obtained from the outputs of multiple saved historical online value network models together with the current online value network. The variance of these multiple current-state rewards is calculated, and a bias correction term is computed from this variance and applied to the target Q-value formula, which mitigates the problem of Q-value overestimation to some extent.
In the improved target Q-value calculation, \(r_{t}\) denotes the estimate of the t-th immediate reward, \(\theta\) denotes the parameters of the saved historical online value network models, \(a\) denotes the saved actions in the experience data, and \(B(s,a,r)\) is the bias correction term.
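The improved target can be sketched as follows; the exact form of the correction term is an assumption based on the description above, with the variance taken over K saved historical networks \(\theta_1,\dots,\theta_K\):

```latex
y_t = r_t + \gamma \max_{a'} Q^{*}(s_{t+1},a';\theta^{-}) - B(s,a,r),
\qquad
B(s,a,r) \;\propto\; \operatorname{Var}_{k=1,\dots,K}\!\big[ Q^{*}(s_t,a;\theta_k) \big]
```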
Multi-agent learning systems usually face the challenges of hard-to-determine learning objectives, unstable learning, and coordination. In this paper, the reward function is improved so that the workflow scheduling results converge to a stable correlated equilibrium policy.

(1)
State Space
The state space in this paper is represented by a vector \(Vector = [s_{1} ,s_{2} ,...,s{}_{i},...,s_{n} ]\), where n is the number of tasks in the workflow and the index in the vector represents the \(ID_{{t_{id} }}\) of each task. \(s_{i}\) is an integer representing the state of the i-th task: 1 means the task has been executed; 2 means the task can be executed; 3 means a predecessor of the task has not been executed, i.e., the execution conditions are not met; values in 0~m mean the task is being executed by a virtual machine, the value being the id of the virtual machine [28].
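An illustrative state vector following the encoding above (note that in a real implementation the "being executed" value range would need to be kept disjoint from the status codes 1-3):

```python
# status codes as described in the text
EXECUTED, EXECUTABLE, BLOCKED = 1, 2, 3

# five tasks: T0 finished, T1 running on VM 7, T2 ready, T3 and T4 blocked
state = [EXECUTED, 7, EXECUTABLE, BLOCKED, BLOCKED]

def executable_tasks(state):
    """Indices of tasks the scheduler may dispatch next."""
    return [i for i, s in enumerate(state) if s == EXECUTABLE]

print(executable_tasks(state))  # [2]
```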

(2)
Reward function
A suitable reward function design can ensure the stability and convergence of the algorithm in multi-agent learning scenarios. For the makespan agent, the reward function \({\text{w}}_{1}\) is designed so that the smaller the increase in execution time caused by an action, the closer the reward is to 1; otherwise it approaches 0. Similarly, the load balancing reward function \({\text{w}}_{2}\) is designed so that the smaller the added load imbalance, the more desirable the strategy and the closer the reward is to 1; otherwise there is no reward and the value is 0. The value ranges of both \({\text{w}}_{1}\) and \({\text{w}}_{2}\) fall within [0,1].
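One shaping consistent with this description, offered as an illustrative assumption rather than the paper's exact formulas, where \(\Delta T\) and \(\Delta L\) are the increases in makespan and load imbalance caused by the selected action:

```latex
w_1 = \frac{1}{1 + \Delta T},
\qquad
w_2 = \frac{1}{1 + \Delta L},
\qquad
w_1, w_2 \in (0, 1]
```

Both rewards approach 1 as the increment shrinks to zero and decay toward 0 as it grows.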

(3)
Action selection
In reinforcement learning, the action with the largest Q value would be selected every time if a purely greedy strategy were used for action selection [29, 30]. However, in the initial stage of learning, agents have not yet mastered the Q values, so they need to explore and choose unknown actions randomly. After a period of learning, they obtain reasonable Q values, but whether to continue exploring unknown actions or to exploit the current action with the largest Q value is the exploration-exploitation balance problem faced by reinforcement learning. To solve this problem, this paper uses a decaying \(\varepsilon\)-greedy strategy: at the beginning, \(\varepsilon\) is set larger, such as 0.9, to give the model more opportunities to explore; as the number of training rounds increases, the learning ability of the model becomes stronger and the updated state-action values become better, so the value of \(\varepsilon\) is gradually reduced and the learned Q values are used more often to choose the best action.
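The decaying \(\varepsilon\)-greedy selection just described can be sketched as follows; the decay rate and floor are illustrative parameter choices, not values taken from the paper:

```python
import random

def select_action(q_values, epsilon):
    """With probability epsilon explore a random action, else exploit argmax-Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit

def decay(epsilon, rate=0.995, floor=0.05):
    """Multiplicative decay toward a small floor so some exploration remains."""
    return max(floor, epsilon * rate)

# usage: epsilon starts at 0.9 and shrinks as training rounds accumulate
epsilon = 0.9
for episode in range(1000):
    epsilon = decay(epsilon)
print(epsilon)                                    # 0.05: decayed to the floor
print(select_action([0.1, 0.9, 0.3], epsilon=0.0))  # 1: pure exploitation
```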
Simulation analysis
Experimental data set
In this paper, CyberShake is used as the experimental workflow. The CyberShake workflow is usually used to process seismic data and was originally used by the Southern California Earthquake Center [31], as shown in Fig. 5, where each color represents a different workflow. CyberShake is both computation-intensive and data-intensive, the number of task nodes can vary, and it can handle large data sets, which makes it very suitable for verifying the effectiveness of the algorithm proposed in this paper.
Experimental environment and parameter setting
Experimental environment
Through the integration of the workflow simulation platform WorkflowSim and the cloud computing simulation platform CloudSim, the scheduling simulation of workflows in the edge cloud is carried out. The software environment is JDK 1.7.0; the hardware environment is an Intel(R) Core(TM) i7-5600 CPU @ 2.60 GHz with 16 GB of memory.
Data center and virtual machine configuration in WorkflowSim
The experiment was conducted in five data centers in different geographical locations, with the first four data centers representing four edge computing nodes and the remaining one representing the cloud computing center. Each data center in the first four data centers is equipped with one host, and the fifth data center is equipped with three hosts, as shown in Table 1. The parameter settings of edge nodes and cloud computing centers are shown in Table 2. At the same time, three types of virtual machines, 10 in each type, were randomly allocated to five data centers (Table 3).
Parameter setting
Training parameter settings are shown in Table 4.
Comparison method
In this paper, MOPSO [32], NSGA-II [33], GTBGA [34] and DQN [35] are used as benchmark comparison algorithms.

(1)
GTBGA combines game theory and a greedy strategy. First, the tasks of different scientific workflows are divided into task packages that can be executed at different stages; then the game between the assignable tasks of each stage and the available cloud hosts is balanced and matched. Because there are multiple stages and the scheduling process only considers the optimization of a single stage, the method is greedy.

(2)
MOPSO is a multi-objective optimization algorithm based on heuristics combined with Pareto optimization. On the basis of the PSO algorithm, MOPSO adds an external archive and special mutation operations to find the Pareto optimal solution set.

(3)
NSGA-II is a multi-objective optimization algorithm based on heuristics. It is an improved version of NSGA that ensures the diversity of solutions by using an elite selection strategy and crowding distance, and it does not rely on sharing parameters.

(4)
The DQN method combines Q-learning with deep learning, using a deep network to approximate the action value function, and it uses the experience replay mechanism and a target network to stabilize the training process.
Experimental results
To evaluate the effectiveness of the scheduling algorithm proposed in this paper, it is compared with MOPSO, NSGA-II, GTBGA and DQN, taking the workflow execution time and node load balance as the optimization objectives. Comparative experimental results are shown in Figs. 6 and 7.
The comparison between this method and the baseline methods in workflow execution time is shown in Fig. 6. The abscissa is the number of tasks; the ordinate is the execution time, in seconds.
As can be seen from Fig. 6, the makespan of workflow tasks deployed by the DQN-DL algorithm is 21% shorter on average than that of the DQN algorithm, 40% shorter than MOPSO, 50% shorter than NSGA-II and 57% shorter than GTBGA. The method proposed in this paper therefore has the shortest execution time and the best effect.
The comparison between this method and the baseline methods in load balancing is shown in Fig. 7. The abscissa is the number of tasks and the ordinate is the load-balance degree, expressed as a ratio. As can be seen from Fig. 7, the load balance of workflow tasks deployed by the DQN-DL algorithm is 5% higher on average than with the DQN algorithm, 11% higher than with MOPSO, 24% higher than with NSGA-II and 27% higher than with GTBGA. In terms of server load balance, the DQN-DL algorithm therefore outperforms the other four algorithms and better ensures the stability of the computing nodes.
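The load-balance degree plotted in Fig. 7 is a ratio over the per-node loads. The paper's exact formula is not reproduced here; one common choice, shown as a hedged illustration below, is one minus the coefficient of variation of the node loads, so that perfectly even loads score 1.0.

```python
import statistics

def load_balance_degree(loads):
    """Illustrative load-balance measure: 1 - (std / mean) of per-node
    loads. Identical loads give 1.0; more skewed loads give lower values.
    The paper's actual metric may differ."""
    mean = statistics.mean(loads)
    if mean == 0:
        return 1.0               # no load anywhere: trivially balanced
    return 1 - statistics.pstdev(loads) / mean
```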
In summary, the scheduling scheme produced by the DQN-DL algorithm proposed in this paper is markedly superior to the comparison algorithms in both workflow execution time and node load balance. This performance stems from the strength of deep reinforcement learning on such large-scale decision-making problems: its decision-making and solution-finding abilities far exceed those of the traditional heuristic algorithms.
Conclusion
In a cloud-edge collaborative environment, IoT data workflows have large data volumes and scattered data sources, so the data dependence among tasks is complex and data transmission during scheduling is inevitable. This paper applies deep reinforcement learning to the multi-objective scheduling of data-intensive workflows: it first partitions the data-intensive workflows and then uses the improved DQN algorithm to schedule the multiple workflows. The experimental evaluation shows that this method effectively reduces workflow execution time, improves service quality, and balances the average load across nodes, making the system more stable.
Availability of data and materials
The data used during the current study are available from the corresponding author on reasonable request.
Abbreviations
DQN: Deep Q-network
ACO: Ant colony optimization
PSO: Particle swarm optimization
GA: Genetic algorithm
References
Huang J, Gao H, Wan S et al (2023) AoI-aware energy control and computation offloading for industrial IoT. Futur Gener Comput Syst 139:29–37
Huang J, Zhang C, Zhang J (2020) A multi-queue approach of energy efficient task scheduling for sensor hubs. Chin J Electron 29(2):242–247
Shyalika C, Silva T, Karunananda A (2020) Reinforcement learning in dynamic task scheduling: a review. SN Comput Sci 1:1–17
Masdari M, Vali-Kardan S, Shahi Z et al (2016) Towards workflow scheduling in cloud computing: a comprehensive analysis. J Netw Comput Appl 66:64–82
Dubey K, Kumar M, Sharma SC (2018) Modified HEFT algorithm for task scheduling in cloud environment. Procedia Comput Sci 125:725–732
Navimipour NJ, Milani FS (2015) Task scheduling in the cloud computing based on the cuckoo search algorithm. Int J Model Optim 5:44–47
Gao KZ, Suganthan PN, Pan QK, Chua TJ, Chong CS, Cai TX (2016) An improved artificial bee colony algorithm for flexible job-shop scheduling problem with fuzzy processing time. Expert Syst Appl 65:52–67
Wang F, Zhang H, Li K et al (2018) A hybrid particle swarm optimization algorithm using adaptive learning strategy. Inf Sci 436:162–177
Pei S, Zhang Q, Cheng X (2020) Workflow scheduling using graph segmentation and reinforcement learning. Int J Perform Eng 16(8)
Chen Y, Zhao J, Wu Y, Huang Y, Shen XS (2022) QoE-aware decentralized task offloading and resource allocation for end-edge-cloud systems: a game-theoretical approach. IEEE Trans Mob Comput. https://ieeexplore.ieee.org/document/9954914
Chen Y, Hu J, Zhao J et al (2023) QoS-aware computation offloading in LEO satellite edge computing for IoT: a game-theoretical approach. Chin J Electron. https://cje.ejournal.org.cn/article/doi/10.23919/cje.2022.00.412
Chen Y, Zhao J, Zhou X, Qi L, Xu X, Huang J (2023) A distributed game theoretical approach for credibility-guaranteed multimedia data offloading in MEC. Inf Sci 644
Alawad NA, Abed-alguni BH (2021) Discrete island-based cuckoo search with highly disruptive polynomial mutation and opposition-based learning strategy for scheduling of workflow applications in cloud environments. Arab J Sci Eng 46(4):3213–3233
Kaur A, Singh P, Singh Batth R et al (2022) Deep-Q learning-based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud. Softw Pract Exp 52(3):689–709
Chen Y, Gu W, Xu J, Zhang Y, Min G (2023) Dynamic task offloading for digital twin-empowered mobile edge computing via deep reinforcement learning. Chin Commun 1–12. https://ieeexplore.ieee.org/abstract/document/10122834
Huang J, Wan J, Lv B, Ye Q, Chen Y (2023) Joint computation offloading and resource allocation for edge-cloud collaboration in Internet of Vehicles via deep reinforcement learning. IEEE Syst J 17(2):2500–2511
Al-Tam F, Mazayev A, Correia N, Rodriguez J (2020) Radio resource scheduling with deep pointer networks and reinforcement learning. In: 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). IEEE, Pisa, Italy
Chen Y, Zhao J, Hu J, Wan S, Huang J (2023) Distributed task offloading and resource purchasing in NOMA-enabled mobile edge computing: hierarchical game theoretical approaches. ACM Trans Embed Comput Syst. https://dl.acm.org/doi/abs/10.1145/3597023
Ling N, Wang K, He Y et al (2021) RT-mDL: supporting real-time mixed deep learning tasks on edge platforms. In: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems. pp 1–14
Coello CAC, Pulido GT, Lechuga MS (2004) Handling multiple objectives with particle swarm optimization. IEEE Trans Evol Comput 8(3):256–279
Ghomi EJ, Rahmani AM, Qader NN (2017) Load-balancing algorithms in cloud computing: a survey. J Netw Comput Appl 88:50–71
Monakova G, Leymann F (2013) Workflow ART: a framework for multi-dimensional workflow analysis. Enterp Inf Syst 7(1):133–166
Quan Z, Wang Y, Ji Z (2022) Multi-objective optimization scheduling for manufacturing process based on virtual workflow models. Appl Soft Comput 122:108786
Wang Y, Liu H, Zheng W et al (2019) Multi-objective workflow scheduling with deep-Q-network-based multi-agent reinforcement learning. IEEE Access 7:39974–39982
Liu H, Ma Y, Chen P et al (2020) Scheduling multi-workflows over edge computing resources with time-varying performance: a novel probability-mass function and DQN-based approach. In: Web Services – ICWS 2020: 27th International Conference. Springer, Cham, Honolulu, pp 197–209
Wang Y, Jiang J, Xia Y et al (2018) A multi-stage dynamic game-theoretic approach for multi-workflow scheduling on heterogeneous virtual machines from multiple infrastructure-as-a-service clouds. In: International Conference on Services Computing (SCC). Springer, Zhuhai, pp 137–152
Tong Z, Chen H, Deng X et al (2020) A scheduling scheme in the cloud computing environment using deep Q-learning. Inf Sci 512:1170–1191
Çatal O, Wauthier S, De Boom C et al (2020) Learning generative state space models for active inference. Front Comput Neurosci 14:574372
Huang L, Bi S, Zhang YJ (2020) Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Trans Mobile Comput 19(11):2581–2593
Meng F, Chen P, Wu L (2019) Power allocation in multi-user cellular networks with deep Q learning approach. In: Proc. IEEE Int. Conf. Commun. pp 1–6
Jain A, Kumari R (2017) A review on comparison of workflow scheduling algorithms with scientific workflows. In: Proceedings of International Conference on Communication and Networks, vol 508. Springer, Singapore, pp 613–622. https://link.springer.com/chapter/10.1007/9789811027505_63
Rehman A, Hussain SS, ur-Rehman Z et al (2019) Multi-objective approach of energy efficient workflow scheduling in cloud environments. Concurr Comput Pract Exp 31(8):e4949
Li H, Wang B, Yuan Y et al (2021) Scoring and dynamic hierarchy-based NSGA-II for multi-objective workflow scheduling in the cloud. IEEE Trans Autom Sci Eng 19(2):982–993
Dong T, Xue F, Xiao C, Zhang J (2021) Deep reinforcement learning for dynamic workflow scheduling in cloud environment. In: 2021 IEEE International Conference on Services Computing (SCC), Chicago, IL, USA, pp 107–115
Huo D, Wu H, Wang B et al (2022) A DQN-based workflow task assignment approach in cloud-fog cooperation considering terminal mobility. In: The 6th International Conference on Control Engineering and Artificial Intelligence. pp 78–82
Acknowledgements
The authors would like to thank all the staff and students of the School of North China University of Technology for their contributions during this research.
Funding
The work in this paper is supported by the Key-Area Research and Development Program of Guangzhou City (202206030009) and the Beijing Municipal Natural Science Foundation (No. 4202021).
Author information
Authors and Affiliations
Contributions
Shuo Zhang: Thesis Writing and Algorithm Experiment; Zhuofeng Zhao: topic determination and introduction; Chen Liu: Check the paper; Shenghui Qin: Write the third chapter.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Consent has been granted by all authors and there is no conflict.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, S., Zhao, Z., Liu, C. et al. Data-intensive workflow scheduling strategy based on deep reinforcement learning in multi-clouds. J Cloud Comp 12, 125 (2023). https://doi.org/10.1186/s13677-023-00504-9
Keywords
 Data-intensive workflow
 Deep Q-network
 Multi-objective optimization
 Reinforcement learning