A collaborative computation and dependency-aware task offloading method for vehicular edge computing: a reinforcement learning approach
Journal of Cloud Computing volume 11, Article number: 68 (2022)
Abstract
Vehicular edge computing (VEC) is emerging as a new computing paradigm to improve the quality of vehicular services and enhance the capabilities of vehicles. It enables tasks to be performed with low latency by deploying computing and storage resources close to vehicles. However, traditional task offloading schemes focus only on one-shot offloading and give little consideration to task dependency. Furthermore, the continuous action space problem during task offloading should be considered. In this paper, an efficient dependency-aware task offloading scheme for VEC with vehicle-edge-cloud collaborative computation is proposed, where subtasks can be processed locally or offloaded to an edge server or a cloud server for execution. Specifically, first, a directed acyclic graph (DAG) is utilized to model the dependency of subtasks. Second, a task offloading algorithm based on Deep Deterministic Policy Gradient (DDPG) is proposed to obtain the optimal offloading strategy in a vehicle-edge-cloud environment, which efficiently solves the continuous control problem and helps reach fast convergence. Finally, extensive simulation experiments have been conducted, and the experimental results show that the proposed scheme can improve performance by about 13.62% on average against three baselines.
Introduction
The Internet of Vehicles (IoV) is a new paradigm that combines traditional vehicular ad hoc networks and vehicle remote information processing, which can effectively improve vehicular services and augment the capabilities of vehicles [1, 2]. In IoV, an intelligent vehicle is capable of running various applications [3, 4], such as collision warning [5], automatic driving [6], and auto navigation [7]. Unfortunately, these applications not only require significant computation and storage resources but also have stringent delay requirements [2, 8]. As a result, it is challenging to execute them with low latency on vehicles that have limited resources.
Vehicular edge computing (VEC), which integrates mobile edge computing (MEC) into IoV, has been proposed as a promising solution to the above problem [9]. VEC can improve vehicle service quality [10,11,12] by deploying the computation and storage resources of MEC servers close to vehicles. Specifically, computation-intensive and delay-sensitive tasks can be offloaded to MEC servers for execution via wireless networks [2, 13]. Compared to resource-constrained vehicles, MEC servers with more computation resources can efficiently reduce the execution latency of these tasks.
Although VEC can reduce the execution latency of tasks, MEC servers cannot ensure load balancing due to their limited computing and storage ability [14, 15]. To better improve resource utilization, some tasks can be executed locally or offloaded to the cloud server for execution. Thus, this type of task offloading with vehicle-edge-cloud collaborative computing can obtain a low latency for various vehicular tasks. Such approaches have been extensively studied [9, 16]. For example, Dai et al. proposed an offloading method for VEC, which offloads tasks based on vehicle-edge-cloud collaborative computing [9]. Xu et al. [16] presented a game-theory-based service offloading approach to minimize the task processing latency of users, where both predictions of traffic flow and the allocation of resources are considered. In these works, tasks were considered as a whole during task offloading; that is, offloaded tasks were assumed to be atomic. However, a typical application task consists of a series of subtasks [17, 18], which are originally designed to enable multithread processing [18]. As shown in Fig. 1(a), we divide a task into five subtasks and use a directed acyclic graph (DAG) to describe the inter-subtask dependency. Figure 1(b) illustrates the fine-grained task offloading approach, where subtask 2, subtask 3, and subtask 4 are executed in parallel through vehicle-edge-cloud collaborative computing. Therefore, it is necessary to take task dependency into account during task offloading.
Deep reinforcement learning (DRL) is a branch of artificial intelligence (AI) that combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning [19]. DRL can obtain the optimal offloading strategies by directly interacting with the dynamic vehicular network. For example, Dai et al. proposed an efficient task offloading approach based on the deep Q-network (DQN) to minimize the processing delay of tasks. He et al. [20] first introduced a novel Quality of Experience (QoE) and then proposed a task offloading algorithm based on DRL to improve QoE for IoV. These value-based reinforcement learning methods mainly focus on discrete actions. However, task offloading in a vehicle-edge-cloud computing environment has a continuous action space (i.e., the computation resources and bandwidth resources of edge servers are allocated continuously). Therefore, it is necessary to take the continuous action space into account during task offloading.
To tackle the above challenges, an efficient dependency-aware task offloading scheme based on deep deterministic policy gradient (DDPG), named DATO-DDPG, is proposed. Compared with the existing task offloading methods, this approach considers both task dependencies and the continuous control problem during task offloading. Specifically, the main contributions of this article are summarized as follows:

Modeling the system model of task offloading in a vehicle-edge-cloud collaborative computing environment, where a DAG is utilized to model the dependencies of subtasks in the task model.

Proposing an efficient dependency-aware task offloading scheme based on DDPG to achieve the optimal fine-grained offloading strategy, which considers both task dependencies and continuous action spaces.

Conducting extensive experiments to evaluate the performance of our proposed scheme. The experimental results show that our scheme can greatly reduce the average processing time compared with baseline schemes.
The rest of this paper is organized as follows. Section 2 discusses the related works. Section 3 presents the system model and problem definition. Section 4 proposes the design of the dependencyaware task offloading scheme based on DDPG. Section 5 evaluates the performance of our proposed offloading scheme. Section 6 concludes this paper and outlines future work.
Related works
Several studies have addressed task offloading for vehicular edge computing, including mobility-aware task offloading [21], dependency-aware task offloading [22], learning-based task offloading [13], and traffic flow prediction task offloading [16]. Most work has studied task offloading between vehicles and MEC servers. Yao et al. [19] considered both energy consumption and processing delay and proposed a twin delayed deep deterministic policy gradient algorithm to achieve the optimal offloading strategy. Zhang et al. [23] proposed an offloading method in which both the heterogeneous requirements of the computation tasks and the mobility of the vehicles are considered. Ren et al. [24] studied task offloading problems with multiple constraints and proposed a DDPG-based task offloading algorithm. Zhan et al. [25] studied the offloading problem without information sharing and proposed a computation offloading game formulation. However, these works only considered the resources of vehicles and MEC servers and did not fully utilize the rich resources of the cloud.
Recently, some works have begun to consider the problem of task offloading in the vehicle-edge-cloud computing environment. Xu et al. [16] presented a game-theory-based service offloading approach to minimize the task processing latency of users, where both predictions of traffic flow and the allocation of resources are considered. Dai et al. [9] proposed an efficient task offloading approach to minimize the processing delay of tasks, which jointly considered the edge-cloud opportunities and the convergence of deep reinforcement learning. Zhang et al. [26] studied the problem of resource allocation for edge services under the vehicle-edge-cloud computing environment and proposed two algorithms to maximize social welfare and profit. However, these works did not consider the subtask dependencies within a task.
With respect to dependency-aware task offloading, Chen et al. [27] studied the problem of dependency-aware task offloading, considering both the collaboration between MEC servers and the cloud server and the collaboration among MEC servers. Fan et al. [28] proposed a heuristic algorithm with the aim of reducing the energy consumed by vehicles. Pan et al. [29] considered both energy consumption and task dependency in a vehicle-edge-cloud computing environment and proposed a Q-learning-based framework to select optimal strategies. Chen et al. [30] studied the offloading of multiple dependent tasks in an end-edge-cloud collaborative computing environment and proposed a DRL-based algorithm to reduce the average energy-time cost. However, these works assume that each MEC server has enough resources for task offloading.
Besides, some studies have attempted to solve the task offloading problem via deep reinforcement learning. Qu et al. [31] proposed a deep meta-reinforcement learning-based offloading (DMRO) algorithm for reducing latency and energy consumption. Binh et al. [32] proposed an algorithm based on Deep Q-Network (DQN) to enhance the average quality of experience. Huang et al. [33] considered both task offloading and resource allocation and proposed a Deep Q-Network-based algorithm to minimize the overall offloading cost in terms of energy cost, computation cost, and delay cost. However, these works simply assume that the action space of task offloading is discrete because they rely on value-based methods.
Unlike the above works, we investigate dependency-aware task offloading for IoV in the vehicle-edge-cloud collaborative computing environment. Our work differs from these works in three aspects: 1) we take task dependency into account to further minimize execution delays when designing task offloading strategies; 2) we fully utilize the resources of vehicles, MEC servers, and the cloud server; 3) we efficiently solve the continuous control problem.
In addition, based on the above discussion in the field of task offloading for IoV, a side-by-side comparison is presented in Table 1 in terms of strategy, applied metrics, advantages, and weaknesses of each technical study.
System model and problem formulation
In this section, the system model of dependency-aware task offloading is designed, and then the problem of dependency-aware task offloading is formulated as an optimization problem. The key terms and the related descriptions in the system are listed in Table 2.
Network model
As illustrated in Fig. 2, the network model of dependency-aware task offloading for VEC consists of three layers: the user layer, the edge layer, and the cloud layer. The user layer is the collection of all vehicles, where each vehicle can process some subtasks of a task. The edge layer consists of RSUs and MEC servers, where each RSU is equipped with an MEC server. Each RSU can collect vehicle service requests within its coverage range, and each MEC server can execute offloaded subtasks. The cloud layer is the cloud server, which has enough computing and storage resources.
We assume that all tasks generated by vehicles can be broken down into smaller subtasks. Each subtask can be processed locally or offloaded to a MEC server or the remote cloud server for execution.
The set of RSUs is denoted as \(R = \{ 1,2,...,r,...,R\}\), and the set of vehicles is denoted as \(K = \{ 1,2,...,k,...,K\}\). In addition, we assume that each vehicle has M computation-intensive and delay-sensitive tasks to be executed within a stringent completion time constraint. The variable \(Q_m^k\) represents the mth task of vehicle k. For ease of reference, we show the key notations used in this article in Table 2.
Task model
In this section, the task model of dependency-aware task offloading is introduced. Since tasks are not atomic, their subtasks may be interdependent, i.e., the output of one subtask is the input of another subtask [2]. Each subtask can either be processed on the vehicle or offloaded to the edge server or cloud server for computation. To describe the subtask dependencies within a task, each task is modeled as a directed acyclic graph (DAG), i.e., \(G = (\mathcal {I},\mathcal {E})\), where \(\mathcal {I}\) is the set of subtasks and \(\mathcal {E}\) is the set of directed edges. Let \(I = \left| \mathcal {I} \right|\) denote the total number of subtasks of task \(Q_m^k\). In the task graph, node \(Q_{m,i}^k\) is the ith subtask of task \(Q_m^k\), and a directed edge \((Q_{m,i}^k,Q_{m,j}^k)\) denotes the dependency that subtask \(Q_{m,j}^k\) cannot be performed until subtask \(Q_{m,i}^k\) has been completed, \(i,j \in \mathcal {I}\).
To illustrate the task dependencies, we show an example in Fig. 3. The figure shows that task \(Q_m^k\) is divided into 9 subtasks (i.e., \(Q_{m,1}^k\), \(Q_{m,2}^k\), \(Q_{m,3}^k\), \(Q_{m,4}^k\), \(Q_{m,5}^k\), \(Q_{m,6}^k\), \(Q_{m,7}^k\), \(Q_{m,8}^k\), \(Q_{m,9}^k\)). Specifically, subtask \(Q_{m,1}^k\) is the entry subtask of task \(Q_m^k\). Subtask \(Q_{m,3}^k\) is the predecessor of \(Q_{m,4}^k\) and \(Q_{m,5}^k\), and subtasks \(Q_{m,4}^k\) and \(Q_{m,5}^k\) are the successors of \(Q_{m,3}^k\). So, subtasks \(Q_{m,4}^k\) and \(Q_{m,5}^k\) cannot start executing until subtask \(Q_{m,3}^k\) has been completed. Similarly, subtask \(Q_{m,9}^k\) is the exit subtask of task \(Q_m^k\). Therefore, \(Q_{m,9}^k\) cannot start executing until all of its predecessor subtasks (i.e., \(Q_{m,1}^k\), \(Q_{m,2}^k\), \(Q_{m,3}^k\), \(Q_{m,4}^k\), \(Q_{m,5}^k\), \(Q_{m,6}^k\), \(Q_{m,7}^k\), \(Q_{m,8}^k\)) have been completed. In addition, subtasks \(Q_{m,2}^k\) and \(Q_{m,3}^k\), or \(Q_{m,7}^k\) and \(Q_{m,8}^k\), can be executed in parallel. Similarly, subtasks \(Q_{m,4}^k\), \(Q_{m,5}^k\), and \(Q_{m,6}^k\) can also be executed in parallel.
Each subtask \(Q_{m,i}^k\) can be described by a two-tuple \(Q_{m,i}^k = \left\langle {d_{m,i}^k,c_{m,i}^k} \right\rangle\), where \(d_{m,i}^k\) denotes the input data size of the ith subtask and \(c_{m,i}^k\) denotes the amount of computation resource required to complete subtask \(Q_{m,i}^k\).
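The DAG task model can be sketched in Python. The snippet below is a minimal illustration rather than the authors' implementation: the function name `parallel_levels` and the edge list `EDGES` are hypothetical, and the edges are only one set that is consistent with the prose description of Fig. 3 (the figure itself may differ). It uses Kahn's topological sort to group subtasks into levels, so that subtasks in the same level have no mutual dependency and may run in parallel.

```python
from collections import deque


def parallel_levels(num_subtasks, edges):
    """Group subtasks 1..num_subtasks into dependency levels (Kahn's algorithm).

    `edges` holds (predecessor, successor) pairs. Subtasks in one level can be
    executed in parallel once every earlier level has finished.
    """
    successors = {i: [] for i in range(1, num_subtasks + 1)}
    indegree = {i: 0 for i in range(1, num_subtasks + 1)}
    for pred, succ in edges:
        successors[pred].append(succ)
        indegree[succ] += 1

    ready = deque(i for i in indegree if indegree[i] == 0)
    levels = []
    visited = 0
    while ready:
        level = sorted(ready)
        ready.clear()
        levels.append(level)
        for node in level:
            visited += 1
            for succ in successors[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
    if visited != num_subtasks:
        raise ValueError("edges contain a cycle, so the graph is not a DAG")
    return levels


# Hypothetical edge set, assumed only to match the description of Fig. 3.
EDGES = [(1, 2), (1, 3), (2, 6), (3, 4), (3, 5),
         (4, 7), (5, 7), (5, 8), (6, 8), (7, 9), (8, 9)]
LEVELS = parallel_levels(9, EDGES)
# LEVELS == [[1], [2, 3], [4, 5, 6], [7, 8], [9]]
```

Under this assumed edge set, the levels reproduce the parallel groups named in the text: subtasks 2 and 3, subtasks 4, 5, and 6, and subtasks 7 and 8.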
Communication model
In this section, the communication model of task offloading is introduced, which consists of the communication model from vehicles to RSUs and the communication model from RSUs to the cloud server.
1) Vehicles to RSUs: we consider that the wireless communication (e.g., 4G, 5G, and WiFi) between vehicles and RSUs is based on orthogonal frequency-division multiple access [2]. Specifically, we let \({\lambda _{k,e}}\) denote the data transmission rate between vehicle k and RSU r, which can be calculated as [2, 34]
\(\lambda _{k,e} = n_k^e w \log _2 \left( 1 + \frac{{p_k \delta _{k,e}}}{{\sigma ^2}} \right)\)    (1)
where \(n_k^e\) denotes the number of subchannels allocated to the vehicle k, w denotes the bandwidth of each subchannel between vehicle k and RSU r, \(p_k\) denotes the transmission power of vehicle k, \(\delta _{k,e}\) denotes the channel gain between vehicle k and RSU r, \(\sigma ^{  2}\) denotes the surrounding noise power.
2) RSUs to cloud server: we consider wireline communication (e.g., high-speed optical fiber lines) between RSUs and the cloud server. When MEC servers cannot provide computation services for vehicles, vehicles offload the subtasks to the remote cloud server via RSUs. It should be noted that the latency of the MEC server transmitting data to the cloud server is equal to the latency of the cloud server returning the result to the MEC server, and that this transmission latency is independent of the data size, due to the long geographical distance between cloud servers and MEC servers [2, 34]. We let \({t_{r,c}}\) denote the round-trip time of data transmission between RSU r and the cloud server, which can be calculated as [2, 34]
\(t_{r,c} = 2 t_{off}^{cloud}\)    (2)
where \(t_{off}^{cloud}\) denotes the transmission latency of RSU r transmitting data to the cloud server.
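As a minimal sketch of the communication model, the two quantities can be computed as follows. It assumes that the vehicle-to-RSU rate takes the standard Shannon-capacity form, i.e., the number of subchannels times the subchannel bandwidth times \(\log_2(1 + p_k \delta_{k,e} / \sigma^2)\), and that the cloud round trip is twice the one-way latency because both directions are stated to take equal time. The function and parameter names are hypothetical, as are the units (Hz, W, bit/s, seconds).

```python
import math


def transmission_rate(num_subchannels, subchannel_bandwidth_hz,
                      tx_power_w, channel_gain, noise_power_w):
    """Vehicle-to-RSU rate lambda_{k,e} in bit/s (Shannon form, assumed)."""
    snr = tx_power_w * channel_gain / noise_power_w
    return num_subchannels * subchannel_bandwidth_hz * math.log2(1.0 + snr)


def cloud_round_trip(t_off_cloud_s):
    """Round-trip time t_{r,c}: upload and result return take equal time."""
    return 2.0 * t_off_cloud_s
```

For instance, two 1 MHz subchannels at a signal-to-noise ratio of 3 give `transmission_rate(2, 1e6, 3.0, 1.0, 1.0)`, i.e., 2 × 10^6 × log2(4) = 4 × 10^6 bit/s.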
Computation model
Subtasks within a task can either be performed on the vehicle or be offloaded to the MEC server or cloud server for execution. Thus, in this section, the processing time of subtasks on the vehicle, the MEC server, and the cloud server is discussed in turn.
1) Local computing: when the subtask is assigned to be executed on the vehicle, we assume that each vehicle processes one subtask at a time. In this case, if a subtask is being computed on a vehicle, other subtasks to be executed locally need to wait in a task queue based on the first-in-first-out principle until the computation resources are available [9, 35]. Thus, the total local completion delay of subtask \(Q_{m,i}^k\) consists of the local execution delay and the local waiting delay, which is given by:
\(t_{k,m,i}^{local} = \frac{{c_{m,i}^k}}{{{f_k}}} + t_{k,m,i}^{local,wait}\)    (3)
where \(\frac{{c_{m,i}^k}}{{{f_k}}}\) denotes the local execution delay, \(f_k\) denotes the computation capability of vehicle k, \(c_{m,i}^k\) denotes the amount of computation resource required to complete subtask \(Q_{m,i}^k\), and \(t_{k,m,i}^{local,wait}\) is the local waiting delay of subtask \(Q_{m,i}^k\), given by the difference between the execution start time and the request time, i.e., \(t_{k,m,i}^{local,wait} = t_{k,m,i}^{local,start} - t_{k,m,i}^{local,request}\).
2) Edge computing: when the subtask is offloaded to MEC servers for processing, we consider that the whole process can be broken down into three parts: the transmission delay, the execution delay, and the waiting delay of the subtask on the MEC server.
Firstly, the raw data of subtask \(Q_{m,i}^k\) is transmitted from vehicle k to MEC server e via wireless communication. According to the communication model in (1), the transmission delay of subtask \(Q_{m,i}^k\) is defined as
\(\frac{{d_{m,i}^k}}{{{\lambda _{k,e}}}}\)    (4)
Secondly, similar to local execution, the processing delay of subtask \(Q_{m,i}^k\) on MEC server e is defined as
\(\frac{{c_{m,i}^k}}{{f_e^{mec}}}\)    (5)
where \(f_e^{mec}\) represents the computation capability of the MEC server e.
Thirdly, the edge waiting delay is defined in the same way as the local waiting delay:
\(t_{k,m,i}^{edge,wait} = t_{k,m,i}^{edge,start} - t_{k,m,i}^{edge,request}\)    (6)
where \(t_{k,m,i}^{edge,start}\) denotes the execution start time of the subtask, and \(t_{k,m,i}^{edge,request}\) denotes the request time of the subtask.
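According to equations (4), (5), and (6), the total completion delay of offloading the subtask to MEC server e can be defined as
\(t_{k,m,i}^{mec,offload} = \frac{{d_{m,i}^k}}{{{\lambda _{k,e}}}} + \frac{{c_{m,i}^k}}{{f_e^{mec}}} + \left( t_{k,m,i}^{edge,start} - t_{k,m,i}^{edge,request} \right)\)    (7)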
3) Cloud computing: when the subtask is offloaded to the cloud server for processing, the raw data of subtask \(Q_{m,i}^k\) is first transmitted from vehicle k to RSU r and then from RSU r to the cloud server. In addition, considering the enormous computation capability of the cloud server, the execution delay is negligible compared with the transmission delay [2, 16]. Therefore, the total completion delay of offloading subtask \(Q_{m,i}^k\) to the cloud server can be broken down into two parts: the transmission delay between vehicle k and RSU r, and the transmission delay between RSU r and the cloud server, which is defined as
\(t_{k,m,i}^{cloud,offload} = \frac{{d_{m,i}^k}}{{{\lambda _{k,e}}}} + t_{e,c}\)    (8)
where \(\frac{{d_{m,i}^k}}{{{\lambda _{k,e}}}}\) denotes the transmission delay between vehicle k and RSU r, and \(t_{e,c}\), obtained according to (2), denotes the cloud transmission delay.
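The three completion delays described in this section can be sketched as plain functions. This is an illustration under stated assumptions, not the authors' code: the function and parameter names are hypothetical, data sizes are taken in bits, rates in bit/s, computation demands in CPU cycles, and capabilities in cycles per second. The waiting delays are passed in directly as (start time minus request time), and the cloud term uses the round-trip time as twice the one-way latency.

```python
def local_delay(cycles, f_vehicle, wait=0.0):
    """Local completion delay: execution delay c/f_k plus FIFO waiting delay."""
    return cycles / f_vehicle + wait


def edge_delay(data_bits, cycles, rate, f_mec, wait=0.0):
    """Edge completion delay: upload d/lambda, execution c/f_mec, plus waiting."""
    return data_bits / rate + cycles / f_mec + wait


def cloud_delay(data_bits, rate, t_off_cloud):
    """Cloud completion delay: upload to the RSU plus RSU-cloud round trip.

    Execution time on the cloud is neglected, as stated in the text.
    """
    return data_bits / rate + 2.0 * t_off_cloud
```

For example, a subtask of 8 × 10^6 bits and 10^9 cycles, sent at 8 × 10^6 bit/s to a 4 GHz MEC server with a 0.25 s queue, yields `edge_delay(8e6, 1e9, 8e6, 4e9, wait=0.25)`, i.e., 1 + 0.25 + 0.25 = 1.5 s.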
Problem formulation
In this part, we formalize the problem of dependency-aware task offloading for VEC as an optimization problem. This optimization problem aims to minimize the average processing latency under the constraints of the computing and communication resources of MEC servers. Specifically, the optimization problem is defined as
where \(r_{m,i}^k = \{ \beta _{k,m,i}^E,x_{k,m,i}^E\}\) denotes the allocated computation and communication resources, \(t_{m,i}^k = t_{k,m,i}^{local} \alpha _{k,m,i}^{local} + t_{k,m,i}^{mec,offload}\alpha _{k,m,i}^{mec} + t_{k,m,i}^{cloud,offload}\alpha _{k,m,i}^{cloud}\) is the completion delay of subtask \(Q_{m,i}^k\) under its offloading decision, B denotes the maximum communication capability of MEC servers, and C denotes the maximum computation capability of MEC servers.
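To make the objective concrete, the following sketch evaluates the per-subtask delay \(t_{m,i}^k\) and the average processing latency for a given set of offloading decisions. It rests on two assumptions that are not spelled out in the text: exactly one of the indicators \(\alpha^{local}\), \(\alpha^{mec}\), \(\alpha^{cloud}\) equals 1 for each subtask, and the resource constraints mean that the total allocated bandwidth and computation must not exceed B and C. All function names are hypothetical.

```python
def subtask_delay(placement, t_local, t_mec, t_cloud):
    """t_{m,i}^k as the indicator-weighted sum of the three candidate delays."""
    alpha = {"local": 0, "mec": 0, "cloud": 0}
    if placement not in alpha:
        raise ValueError(f"unknown placement: {placement!r}")
    alpha[placement] = 1  # assumption: exactly one indicator is set
    return (t_local * alpha["local"]
            + t_mec * alpha["mec"]
            + t_cloud * alpha["cloud"])


def average_delay(subtasks):
    """Mean delay; each item is (placement, t_local, t_mec, t_cloud)."""
    delays = [subtask_delay(*item) for item in subtasks]
    return sum(delays) / len(delays)


def within_capacity(allocations, max_bandwidth, max_compute):
    """Check assumed constraints: sum of (bandwidth, compute) grants <= (B, C)."""
    allocations = list(allocations)
    total_bandwidth = sum(bw for bw, _ in allocations)
    total_compute = sum(cpu for _, cpu in allocations)
    return total_bandwidth <= max_bandwidth and total_compute <= max_compute
```

An offloading strategy is then a candidate solution only if `within_capacity` holds, and strategies are compared by `average_delay`.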
Task offloading algorithm based on DDPG
In this section, an efficient dependency-aware task offloading algorithm based on DDPG, named DATO-DDPG, is proposed. Compared with value-based reinforcement learning (RL) approaches (e.g., DQN) [24, 36], DDPG combines the characteristics of DQN and the actor-critic (AC) algorithm to learn the Q value and the deterministic policy through experience replay and the frozen target network [37], thereby efficiently solving continuous control problems and helping reach fast convergence. Figure 4 illustrates the framework of DATO-DDPG. First, the algorithm settings of DATO-DDPG are defined. Second, we present the details of DATO-DDPG. Third, the action selection based on the \(\epsilon\)-greedy policy is described. Finally, the DDPG network update is presented in detail.
Algorithm setting
In this section, DDPG is introduced to address the optimization problem in (9), i.e., obtaining the optimal offloading strategy (executing the subtask locally, offloading it to an edge server, or offloading it to the cloud server) by exploring the dynamic environment at the beginning of each subtask offloading round.
Similar to [38], we assume that there is one centralized MEC server acting as the agent in DATO-DDPG. The agent identifies the optimal fine-grained offloading strategy by interacting with the environment through a sequence of observations, actions, and rewards [39]. Specifically, when the agent receives a subtask request from a vehicle, it selects an action based on the current environment state. Then the vehicle chooses where to execute the subtask according to the selected action. After the action is executed, the agent receives a reward that indicates the benefit of the selected action. Finally, the environment evolves into the next state.
There are four key elements in the DDPG, namely environment, state, action, and reward, which are specified as follows:
Environment. The environment env reflects the internet of vehicles environment, including the set of vehicles, the set of tasks, the transmission power of the vehicle, the channel gain between the vehicle and MEC server, and the computation and communication resources of MEC servers. The environment is defined as
State. The state s reflects the observed available resources of MEC servers, which is defined as
\(s_t = \{ B_a, C_a\}\)    (11)
where the subscript t denotes the tth time step, \(B_a\) denotes the available communication resources of MEC servers, and \(C_a\) denotes the available computation resources of MEC servers.
Action. Based on the observed state, the agent decides the allocation of computing and bandwidth resources of the MEC server for executing the subtask. The action a is defined as
\(a_{k,m,i}^t = \{ b_{k,m,i}, c_{k,m,i}\}\)    (12)
where \(a_{k,m,i}^t\) denotes the action for subtask i of task \(Q_m^k\) at the tth time step, \(b_{k,m,i}\) denotes the communication resources allocated to subtask i of task \(Q_m^k\), and \(c_{k,m,i}\) denotes the computation resources allocated to subtask i of task \(Q_m^k\).
Reward. According to the state and action, the agent calculates the offloading strategy benefits, which can be defined as
where \(r_{k,m,i}^t\) denotes the benefit of the action \(a_{k,m,i}^t\) for subtask i of task \(Q_m^k\) at the tth time step.
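The state and action described above can be represented as small records. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, raw continuous actions are clipped to the resources still available, and the next state is obtained by subtracting the granted resources. How resources are released when subtasks finish is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    bandwidth_available: float  # B_a: available communication resources
    compute_available: float    # C_a: available computation resources


@dataclass(frozen=True)
class Action:
    bandwidth: float  # b_{k,m,i}: communication resources for the subtask
    compute: float    # c_{k,m,i}: computation resources for the subtask


def clip_action(state, action):
    """Project a raw continuous action onto the currently available resources."""
    return Action(
        bandwidth=min(max(action.bandwidth, 0.0), state.bandwidth_available),
        compute=min(max(action.compute, 0.0), state.compute_available),
    )


def next_state(state, action):
    """Assumed transition: granted resources are deducted from the pool."""
    granted = clip_action(state, action)
    return State(
        bandwidth_available=state.bandwidth_available - granted.bandwidth,
        compute_available=state.compute_available - granted.compute,
    )
```

Because both fields are real-valued, the action space is continuous, which is the reason a policy-gradient method such as DDPG is used instead of a value-based method over a discrete action set.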
Task offloading algorithm
The framework of DATO-DDPG is shown in Fig. 4, which combines the characteristics of DQN and the actor-critic (AC) algorithm [37]. It consists of three components, namely the evaluation network, the target network, and the replay memory.
The evaluation network consists of two deep neural networks, namely an evaluation actor network ActorE and an evaluation critic network CriticE. The evaluation actor network is utilized to explore the offloading strategy. The evaluation critic network evaluates the offloading strategy and provides the critic value, which helps the evaluation actor learn the gradient of the policy. In addition, the inputs of the evaluation network are the current state \(s_t\) and the training states and training actions sampled from replay memory, and its output is the action \(a_t\).
The target network can be understood as an older version of the evaluation network, since the two have the same network structure but different parameters. It is utilized to produce the target value for training CriticE and consists of a target actor network ActorT and a target critic network CriticT. The input of the target network is the next state \(s_{t+1}\) from replay memory, and the output is a critic value used to compute the loss of CriticE.
The replay memory is used to store experience tuples, consisting of the current state, selected action, reward, and next state. The stored experience tuples can be randomly sampled for training the evaluation network and the target network. The randomly sampled experience tuples are intended to reduce the effect of data correlation.
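A replay memory of this kind can be sketched with the Python standard library. The class name `ReplayMemory`, the fixed capacity, and the optional seed are hypothetical choices for illustration; the text does not specify how the authors implement or size the buffer.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently drops the oldest tuple when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform sampling without replacement breaks temporal correlation."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

Training would typically start only once `len(memory)` is at least the mini-batch size, since `sample` raises `ValueError` when asked for more tuples than are stored.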
The DATO-DDPG algorithm is described in Algorithm 1, which includes three main parts: action selection (Line 7), reward evaluation (Line 9), and network update (Line 14).
Reward evaluation algorithm
Evaluating the reward of each offloading decision not only enables us to obtain an optimal offloading strategy but also accelerates the convergence of the DATO-DDPG algorithm. The goal of the DATO-DDPG algorithm is to maximize the reward of performing actions. Therefore, the reward is negatively related to the execution time. The algorithm for evaluating the reward is shown in Algorithm 2. Specifically, when the subtask is executed locally, the reward is equal to 0 (Lines 4-5). When the subtask is offloaded to the MEC server for execution, the reward is equal to the subtask's local processing delay minus the subtask's processing delay on the edge server (Lines 6-7). Similarly, when a subtask is offloaded to a cloud server for execution, the reward is equal to the subtask's local processing delay minus the subtask's processing delay on the cloud server (Lines 8-9).
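The three reward cases can be written as a short function. This is a sketch of the rule as described in the text, not a transcription of Algorithm 2; the function name and the string labels for the placement are hypothetical.

```python
def evaluate_reward(placement, t_local, t_mec=None, t_cloud=None):
    """Reward = time saved relative to local execution.

    Local execution is the reference point (reward 0); offloading earns a
    positive reward only when it finishes sooner than local execution would.
    """
    if placement == "local":
        return 0.0
    if placement == "mec":
        return t_local - t_mec
    if placement == "cloud":
        return t_local - t_cloud
    raise ValueError(f"unknown placement: {placement!r}")
```

For example, `evaluate_reward("mec", 2.0, t_mec=1.5)` gives 0.5, while `evaluate_reward("cloud", 2.0, t_cloud=3.0)` gives -1.0, penalizing an offloading decision that is slower than local execution.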
Network update algorithm
The network update is outlined in Algorithm 3. Specifically, in each training step, a mini-batch of experience tuples \(D_t\) is randomly sampled from replay memory D (Lines 1-2). Then, the target critic network CriticT calculates the target value \(y_t\) and transmits \(y_t\) to the evaluation critic network CriticE (Line 3). After receiving \(y_t\), CriticE updates \({\theta ^Q }\) by minimizing the loss function \({L(\theta ^Q) }\) (Line 4). Next, the sampled policy gradient \(\nabla _{\theta ^\mu }J\) is utilized to update the weights \({\theta ^\mu }\) of the evaluation actor network (Line 5). Finally, the parameters \({\theta ^{Q'} }\) and \({\theta ^{\mu '} }\) of the target critic network and the target actor network, respectively, are updated every C steps (Lines 6-8).
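The critic side of this update can be illustrated without a deep learning framework. The sketch below assumes the standard DDPG target \(y_t = r_t + \gamma Q'(s_{t+1}, \mu'(s_{t+1}))\) and a mean-squared-error critic loss; the discount factor `gamma`, the function names, and the representation of networks as plain callables are assumptions for illustration. The actor's policy-gradient step is omitted because it requires automatic differentiation.

```python
import copy


def td_targets(batch, target_actor, target_critic, gamma):
    """y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})) for each sampled tuple."""
    return [
        reward + gamma * target_critic(next_state, target_actor(next_state))
        for (_state, _action, reward, next_state) in batch
    ]


def critic_loss(batch, targets, critic):
    """Mean squared error between y_t and the evaluation critic's Q(s_t, a_t)."""
    squared_errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _reward, _next_state), y in zip(batch, targets)
    ]
    return sum(squared_errors) / len(squared_errors)


def sync_targets(step, period, eval_params, target_params):
    """Copy evaluation parameters into the target networks every `period` steps."""
    if step % period == 0:
        return copy.deepcopy(eval_params)
    return target_params
```

The periodic copy in `sync_targets` follows the description that the target parameters are updated every C steps; many DDPG implementations instead blend the parameters a little at every step (a soft update), which is a different design choice from the one described here.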
Experimental performance
In this section, extensive experiments are carried out to evaluate the performance of our proposed scheme. The experimental settings are explained first, and then the convergence performance of the DATO-DDPG scheme is analyzed. Finally, DATO-DDPG is compared with existing task offloading schemes to demonstrate its effectiveness.
Experimental setting
A VEC system is simulated, which consists of 7 vehicles, 40 tasks, and 4 RSUs. Each RSU is equipped with one MEC server and each MEC server is equipped with several CPUs. The size of input data of tasks is randomly generated from the set {25, 30, 40, 45, 60} MB. The computation resource requirements of tasks are randomly assigned from the set {0.5, 0.6, 0.7, 0.8, 1.2} Gigacycle/s. Each task is randomly divided into 4 to 8 subtasks, and the size of input data and computation resource requirements of the task are randomly assigned to subtasks. In addition, the other parameters in the experiments are set in Table 3.
To demonstrate the effectiveness of DATO-DDPG, the following three baselines are selected for comparison.

Dependency-aware random offloading (DARO). DARO is a traditional offloading approach that does not use optimization algorithms, where the edge server randomly assigns subchannels and computational resources to vehicles for the corresponding task offloading operations.

Dependency-aware task offloading scheme based on DQN (DATO-DQN). DATO-DQN is a fine-grained task offloading approach, which considers the decomposability and dependencies of the task. Unlike our method DATO-DDPG, DATO-DQN uses a value-based reinforcement learning approach to allocate subchannels and computational resources. It is an implementation of MORLODT [42].

Entire task offloading scheme based on DDPG (ETO-DDPG). ETO-DDPG is a coarse-grained task offloading approach that does not consider the decomposability and dependency of tasks. It is a realization of [43].
Evaluation of training performance
As DATO-DDPG is a DDPG-based algorithm, the training performance of the model should be guaranteed. To verify the training performance of DATO-DDPG, we first compare its convergence under different learning rates and then compare its convergence with that of the baselines.
1) Convergence with Different Learning Rates: through the simulations with different learning rates, the most appropriate value for learning rate is 0.01, which has an average reward of 2024, as shown in Fig. 5. The worst value for this parameter is 0.08, and its average reward is 2003. Thus, the learning rate is set to 0.01 in the following system simulations and evaluations.
2) Convergence with Different Baselines: Figure 6 shows the training performance of DATO-DDPG and the learning-based baselines (i.e., ETO-DDPG and DATO-DQN), averaged over multiple experiments. It can be seen that DATO-DDPG converges faster and reaches a higher reward value than ETO-DDPG and DATO-DQN. The reason is that DATO-DDPG considers both the inter-subtask dependency and the dynamic environment, and utilizes DDPG to solve the continuous control problem. Thus, fine-grained offloading opportunities can be well exploited. Specifically, ETO-DDPG is a coarse-grained task offloading approach that does not consider task decomposability and dependencies, so the fine-grained offloading opportunities for subtasks are wasted and an optimal strategy cannot be obtained. DATO-DQN demonstrates slow convergence and unstable performance because DQN is inefficient when the action space has a high dimension.
Performance evaluation and analysis
To verify the adaptability and effectiveness of DATO-DDPG, three sets of simulation experiments in different environments are conducted, and the performance of DATO-DDPG is evaluated.
The control values for the comparative analysis variables are listed in Table 3. In each set of experiments, the value of one variable fluctuated around the control value and the other variables remained constant (Table 4).
1) Analysis on the Variety of Task Number: Figure 7 shows the average processing delay of the different schemes when the number of tasks in the offloading system varies. With the other variables unchanged, the number of tasks ranges from 20 to 70 in this set of experiments. It can be seen that the average processing delay increases with the number of tasks. As the number of tasks grows from 20 to 70, DATO-DDPG consistently outperforms DARO, DATO-DQN, and ETO-DDPG. This is because DATO-DDPG obtains the most appropriate fine-grained offloading opportunities for subtasks. More specifically, when the task number increases from 20 to 70, DATO-DDPG outperforms DARO, DATO-DQN, and ETO-DDPG with improvements of 21.3% to 11.4%, 8.8% to 7.6%, and 13.2% to 17.8%, respectively. In addition, when the task number increases to 70, the performance of ETO-DDPG drops sharply. This is because ETO-DDPG does not take the inter-task dependency into account, so MEC servers suffer from the computation resource contention caused by a large number of tasks.
2) Analysis on the Variety of MEC Server Number: Fig. 8 illustrates the impact of the number of MEC servers on the average processing delay. Experiments are conducted with the number of MEC servers ranging from 2 to 6, while the other variables remain unchanged. The figure shows that the average processing delay decreases as the number of MEC servers increases, because additional MEC servers introduce more computing resources and communication bandwidth into the system. Specifically, when the number of MEC servers is 2, 3, 4, 5, and 6, the improvement of DATODDPG over DARO is 12.4%, 16.1%, 20.5%, 15.2%, and 21%; over DATODQN it is 7.8%, 11.6%, 13.6%, 12.4%, and 12.1%; and over ETODDPG it is 10.1%, 13.4%, 17.4%, 16.1%, and 18.8%, respectively. DATODDPG performs better than the baselines because it considers both the dynamic environment and the inter-task dependency, and uses DDPG to handle the continuous action space.
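To summarize the per-setting figures reported for this experiment, the following sketch computes a plain arithmetic mean of the five improvement percentages for each baseline. The numbers are copied from the paragraph above; the choice of an unweighted mean over the five server counts is an assumption made for illustration and is not a statistic reported in the paper.

```python
# Improvement of DATODDPG over each baseline (percent) for 2, 3, 4, 5, and 6
# MEC servers, as reported in the text.
IMPROVEMENT_PCT = {
    "DARO": [12.4, 16.1, 20.5, 15.2, 21.0],
    "DATODQN": [7.8, 11.6, 13.6, 12.4, 12.1],
    "ETODDPG": [10.1, 13.4, 17.4, 16.1, 18.8],
}


def mean_improvement(values):
    """Unweighted mean of the per-setting improvements, rounded to 2 decimals."""
    return round(sum(values) / len(values), 2)


SUMMARY = {name: mean_improvement(vals) for name, vals in IMPROVEMENT_PCT.items()}

if __name__ == "__main__":
    for name, avg in SUMMARY.items():
        print(f"{name}: {avg}%")
```

With these inputs the means are 17.04% over DARO, 11.5% over DATODQN, and 15.16% over ETODDPG.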
3) Analysis on the Variety of Average Size of Input Data: Fig. 9 shows the average processing delay under different average sizes of input data. Experiments are conducted with the average size of raw data ranging from 25 to 60 MB, while the other variables remain unchanged. The average processing delay increases as the average size of input data increases, because the raw data of offloaded tasks must be transmitted to an MEC server or the cloud server, which increases the transmission time. Specifically, as the average size of input data increases from 25 to 60 MB, DATODDPG outperforms DARO, DATODQN, and ETODDPG by 20.2%, 11.2%, and 20.9% on average, respectively. In addition, when the average size of input data reaches 60 MB, the performance of ETODDPG drops sharply. The reason is that ETODDPG is a coarse-grained task offloading approach that does not exploit the parallelism of subtasks across the vehicle, the MEC servers, and the cloud server; it uploads all data to the MEC server and the cloud server, which increases the transmission delay. Overall, DATODDPG achieves the lowest average processing delay among the compared schemes across the different average sizes of input data.
Conclusion
In this paper, an efficient dependency-aware task offloading scheme is proposed for reducing the average processing delay of tasks with vehicle-edge-cloud collaborative computing. In this scheme, a directed acyclic graph is used to model the inter-subtask dependency. Then, a dependency-aware task offloading algorithm based on DDPG is designed to select the optimal offloading strategy, in which the continuous control problem and the allocation of edge server resources are considered. Simulation results show that the proposed dependency-aware offloading scheme can effectively reduce the average processing delay of tasks.
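As a minimal illustration of how a directed acyclic graph captures inter-subtask dependency, the sketch below groups subtasks into waves in which every subtask has all of its predecessors finished, so the subtasks within one wave could be processed in parallel. The four-subtask graph and the function name `dependency_waves` are hypothetical and are not taken from the paper; the sketch relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) and does not implement the offloading decision itself.

```python
from graphlib import TopologicalSorter


def dependency_waves(predecessors):
    """Group subtasks into waves; every subtask in a wave has all of its
    predecessors completed in earlier waves."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # subtasks whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Hypothetical task: s1 feeds s2 and s3, and both feed s4.
EXAMPLE_DAG = {"s1": set(), "s2": {"s1"}, "s3": {"s1"}, "s4": {"s2", "s3"}}

if __name__ == "__main__":
    print(dependency_waves(EXAMPLE_DAG))  # [['s1'], ['s2', 's3'], ['s4']]
```

In this example, s2 and s3 fall into the same wave, which is the kind of fine-grained parallelism that a coarse-grained scheme treating the task as a single unit cannot exploit.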
For future work, we will consider both the mobility of vehicles and data migration between edge servers during task offloading.
Availability of data and materials
Not applicable.
References
Ji H, Alfarraj O, Tolba A (2020) Artificial intelligence-empowered edge of vehicles: architecture, enabling technologies, and applications. IEEE Access 8:61020–61034
Liu Y, Wang S, Zhao Q, Du S, Zhou A, Ma X, Yang F (2020) Dependency-aware task scheduling in vehicular edge computing. IEEE Internet Things J 7(6):4961–4971
Xu X, Shen B, Ding S, Srivastava G, Bilal M, Khosravi MR, Menon VG, Jan MA, Wang M (2020) Service offloading with deep q-network for digital twinning-empowered internet of vehicles in edge computing. IEEE Trans Ind Inform 18(2):1414–1423
Chen Y, Zhang N, Zhang Y, Chen X, Wu W, Shen XS (2019) Toffee: Task offloading and frequency scaling for energy efficiency of mobile devices in mobile edge computing. IEEE Trans Cloud Comput 9(4):1634–1644
Liu Y, Li Y, Niu Y, Jin D (2019) Joint optimization of path planning and resource allocation in mobile edge computing. IEEE Trans Mob Comput 19(9):2129–2144
Zhang J, Guo H, Liu J, Zhang Y (2019) Task offloading in vehicular edge computing networks: A load-balancing solution. IEEE Trans Veh Technol 69(2):2092–2104
Chen Y, Zhao F, Chen X, Wu Y (2021) Efficient multi-vehicle task offloading for mobile edge computing in 6g networks. IEEE Trans Veh Technol
Nguyen D, Ding M, Pathirana P, Seneviratne A, Li J, Poor V (2021) Cooperative task offloading and block mining in blockchain-based edge computing with multi-agent deep reinforcement learning. IEEE Trans Mob Comput
Dai F, Liu G, Mo Q, Xu W, Huang B (2022) Task offloading for vehicular edge computing with edge-cloud cooperation. World Wide Web:1–19
Shakarami A, Ghobaei-Arani M, Shahidinejad A (2020a) A survey on the computation offloading approaches in mobile edge computing: A machine learning-based perspective. Comput Netw 182:107496
Shakarami A, Ghobaei-Arani M, Masdari M, Hosseinzadeh M (2020b) A survey on the computation offloading approaches in mobile edge/cloud computing environment: a stochastic-based perspective. J Grid Comput 18(4):639–671
Shakarami A, Shahidinejad A, Ghobaei-Arani M (2020c) A review on the computation offloading approaches in mobile edge computing: A game-theoretic perspective. Softw Pract Experience 50(9):1719–1759
Shakarami A, Shahidinejad A, Ghobaei-Arani M (2021) An autonomous computation offloading strategy in mobile edge computing: A deep learning-based hybrid approach. J Netw Comput Appl 178:102974
Liu Y, Chen CS, Sung CW, Singh C (2017) A game theoretic distributed algorithm for feicic optimization in lte-a hetnets. IEEE/ACM Trans Netw 25(6):3500–3513
Guo H, Liu J (2018) Collaborative computation offloading for multi-access edge computing over fiber-wireless networks. IEEE Trans Veh Technol 67(5):4514–4526
Xu X, Jiang Q, Zhang P, Cao X, Khosravi MR, Alex LT, Qi L, Dou W (2022) Game theory for distributed iov task offloading with fuzzy neural network in edge computing. IEEE Trans Fuzzy Syst
Aceto L, Morichetta A, Tiezzi F (2015) Decision support for mobile cloud computing applications via model checking. In: 2015 3rd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering. IEEE, pp 199–204
Shu C, Zhao Z, Han Y, Min G, Duan H (2019) Multi-user offloading for edge computing networks: A dependency-aware and latency-optimal approach. IEEE Internet Things J 7(3):1678–1689
Yao L, Xu X, Bilal M, Wang H (2022) Dynamic edge computation offloading for internet of vehicles with deep reinforcement learning. IEEE Trans Intell Transp Syst
He X, Lu H, Du M, Mao Y, Wang K (2020) Qoe-based task offloading with deep reinforcement learning in edge-enabled internet of vehicles. IEEE Trans Intell Transp Syst 22(4):2252–2261
Yang C, Liu Y, Chen X, Zhong W, Xie S (2019) Efficient mobility-aware task offloading for vehicular edge computing networks. IEEE Access 7:26652–26664
Wang J, Hu J, Min G, Zhan W, Zomaya A, Georgalas N (2021) Dependent task offloading for edge computing based on deep reinforcement learning. IEEE Trans Comput
Zhang K, Mao Y, Leng S, He Y, Zhang Y (2017) Mobile-edge computing for vehicular networks: A promising network paradigm with predictive offloading. IEEE Veh Technol Mag 12(2):36–44
Ren Y, Yu X, Chen X, Guo S, XueSong Q (2020) Vehicular network edge intelligent management: A deep deterministic policy gradient approach for service offloading decision. In: 2020 International Wireless Communications and Mobile Computing (IWCMC). IEEE, pp 905–910
Zhan Y, Guo S, Li P, Zhang J (2020) A deep reinforcement learning based offloading game in edge computing. IEEE Trans Comput 69(6):883–893
Zhang Y, Lan X, Ren J, Cai L (2020) Efficient computing resource sharing for mobile edge-cloud computing networks. IEEE/ACM Trans Networking 28(3):1227–1240
Chen L, Wu J, Zhang J, Dai HN, Long X, Yao M (2020) Dependency-aware computation offloading for mobile edge computing with edge-cloud cooperation. IEEE Trans Cloud Comput
Fan Y, Zhai L, Wang H (2019) Cost-efficient dependent task offloading for multi-users. IEEE Access 7:115843–115856
Pan S, Zhang Z, Zhang Z, Zeng D (2019) Dependency-aware computation offloading in mobile edge computing: A reinforcement learning approach. IEEE Access 7:134742–134753
Chen J, Yang Y, Wang C, Zhang H, Qiu C, Wang X (2021) Multi-task offloading strategy optimization based on directed acyclic graphs for edge computing. IEEE Internet Things J
Qu G, Wu H, Li R, Jiao P (2021) Dmro: A deep meta reinforcement learning-based task offloading framework for edge-cloud computing. IEEE Trans Netw Serv Manag 18(3):3448–3459
Binh TH, Vo HK, Nguyen BM, Binh HTT, Yu S et al (2022) Value-based reinforcement learning approaches for task offloading in delay constrained vehicular edge computing. Eng Appl Artif Intell 113:104898
Huang L, Feng X, Zhang C, Qian L, Wu Y (2019) Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing. Digit Commun Netw 5(1):10–17
Xu X, Fang Z, Qi L, Dou W, He Q, Duan Y (2021) A deep reinforcement learning-based distributed service offloading method for edge computing empowered internet of vehicles. Chin J Comput 44(12):2382–2405
Chen X, Liu Z, Chen Y, Li Z (2019) Mobile edge computing based task offloading and resource allocation in 5g ultra-dense networks. IEEE Access 7:184172–184182
Wang Y, Fang W, Ding Y, Xiong N (2021) Computation offloading optimization for uav-assisted mobile edge computing: a deep deterministic policy gradient approach. Wirel Netw 27(4):2991–3006
Li M, Gao J, Zhao L, Shen X (2020) Deep reinforcement learning for collaborative edge computing in vehicular networks. IEEE Trans Cogn Commun Netw 6(4):1122–1135
You C, Huang K, Chae H, Kim BH (2016) Energy-efficient resource allocation for mobile-edge computation offloading. IEEE Trans Wirel Commun 16(3):1397–1411
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Chen X, Zhang H, Wu C, Mao S, Ji Y, Bennis M (2018) Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning. IEEE Internet Things J 6(3):4005–4018
Sun Y, Zhou S, Xu J (2017) Emm: Energy-aware mobility management for mobile edge computing in ultra dense networks. IEEE J Sel Areas Commun 35(11):2637–2646
Song F, Xing H, Wang X, Luo S, Dai P, Li K (2022) Offloading dependent tasks in multi-access edge computing: A multi-objective reinforcement learning approach. Futur Gener Comput Syst 128:333–348
Xu YH, Yang CC, Hua M, Zhou W (2020) Deep deterministic policy gradient (ddpg)-based resource allocation scheme for noma vehicular communications. IEEE Access 8:18797–18807
Acknowledgements
The authors would like to thank the staff and postgraduate students at the School of Big Data and Intelligent Engineering of Southwest Forestry University for their assistance and valuable advice.
Funding
This work has been supported by the Project of National Natural Science Foundation of China under Grant No. 62262063, the Project of Key Science Foundation of Yunnan Province under Grant No. 202101AS070007, Dou Wanchun Expert Workstation of Yunnan Province No.202205AF150013, Science and Technology Youth lift talents of Yunnan Province, and the Project of Scientific Research Fund Project of Yunnan Education Department under Grant No. 2022Y561.
Author information
Contributions
Guozhi Liu: Writing - Original draft preparation, Conceptualization, Methodology, Software, Funding acquisition, Visualization, and Data Curation. Fei Dai: Conceptualization, Methodology, Writing - Reviewing and Editing, Funding acquisition, and Validation. Bi Huang: Writing - Reviewing and Editing, Resources, and Formal analysis. Zhenping Qiang: Supervision, and Validation. Shuai Wang: resource allocation, and supervision. Lecheng Li: Writing - Reviewing and Editing, and Investigation. The author(s) read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The work is a novel work and has not been published elsewhere nor is it currently under review for publication elsewhere.
Consent for publication
Informed consent was obtained from all individual participants included in the study.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, G., Dai, F., Huang, B. et al. A collaborative computation and dependencyaware task offloading method for vehicular edge computing: a reinforcement learning approach. J Cloud Comp 11, 68 (2022). https://doi.org/10.1186/s13677022003403
DOI: https://doi.org/10.1186/s13677022003403