 Research
 Open access
Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
Journal of Cloud Computing volume 12, Article number: 112 (2023)
Abstract
Edge-cloud computing is an emerging approach in which tasks are offloaded from mobile devices to edge or cloud servers. However, task offloading may result in increased energy consumption and delays, and the decision to offload a task depends on various factors such as time-varying radio channels, available computation resources, and the location of devices. As edge-cloud computing is a dynamic and resource-constrained environment, making optimal offloading decisions is a challenging task. This paper aims to optimize offloading and resource allocation to minimize delay and meet computation and communication needs in edge-cloud computing. The problem of optimizing task offloading in the edge-cloud computing environment is a multi-objective problem, for which we employ deep reinforcement learning to find the optimal solution. To accomplish this, we formulate the problem as a Markov decision process and use a Double Deep Q-Network (DDQN) algorithm. Our DDQN-edge-cloud (DDQN-EC) scheme dynamically makes offloading decisions by analyzing resource utilization, task constraints, and the current status of the edge-cloud network. Simulation results demonstrate that DDQN-EC outperforms heuristic approaches in terms of resource utilization, task offloading, and task rejection.
Introduction
The advent of IoT and 5G has enabled the development of new applications in areas such as surveillance, augmented/virtual reality, and facial recognition, which depend heavily on both computational resources and data storage for optimal performance. IoT and mobile devices have limited resources, so it is difficult for them to support such intelligent, delay-sensitive applications [1]. Hence, these devices can benefit from offloading computation-intensive tasks to the cloud. However, the cloud can cause delays in communication and data transfer as a result of network congestion and high usage, which can limit its use in real-time applications such as autonomous driving, advanced navigation, augmented reality (AR), and virtual reality (VR). Overall, edge-cloud computing and 5G can work together to enable new types of applications and services that require low-latency, real-time processing. The use of edge computing can reduce latency, enhance performance, and reduce the costs associated with cloud computing, because edge servers can process computation-intensive tasks instead of sending them to the cloud [2].
Edge-cloud computing is a distributed computing model that utilizes both cloud computing resources and edge devices, such as servers and IoT devices, to process and transmit data. This model aims to improve processing speed and decision-making and to reduce latency and bandwidth usage by bringing the cloud's processing power closer to the edge, where data is generated and collected. Edge and cloud computing can coexist and work together to provide the necessary resources for task execution. The integration of AI with edge-cloud computing allows for the deployment of machine learning algorithms on edge devices, providing a high level of intelligence to the edge-cloud network and enabling it to understand its environment [3, 4]. Before offloading tasks to the cloud or edge, it is essential to carefully consider the task requirements and the available resources. However, edge-cloud computing is a dynamic and resource-constrained environment. Hence, making an optimal task offloading decision based on the available resources is a critical issue.
Task offloading is the process of transferring a task or workload from a local device to a remote device, such as a server or cloud resource, to improve the local device's performance and efficiency. Offloading tasks can increase delay and energy consumption, and edge servers may have limited computational capacity, which can increase computing latency. Dense 5G networks may also experience higher transmission delays. It is important to consider these trade-offs before making an offloading decision. Cooperative task offloading is a technique used in edge-cloud networks to improve the performance of distributed systems. In this distributed approach, tasks are split among devices in the network, such as edge devices and cloud servers, to optimize resource usage and reduce the workload on individual devices. In an edge-cloud computing environment, it can be challenging to determine the optimal location for task offloading, as there are many factors to consider, including the computational capacity of edge servers, the transmission delays of networks, and the diverse requirements of end devices. Numerous studies have been conducted on computation offloading in edge-cloud networks [5,6,7,8,9]. However, due to the diverse requirements of end devices and the limited information available about wireless channels, bandwidth, and computing resources in edge-cloud networks, it is challenging to design an optimal offloading strategy.
Deep reinforcement learning (DRL) is a subset of machine learning that combines reinforcement learning and deep learning techniques to handle high-dimensional state-action spaces and accelerate training for complex decision-making tasks. Many successful applications of DRL have been demonstrated in a wide range of fields, including gaming [7], robotics [8], and networking. The application of DRL to edge-cloud-assisted networks can optimize system performance by perceiving users' mobility [10, 11]. Task offloading problems have been addressed with DRL in [9, 12, 13], and [14]. DRL is able to find the best solutions to optimization problems in time-varying and dynamic network environments. In the context of task offloading at the edge, Dueling DQN could potentially be used to learn the optimal decision-making policy for offloading tasks to different locations in a dynamic and resource-constrained edge-cloud environment. The dueling architecture separates the estimator into two streams, value and advantage, and then aggregates them to make the final estimation of the Q-value. This allows for better generalization of the Q-value estimation by decoupling the estimation of the value of a state from the estimation of the advantage of taking a specific action in that state. This could involve training the network to learn the optimal trade-offs between different factors, such as the task requirements, available resources, latencies, and costs, to make informed decisions about the best location for task offloading.
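The aggregation of the value and advantage streams mentioned above can be illustrated with a minimal sketch. It assumes the mean-subtracted aggregation that is commonly used with dueling networks; the function name and the numeric inputs are hypothetical and are not taken from our implementation.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumed aggregation (common in the dueling-network literature):
        Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Three candidate offloading actions with hypothetical stream outputs.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q_values)  # [3.0, 2.0, 1.0]
```

Subtracting the mean advantage keeps the two streams identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.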
In this research, we propose an advanced edge-cloud computing scheme that leverages the Double Deep Q-Network (DDQN) to optimize the offloading of computations and the allocation of resources for the offloaded tasks. Our proposed scheme is an extension of our previous work that employed DQN [9], but it goes one step further by utilizing the DDQN algorithm to improve the decision-making process and achieve a more efficient solution to the edge-cloud task offloading and resource allocation problem. The goal of the proposed scheme is to minimize energy consumption while satisfying the computation and communication requirements of the offloaded tasks in edge-cloud computing. DDQN enables efficient and effective offloading decisions in dynamic and resource-constrained environments, such as edge-cloud, by adapting to changes and learning from past experiences. In terms of training, DDQN separates the Q-value estimator into two streams, value and advantage, and then aggregates them, which improves generalization. We present the task offloading and resource allocation problem as an MDP and use the DDQN algorithm to identify the optimal policy. Our simulation results show that the proposed method outperforms heuristic schemes and DQN-EC in terms of resource utilization, maximum task offloading, and task rejection. The main contributions of our work are:

To optimize resource allocation for compute-intensive and delay-sensitive tasks in the edge-cloud computing environment, we developed a scheme based on the Double DQN. Our scheme determines the best actions to take in the current system state, ultimately improving the overall performance of the edge cloud.

Considering the dynamics and unpredictability of edge device environments, we modeled the task offloading problem as a Markov decision process (MDP) and applied Double DQN to maximize the long-term cumulative discounted reward. Our optimization objective was to maximize resource utilization, minimize task rejection, and minimize the round-trip time so that each task completes within its deadline.

By using the proposed DDQN-EC scheme, we can reduce idle time and balance the workload of the edge-cloud computing system. The simulation results show that the DDQN-based algorithm significantly outperforms the heuristic algorithms.
The rest of the paper is organized as follows: Sect. 2 reviews related work, especially on task offloading and resource allocation in the edge-cloud computing environment; Sect. 3 provides the system model and the formulation of the proposed DDQN-EC scheme in detail; Sect. 4 discusses the simulation results and comparisons that validate our approach; finally, Sect. 5 concludes the paper.
Related work
DRL learns optimal policies and makes quick decisions by interacting with a time-varying environment, which makes it suitable for dynamic MEC systems. There are two main research streams: value-based and policy-based methods.
Value-based DRL methods
Huang et al. [12] introduced DROO, an online offloading algorithm for wireless-powered mobile edge computing (MEC) networks. To solve the optimization problem, the algorithm splits it into two subproblems, namely resource allocation and offloading decision. Through evaluation and simulation, the authors observed that DROO surpassed existing methods in terms of energy efficiency, task completion time, and network congestion. This makes DROO a promising solution for enhancing the performance of wireless-powered MEC networks. Li et al. [13] applied the deep Q-network (DQN) to jointly handle computation offloading and resource allocation in multi-user MEC, minimizing the sum cost of delay and energy consumption. In [14], an improved resource allocation policy based on DQN was proposed for IoT edge computing. This policy aims to improve the efficiency of resource utilization and reduce task completion delays in the system. The DQN algorithm allows the system to learn from experience and make better resource allocation decisions in real time.
Chen et al. [15] proposed the DeepSARL algorithm to address the computation offloading problem in a mobile edge computing system by formulating it as a Markov decision process. The algorithm uses deep reinforcement learning with a self-attention mechanism to learn an optimal offloading policy that maximizes the long-term utility performance. The DeepSARL algorithm was evaluated through simulations, and the results showed that it outperformed existing offloading methods in terms of average reward and energy efficiency. The proposed algorithm can help improve the performance of mobile edge computing systems, especially for applications that require real-time processing and low latency. Lu et al. [16] proposed a DQN algorithm for large-scale heterogeneous MEC, achieving good performance without prior knowledge of environment statistics. However, value-based methods are limited in handling continuous action spaces, such as in the dynamic JCORA problem. In [10], a DQN-based approach was introduced to solve the challenging problem of jointly optimizing task offloading and bandwidth allocation in MEC networks. The proposed solution effectively balances trade-offs between quality of service, energy efficiency, and network congestion.
In [11], a comprehensive review was provided of the current state-of-the-art techniques and computational resources used for partitioning and offloading in MEC networks. The challenges and opportunities of various approaches were discussed, and their pros and cons were highlighted. In [17], a task offloading scheme based on DQN was proposed to select the optimal edge server and transmission mode for maximizing task offloading utility. The proposed scheme achieved high performance in terms of task completion time, energy consumption, and network congestion. Liu et al. [18] proposed a novel approach for allocating resources in a mobile edge computing system, where vehicles were used as edge devices to provide computational services to nearby users. The proposed algorithm was based on deep reinforcement learning (DRL) and aimed to maximize the overall system utility by efficiently allocating resources among the vehicles. In [19], an intelligent DRL-based resource allocation scheme for wireless networks was proposed to minimize service time and balance resources.
In [20], a migration algorithm was proposed to optimize task migration using a multi-agent reinforcement learning approach. The algorithm leveraged the collective intelligence of multiple agents to make optimal migration decisions, taking into account various factors such as network conditions, task requirements, and system resources.
In [21], the authors introduced a DQN-based computation offloading scheme for mobile edge computing (MEC) networks. The proposed scheme aimed to minimize the long-term cost of the system while ensuring a satisfactory level of quality of service for the end users. By leveraging DQN, the scheme was able to learn an optimal offloading decision policy in a data-driven manner, which could adapt to the dynamic and uncertain network conditions in MEC environments. The authors of [22] introduced a DDQN-based backscatter-aided hybrid data offloading scheme that significantly improved energy efficiency while maintaining transmission rate and reliability. In [23], an approach was proposed for making offloading decisions in an MEC environment by jointly optimizing CPU frequencies and transmit powers. This approach aimed to minimize the energy consumption of mobile devices while maintaining the quality of service for offloaded tasks. The objective of [24] was to minimize the costs associated with energy consumption and computation delay in mobile edge computing networks, which are critical factors for their performance. To achieve this, the authors presented the RLSARSA algorithm as a solution for resource management and optimal offloading decisions. In [25], the authors proposed DDQNLIST, a game-learning algorithm that combined DDQN and distributed LSTM with intermediate state transition to lower the complexity of offloading computation under time-varying conditions. DDQNLIST used distributed LSTM and double Q-learning to improve the processing and prediction of time intervals and delays. The devices could exploit information asymmetry to obtain a better game-learning outcome.
Policy-based DRL methods
Lu et al. [26] presented the double-dueling deterministic policy gradient (D3PG) algorithm for edge computing, capable of optimizing three critical performance metrics: service latency, energy consumption, and task success rate. The simultaneous optimization of these metrics can lead to efficient resource allocation in dynamic edge computing environments. The D3PG algorithm can improve the overall performance of edge computing systems by reducing service latency, decreasing energy consumption, and improving the task success rate.
Zhang et al. [27] proposed two DRL algorithms for dynamic computation offloading in edge computing to minimize service latency. The hybridAC algorithm optimized resource allocation in single-device scenarios using a decision-based actor-critic approach. The mdHybridAC algorithm achieved efficient resource allocation in multi-device scenarios using the multi-device actor-critic approach. These algorithms are significant contributions to the field of edge computing, as they reduce service latency and energy consumption and improve the task success rate. They have the potential to enhance the performance of various edge computing applications.
Chen and Wang [28] proposed a decentralized DDPG-based mechanism called JCORA. It addresses the challenge of resource allocation in a decentralized MEC system, where multiple users compete for limited computing resources. By using DDPG agents, the JCORA mechanism can learn the optimal computation offloading policies for each user without relying on centralized control or communication. This approach offers several advantages, including improved scalability and reduced overhead. The authors of [29] presented a trustworthy DRL strategy for computation offloading in IoT edge networks, designed to handle selfish or forgery attacks in intelligent vehicle networks. This strategy employs an intelligent trusted system model and a DPGAQ scheme to anticipate and prevent attacks by untrusted or malicious vehicles during IoT offloading, and it evaluates device trustworthiness for vehicle networks in mobile edge networks. The strategy also uses a quantization algorithm to simplify offloading decisions in high-dimensional action spaces. In [30], a new policy-based multi-agent deep reinforcement learning algorithm known as post-decision state (GPDS) is introduced to address malicious interference in wireless networks. By assessing the communication quality, spectrum availability, and jammer's strategy from the post-decision state, the mobile users can optimize their transmission power and frequency to increase their SINR and channel throughput.
Liu and Liao [31] designed an actor-critic-based approach to optimize resource allocation and offloading decisions. In their approach, the authors used a hybrid policy that can handle both discrete and continuous actions. This approach allows for efficient resource allocation and computation offloading decisions in edge computing systems. The DAC algorithm is capable of dynamically adjusting the decision-making process according to the system's needs, so that resource allocation remains close to optimal as conditions change. Moreover, the DAC approach is scalable and can be applied to various edge computing scenarios, making it a valuable tool for addressing the challenges of resource allocation and computation offloading in edge environments.
Qiu et al. [32] developed a distributed and collective deep reinforcement learning (DRL) algorithm, called DCDRL, to address the challenges of computation offloading in resource-intensive and deadline-sensitive applications. The algorithm is capable of optimizing resource allocation and task scheduling in a distributed manner, enabling efficient and scalable processing of complex tasks. By leveraging the collective intelligence of multiple agents, DCDRL can achieve higher performance and better task completion rates compared to traditional centralized approaches.
In [33], a DDPG-based algorithm is proposed for collaborative computation offloading in heterogeneous edge computing. The proposed offloading algorithm is designed to work across all three edge networks considered, despite their heterogeneity. The aim is to improve the overall efficiency and performance of the network by dynamically routing computation tasks to the most appropriate network based on factors such as network load, latency, and available resources.
In [34], a joint multi-task offloading and resource allocation scheme is proposed for satellite IoT. It models tasks with dependencies as DAGs and uses APPO to optimize the offloading strategy. The proposed approach aims to improve offloading efficiency, which could lead to better performance and faster task completion.
In [35], a DDPG-based scheme was proposed that considered energy consumption and task completion for a multi-user scenario, utilizing simultaneous wireless information and power transfer technology. It formulated an optimization problem that jointly optimized the task offloading ratio, uplink channel bandwidth, power split ratio, and computing resource allocation. The proposed algorithm achieved optimal energy consumption and delay and utilized an inverting-gradient-updating-based dual actor-critic neural network design to improve the convergence and stability of the training process.
System model
In this section, we present the system model and the problem formulation for task offloading and resource allocation. The network model with the edge-cloud system of the DDQN-EC scheme is shown in Fig. 1. Our scheme involves connecting end devices such as sensors, mobile devices, and IoT devices to base stations through wireless links. The edge computing system is connected to the core cloud via the backbone network, allowing for the offloading of tasks and the utilization of available resources in the public cloud. The scheme uses a batch processing approach: it waits for a predefined number \((N)\) of task requests before determining the optimal location for each task, whether it be the edge or the cloud, taking into consideration the availability of resources and the deadline. By evaluating a batch of task requests at a time, this approach allows for better resource utilization and decision-making. Both bandwidth and computing resources are considered when making offloading decisions, to optimize resource usage, minimize delay, and reduce energy consumption. In the following subsections, we provide a detailed description of the system model, including the task, communication, and computation offloading models. Table 1 provides a list of the notations used in our models.
Task model
A task \({t}_{n}\) is represented as a tuple of four variables, \(\left({\mathcal{z}}_{n},{\mathcal{y}}_{n}, {c}_{n}, {\tau }_{n}\right)\), \((1\le n\le N)\), where \({\mathcal{z}}_{n}\) is the input data size in bytes, \({\mathcal{y}}_{n}\) is the resultant data size, \({c}_{n}\) is the required computational resource in CPU units, and \({\tau }_{n}\) is the task latency requirement. In addition, the binary decision variable \({x}_{n}\) takes the value 0 or 1, indicating whether the task is assigned to the edge (\({x}_{n}=0\)) or the cloud (\({x}_{n}=1\)).
Typically, multiple resources are required for offloading tasks; however, our scheme considers only CPU resources required for the task [36,37,38].
$${c}_{n}=\varsigma \cdot {\mathcal{z}}_{n}$$(2)
where \({c}_{n}\) represents the total CPU units required to process the task \({t}_{n}\), \({\mathcal{z}}_{n}\) represents the total size of the input data in bytes, and \(\varsigma\) represents the computational resources required to process a single byte of data.
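The task tuple and its CPU requirement can be sketched as follows. The sketch assumes that the CPU demand is the product of the input size and the per-byte cost \(\varsigma\), as the definitions above suggest; the class name, field names, and numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Illustrative container for the task tuple (z_n, y_n, c_n, tau_n)."""
    input_bytes: int    # z_n: input data size in bytes
    result_bytes: int   # y_n: resultant data size in bytes
    cpu_units: float    # c_n: required computational resource in CPU units
    deadline_s: float   # tau_n: latency requirement (seconds assumed)


def make_task(input_bytes: int, result_bytes: int,
              cpu_per_byte: float, deadline_s: float) -> Task:
    # Assumption: c_n = varsigma * z_n, i.e. CPU demand scales linearly
    # with the input size.
    cpu_units = cpu_per_byte * input_bytes
    return Task(input_bytes, result_bytes, cpu_units, deadline_s)


task = make_task(input_bytes=2000, result_bytes=500,
                 cpu_per_byte=0.5, deadline_s=0.1)
print(task.cpu_units)  # 1000.0
```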
Wireless bandwidth model
To offload a task from the end device to the edge or cloud, the device must be connected to the nearest base station by a wireless channel. Let \(\mathcal{B}=\{{b}_{1}, {b}_{2},\dots ,{b}_{W}\}\) denote the set of all base stations. Each base station \({b}_{w}\) has a set of wireless channels that provide different data rates, \({\beta }_{h}^{w}\in \left\{{\beta }_{1}^{w},{\beta }_{2}^{w},{\beta }_{3}^{w},\dots ,{\beta }_{{H}_{w}}^{w}\right\}.\) Each channel serves different tasks, and \({\sigma }_{h}^{w}\) represents the remaining bandwidth of each channel, \(\{{\sigma }_{1}^{w},{\sigma }_{2}^{w},{\sigma }_{3}^{w},\dots ,{\sigma }_{{H}_{w}}^{w}\}\). Then, at time step \(t\), the bandwidth utilization \({\mathcal{U}}_{W}\left(t\right)\) of all the base stations can be formulated as
where \(B\) represents the total bandwidth of all base stations.
Computational model

i. Edge computing:
In our scheme, the set of edge servers is denoted as \(\mathcal{P}=\{1,2,3,\dots ,P\}\), and \({c}_{p}\) denotes the available computational capacity of edge server \(p\) \((p\in \mathcal{P})\). The computation time \({T}_{n}^{{proc}_{e}}\) for task \({t}_{n}\) at edge server \(p\) is given by
$${T}_{n}^{{proc}_{e}}= \frac{{c}_{n}}{{c}_{p}}$$(4)
The utilization of the computational resources of the edge servers at time \(t\) is represented as
$${\mathcal{U}}_{P}\left(t\right)=\frac{\sum_{p=1}^{P}{c}_{p}(t)}{{C}_{e}}$$(5)
where \({C}_{e}\) denotes the total available computing capacity of all servers at the edge.

ii. Cloud computing:
The set of cloud servers is denoted as \(\mathcal{M}=\{1,2,3,\dots ,M\}\), and \({c}_{m}\) denotes the available computational capacity of cloud server \(m\) \((m\in \mathcal{M})\). The processing time \({T}_{n}^{{proc}_{c}}\) for task \({t}_{n}\) at cloud server \(m\) is given by
$${T}_{n}^{{proc}_{c}}= \frac{{c}_{n}}{{c}_{m}}$$(6)
The utilization of the computational resources of the cloud servers at time \(t\) is represented as
$${\mathcal{U}}_{M}\left(t\right)=\frac{\sum_{m=1}^{M}{c}_{m}(t)}{{C}_{c}}$$(7)
where \({C}_{c}\) denotes the total available computing capacity of all servers in the cloud.
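The processing-time and utilization ratios of the computational model can be sketched in a few lines. The function names and numeric values are hypothetical; the utilization helper simply mirrors the ratio of summed per-server CPU figures to total capacity as written in Eqs. (5) and (7), and it leaves the interpretation of the per-server figures to the caller.

```python
from typing import Sequence


def processing_time(cpu_demand: float, server_capacity: float) -> float:
    """Eqs. (4)/(6): time to process a task with demand c_n on a server."""
    if server_capacity <= 0:
        raise ValueError("server capacity must be positive")
    return cpu_demand / server_capacity


def utilization(per_server_cpu: Sequence[float], total_capacity: float) -> float:
    """Eqs. (5)/(7): summed per-server CPU figures over total capacity."""
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return sum(per_server_cpu) / total_capacity


# Hypothetical numbers: a 1000-unit task on a 4000-unit/s edge server.
print(processing_time(1000.0, 4000.0))       # 0.25
print(utilization([100.0, 300.0], 1000.0))   # 0.4
```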
Delay model
In computation offloading, tasks are sent to an edge or cloud server for processing. The process involves three types of delays: transmission delay, propagation delay, and processing delay.

i. Transmission time
For task \({t}_{n}\), data transmission is required in both directions: from the end device to the edge/cloud server with a data size of \({\mathcal{z}}_{n}\), and from the edge/cloud server back to the end device with a resultant data size of \({\mathcal{y}}_{n}\).
Hence, a specific amount of bandwidth \({\beta }_{h}^{w}\) (edge) or \(\beta\) (cloud) is required to meet the latency requirement \({\tau }_{n}\) of task \({t}_{n}\). The transmission time needed to send the data of task \({t}_{n}\) to the edge, \({T}_{n}^{{trans}_{e}}\), and to the cloud, \({T}_{n}^{{trans}_{c}}\), can be formulated as
$${T}_{n}^{{trans}_{e}}=\frac{{\mathcal{z}}_{n}}{{\beta }_{h}^{w}}+\frac{{\mathcal{y}}_{n}}{{\beta }_{h}^{w}}$$(8)
$${T}_{n}^{{trans}_{c}}={T}_{n}^{{trans}_{e}}+\frac{{\mathcal{z}}_{n}}{\beta }+\frac{{\mathcal{y}}_{n}}{\beta }$$(9)
ii. Propagation time
In the given model, the propagation delay is assumed to be constant, with a value of \({T}_{n}^{{prop}_{e}}= 5\ \mathrm{ms}\) for the edge server and \({T}_{n}^{{prop}_{c}}= 50\ \mathrm{ms}\) for the cloud server. This simplifying assumption is made for ease of calculation and analysis. The actual propagation delay may vary depending on the location of the resource.

iii. Processing delay:
The processing delay for task \({t}_{n}\) at the edge server, \({T}_{n}^{{proc}_{e}}\), and at the cloud server, \({T}_{n}^{{proc}_{c}}\), can be obtained from Eqs. (4) and (6).
Therefore, the overall time for a task to be completed by the edge, \({rtt}_{n}^{e}\), or the cloud, \({rtt}_{n}^{c}\), is the sum of the delays caused by data transmission, propagation, and processing, which is represented as
$${rtt}_{n}^{e}={T}_{n}^{{trans}_{e}}+{T}_{n}^{{prop}_{e}}+{T}_{n}^{{proc}_{e}}$$(10)
$${rtt}_{n}^{c}={T}_{n}^{{trans}_{c}}+{T}_{n}^{{prop}_{c}}+{T}_{n}^{{proc}_{c}}$$(11)
The total resource cost \({CO}_{total}\) can be obtained by adding the weighted utilization of bandwidth \({CO}_{W}\), edge server CPU \({CO}_{P}\), and cloud server CPU \({CO}_{M}\) for total task offloading as follows:
$$\begin{array}{c}{CO}_{W}= {W}_{W}\cdot {\mathcal{U}}_{W}\\ {CO}_{P}= {W}_{P}\cdot {\mathcal{U}}_{P}\\ {CO}_{M}= {W}_{M}\cdot {\mathcal{U}}_{M}\end{array}$$
$${CO}_{total}={CO}_{W}+ {CO}_{P}+ {CO}_{M}$$(12)
where each resource (bandwidth, edge, and cloud computational resources) has been assigned a cost weight, with \({W}_{W}=1\) assigned to bandwidth, \({W}_{P}=5\) to edge computational resources, and \({W}_{M}=10\) to cloud computational resources. The agent learns to pick the cheapest resource for a task based on these cost weights and assigns tasks to the best location (edge or cloud) accordingly. If both resources are available, the agent assigns the task to the edge because of its lower cost.
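The delay and cost model of Eqs. (8) to (12) can be sketched as follows. This is an illustrative sketch rather than our simulator code: the function names are hypothetical, data sizes are assumed to be in bytes, data rates in bytes per second, and CPU capacities in CPU units per second, so that all delays are in seconds.

```python
PROP_EDGE_S = 0.005    # constant propagation delay to the edge (5 ms)
PROP_CLOUD_S = 0.050   # constant propagation delay to the cloud (50 ms)


def trans_edge(z_n: float, y_n: float, beta_edge: float) -> float:
    """Eq. (8): uplink plus downlink time over the wireless channel."""
    return z_n / beta_edge + y_n / beta_edge


def trans_cloud(z_n: float, y_n: float, beta_edge: float, beta_cloud: float) -> float:
    """Eq. (9): edge transmission time plus the extra hop to the cloud."""
    return trans_edge(z_n, y_n, beta_edge) + z_n / beta_cloud + y_n / beta_cloud


def rtt_edge(z_n, y_n, c_n, beta_edge, c_p) -> float:
    """Eq. (10): transmission + propagation + processing at the edge."""
    return trans_edge(z_n, y_n, beta_edge) + PROP_EDGE_S + c_n / c_p


def rtt_cloud(z_n, y_n, c_n, beta_edge, beta_cloud, c_m) -> float:
    """Eq. (11): transmission + propagation + processing in the cloud."""
    return trans_cloud(z_n, y_n, beta_edge, beta_cloud) + PROP_CLOUD_S + c_n / c_m


def total_cost(u_w: float, u_p: float, u_m: float,
               w_w: float = 1.0, w_p: float = 5.0, w_m: float = 10.0) -> float:
    """Eq. (12): weighted sum of bandwidth, edge, and cloud utilization."""
    return w_w * u_w + w_p * u_p + w_m * u_m


# Hypothetical task: 1000 bytes in, 1000 bytes out, 100 CPU units, 150 ms deadline.
edge_rtt = rtt_edge(1000, 1000, 100.0, beta_edge=100_000.0, c_p=1000.0)
cloud_rtt = rtt_cloud(1000, 1000, 100.0, beta_edge=100_000.0,
                      beta_cloud=200_000.0, c_m=2000.0)
deadline_s = 0.15
print(edge_rtt <= deadline_s, cloud_rtt <= deadline_s)
```

With these hypothetical numbers both locations meet the deadline (about 0.125 s at the edge and 0.13 s in the cloud), so the lower edge weight in the cost function would favor the edge.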
Formal problem formulation
The multi-objective problem solved in this paper is described formally as follows:
Optimization:
Equation (13) aims to achieve the optimization objective of maximizing resource utilization in both the edge and cloud, while minimizing the cost of task offloading specified in Eq. (14). For more details on resource utilization and cost, please refer to Eqs. (3), (5), (7), and (12).
Subject to the constraints:
DDQN-based task offloading and resource allocation
In this section, we introduce DDQN-EC, a proposed scheme that utilizes the DDQN algorithm to make optimal decisions and select the best location for task execution by analyzing the current state of the edge-cloud environment. It aims to improve resource utilization and balance the trade-off between delay and resource cost, to maximize the performance of edge-cloud computing systems. This is achieved by maximizing task offloading while minimizing delay and cost as defined in Eqs. (12), (13), and (14). We formulate this multi-objective problem using a Markov decision process (MDP).
Markov decision process
A Markov decision process (MDP) models sequential decision-making problems in which an agent makes decisions to maximize reward. It includes elements such as agent, state, action, policy, and reward. We formulate the task offloading and resource optimization problem as an MDP to find the optimal policy \({\pi }^{*}\). The policy is a mapping from states to action probabilities, represented by \(\pi (a\mid s)\) for all possible actions \(a\) in each state \(s\). RL algorithms are often used to solve MDPs, as they allow an agent to learn the optimal behavior for a given MDP through trial and error. In the DDQN-based framework, the agent observes the state \({s}_{t}\) by interacting with the edge-cloud environment, takes an action \({a}_{t}\) (the selection of a computing server) via a deterministic policy, and receives an immediate reward \({r}_{t}\). The agent uses the action-value function \(Q({s}_{t},{a}_{t})\) to update its policy. The goal of the agent is to maximize the long-term reward by finding an optimal resource allocation policy. In the following subsections, the state, action, and reward of the proposed scheme are explained in more detail.
State
The state \({s}_{t}\) includes full information on the edge-cloud network. It includes the number of remaining tasks (\({N}^{t}\)); the index \({I}^{t}\), ranging from \(1\) to \(N\) (that is, \({I}^{t}=t\)), which specifies the task currently being decided by the agent; the total remaining computational capacity of the edge and cloud servers (\({C}_{c}+{C}_{e}\)); the total remaining bandwidth at the edge and cloud (\({B}_{e}+{B}_{c}\)); the number of cloud servers (\({N}^{c}\)); and the remaining CPU of each cloud server \(({\alpha }_{m})\). In addition, it contains edge information such as the number of edge servers (\({N}^{e}\)) and the remaining total CPU of the edge server \(({\alpha }_{{p}_{w}})\). The CPU (\({U}_{m}\), \({U}_{p}\)) and bandwidth allocated to each cloud and edge server are also included. Finally, the information of each task \({t}_{1}, {t}_{2},\dots ,{t}_{N}\) is included. State \({s}_{t}\in \mathcal{S}\) can be defined as
Action
In our model, the agent takes an action by observing the current state of the environment. The goal of the agent is to make the optimal decision to maximize resource utilization and minimize the overall average service delay with the minimum rejection of tasks. Action \({a}_{t}\in \mathcal{A}\) at each time step \(t\) can be defined as the action to offload the \(t\)-th task (\(1\le t\le N\)) and allocate the resources (bandwidth and CPU) to the task for execution within the task deadline. Action \({a}_{t}\) can be defined as
where \(\eta\) represents the computation server, and \({x}_{n}\) selects the edge or cloud location for a task \({t}_{n}\), with \(\eta \in \left\{1,2,\dots ,P\right\}\) (edge server) when \({x}_{n}=0\), and \(\eta \in \left\{1,2,\dots ,M\right\}\) (cloud server) when \({x}_{n}=1\). The agent takes actions based on the task offloading strategy in each time step and receives rewards from the environment in the following time step.
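Because DDQN operates on a discrete action space, the pair \(({x}_{n}, \eta )\) is naturally represented by a single action index. The following sketch shows one possible encoding, assumed here only for illustration: indices \(0,\dots ,P-1\) denote edge servers and indices \(P,\dots ,P+M-1\) denote cloud servers. The function name and this particular ordering are hypothetical, and an explicit task-rejection action is not modeled.

```python
from typing import Tuple


def decode_action(index: int, num_edge: int, num_cloud: int) -> Tuple[int, int]:
    """Map a flat action index to (x_n, eta).

    x_n = 0 selects the edge and x_n = 1 selects the cloud; eta is the
    1-based server number within the chosen location.
    """
    if not 0 <= index < num_edge + num_cloud:
        raise ValueError("action index out of range")
    if index < num_edge:
        return 0, index + 1
    return 1, index - num_edge + 1


# With P = 3 edge servers and M = 2 cloud servers there are 5 actions.
for i in range(5):
    print(i, decode_action(i, num_edge=3, num_cloud=2))
```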
Reward
In RL, the agent's objective is to maximize the cumulative reward it collects. Our reward function is designed to optimize resource utilization, minimize cost, and satisfy delay constraints. The reward \({r}_{t}\) is calculated from the total resource utilization \(\rho \left(t\right)\) at time step \(t\) in Eq. (20), the total cost \(\sigma \left(t\right)\) at time step \(t\) in Eq. (21), and the delay constraint satisfaction for the task handled at time step \(t\) in Eq. (22).
DDQN Framework for task offloading
In our model, we used the DDQN algorithm for the learning process. DDQN is an off-policy algorithm and is applied to environments with discrete action spaces. The learning process for DDQN is described in Algorithm 1 and also depicted in Fig. 2.
As shown in Fig. 2, the proposed DDQN-based learning process uses a replay memory \(M\), which stores a set of recent experiences \(\left({s}_{i},{a}_{i}, {r}_{i}, {s}_{i+1}\right)\) that the agent gathers by interacting with the environment and then uses for DDQN learning. In particular, the system records the experience at every step. During network training, a mini-batch (size: \(b\)) is sampled from the replay memory \(M\), so that the Q network can learn from previous experience. DDQN uses two neural networks: (i) the prediction network \({Q}_{\pi }(s,a;\theta )\), a function approximator that estimates the action-value function in Eq. (15), where \(\theta\) denotes the weights of the neural network, and (ii) the target network \({\overline{Q} }_{\pi }(s,a;\overline{\theta })\), which estimates the target value \({y}_{i}\). The target network has the same structure as the prediction network. However, its weights \(\overline{\theta }\) are copied from \(\theta\) every fixed number of iterations (\(K\)) instead of every training epoch. The following Eqs. (23), (24), and (25) are the main equations for calculating the loss value.
DDQN updates the parameters \(\theta\) of the Q-function network using the loss value and stochastic gradient descent (SGD) with each mini-batch.
where \(\alpha\) is the learning rate.
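As a hedged illustration of the learning step, the standard Double DQN formulation computes the target as \({y}_{i}={r}_{i}+\gamma \,\overline{Q}\left({s}_{i+1},\arg \max_{a}Q({s}_{i+1},a;\theta );\overline{\theta }\right)\), the loss as \(L(\theta )=\frac{1}{b}\sum_{i}{\left({y}_{i}-Q({s}_{i},{a}_{i};\theta )\right)}^{2}\), and the update as \(\theta \leftarrow \theta -\alpha {\nabla }_{\theta }L(\theta )\); the paper's own equations may differ in detail. The sketch below implements the replay memory and the target and loss computation in plain Python. The discount factor `gamma`, the `done` flag, and the representation of both networks as callables returning one Q-value per action are assumptions for illustration.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

State = Tuple[float, ...]
# (s_i, a_i, r_i, s_{i+1}, done); the done flag is an added assumption.
Transition = Tuple[State, int, float, State, bool]
QFunction = Callable[[State], Sequence[float]]  # state -> Q-value per action


class ReplayMemory:
    """Fixed-size buffer that keeps only the most recent transitions."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, as in a basic replay memory.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def ddqn_targets(batch: List[Transition], q_online: QFunction,
                 q_target: QFunction, gamma: float) -> List[float]:
    """Double DQN targets: the online network picks the next action,
    the target network evaluates it."""
    targets = []
    for _, _, r, s_next, done in batch:
        if done:
            targets.append(r)
            continue
        online_q = q_online(s_next)
        best_action = max(range(len(online_q)), key=lambda a: online_q[a])
        targets.append(r + gamma * q_target(s_next)[best_action])
    return targets


def mse_loss(batch: List[Transition], targets: List[float],
             q_online: QFunction) -> float:
    """Mean squared error between targets and online Q(s_i, a_i)."""
    errors = [(y - q_online(s)[a]) ** 2
              for (s, a, _, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes this target from the plain DQN target, which takes the maximum over the target network's own Q-values and tends to overestimate them.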
Performance evaluation
This section evaluates the performance of the DDQNEC scheme through computer simulation, focusing on resource utilization, task acceptance ratio, task rejection ratio, and cost ratio. The simulations were run on an i9-10900K CPU with 64 GB RAM and an RTX 3090 GPU, under Linux Ubuntu 20.04.02 LTS with Python 3.8 and PyTorch 1.9.0. The simulation environment is designed to reflect real-world edge-cloud computing environments, and the results are analyzed and compared with existing methods. DDQNEC is compared with three heuristic methods (heuristic1, heuristic2, heuristic3) and DQNEC [9] in a simulated edge-cloud environment (as shown in Fig. 2) to measure its efficiency. Heuristic1 handles tasks in FIFO order and prefers edge resources if available, Heuristic2 prioritizes tasks with high resource demands, and Heuristic3 uses the 0/1 knapsack algorithm with utilization as the profit to be maximized. We conducted tests in two types of environment, small and large. In both, the number of tasks takes the values 50, 100, 150, 200, 250, and 300. However, the resources available at the edge and cloud and the task requirements differ between the two: the small environment has fewer resources, whereas the large environment has more.
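The knapsack idea behind Heuristic3 can be sketched as follows. This is a simplified, hypothetical version rather than the baseline implementation used in the experiments: it considers a single server with an integer CPU capacity, and the function name `knapsack_select` and its arguments are assumptions for illustration.

```python
from typing import List, Tuple


def knapsack_select(demands: List[int], profits: List[int],
                    capacity: int) -> Tuple[int, List[int]]:
    """0/1 knapsack by dynamic programming.

    demands  -- integer CPU demand of each task (the "weight")
    profits  -- profit of accepting each task (e.g. its utilization)
    capacity -- integer CPU capacity of the server
    Returns the best total profit and the indices of the chosen tasks.
    """
    n = len(demands)
    # best[i][c] = maximum profit using the first i tasks within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, p = demands[i - 1], profits[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]              # skip task i-1
            if w <= c and best[i - 1][c - w] + p > best[i][c]:
                best[i][c] = best[i - 1][c - w] + p  # take task i-1

    # Walk back through the table to recover which tasks were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= demands[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen
```

For example, with CPU demands `[12, 15, 18, 20]`, profits equal to the demands, and a capacity of 40, the function selects the tasks at indices 2 and 3 for a total of 38, whereas a FIFO policy would admit the first two tasks and use only 27.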

In the small environment, the task parameters are: CPU requirement 10~20, data size 10~20, bandwidth 100*15, and deadline 5~10 ms. The available resources are 30 edge servers with a CPU capacity of 40~60, 20 cloud servers with a CPU capacity of 60~80, and a bandwidth of 100 Mbps.

In the large environment, the task requirements are higher than in the small environment: CPU requirement 20~30, data size 20~30, bandwidth 100*30, and deadline 10~15 ms. The edge-cloud resources are 50 edge servers with a CPU capacity of 50~80, 30 cloud servers with a CPU capacity of 60~100, and an available bandwidth of 100 Mbps.
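The two configurations can be captured in a small data structure, which also makes it easy to generate synthetic tasks. The sketch below is only an assumption about how such a setup might be coded: the class and field names are hypothetical, values are drawn uniformly from the stated ranges (the text does not specify the distribution), and the per-task bandwidth figures (100*15 and 100*30) are left out because their exact meaning is not stated.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass(frozen=True)
class EnvConfig:
    task_cpu: Range
    task_data_size: Range
    task_deadline_ms: Range
    num_edge_servers: int
    edge_cpu_capacity: Range
    num_cloud_servers: int
    cloud_cpu_capacity: Range
    bandwidth_mbps: int


SMALL = EnvConfig(task_cpu=(10, 20), task_data_size=(10, 20),
                  task_deadline_ms=(5, 10), num_edge_servers=30,
                  edge_cpu_capacity=(40, 60), num_cloud_servers=20,
                  cloud_cpu_capacity=(60, 80), bandwidth_mbps=100)

LARGE = EnvConfig(task_cpu=(20, 30), task_data_size=(20, 30),
                  task_deadline_ms=(10, 15), num_edge_servers=50,
                  edge_cpu_capacity=(50, 80), num_cloud_servers=30,
                  cloud_cpu_capacity=(60, 100), bandwidth_mbps=100)


@dataclass(frozen=True)
class Task:
    cpu: int
    data_size: int
    deadline_ms: int


def sample_tasks(cfg: EnvConfig, n: int, seed: int = 0) -> List[Task]:
    """Draw n tasks uniformly from the configured ranges (an assumption)."""
    rng = random.Random(seed)
    return [Task(cpu=rng.randint(*cfg.task_cpu),
                 data_size=rng.randint(*cfg.task_data_size),
                 deadline_ms=rng.randint(*cfg.task_deadline_ms))
            for _ in range(n)]
```

Calling `sample_tasks(SMALL, 50)` and `sample_tasks(LARGE, 300)` would then produce workloads for the smallest and largest task counts used in the experiments.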
The simulation parameters used in our study are presented in Table 2, while Table 3 shows the configuration of the environment in which we conducted our experiments. Our scheme uses DDQN to decide whether to offload or reject a task based on resource availability and waiting tasks. It considers network information when selecting the computing server for offloading to optimize network performance. We compare our scheme's performance with heuristic algorithms in both environments in the following section.
Figure 3 compares the task rejection ratios of the five schemes. As the number of tasks increases, the rejection ratio of every scheme increases. However, DDQNEC shows a significantly smaller increase than the other four schemes, which indicates a higher task acceptance ratio and a lower rejection rate. DDQNEC accepts more tasks because it assigns each task to a server that is well matched in terms of resource requirements and availability. This preserves resources for future use and allows more tasks to be accepted. A high acceptance rate is beneficial because it leads to higher resource utilization and reduces idle time in the system. As a result, DDQNEC also outperforms the other methods in terms of resource utilization, which indicates its effectiveness in improving the proposed edge-cloud system.
Figure 4 compares the average utilization of the proposed DDQNEC scheme with that of the four other schemes: heuristic1, heuristic2, heuristic3, and DQNEC. As the number of tasks increases, the average resource utilization of all five schemes increases. However, DDQNEC consistently achieves a higher utilization rate than the other four algorithms in both environments, as depicted in Figs. 4(a) and (b). The task rejection ratio has a direct impact on resource utilization: a low rejection ratio implies high utilization. DDQNEC selects servers according to task requirements and allocates resources so that more tasks are accepted while utilization is maintained, which improves the efficiency of the edge-cloud system. A high acceptance rate generally leads to a higher average utilization relative to cost. These results highlight the effectiveness of the DDQN approach in enhancing the performance of the edge-cloud system.
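The relationship between the rejection ratio and utilization can be illustrated with simple metric definitions. These are plausible, assumed definitions written for illustration (rejected tasks over total tasks, and the mean of per-server used capacity over total capacity); the exact formulas used for the figures are not reproduced here.

```python
from typing import Sequence


def rejection_ratio(num_rejected: int, num_tasks: int) -> float:
    """Fraction of submitted tasks that were rejected."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return num_rejected / num_tasks


def acceptance_ratio(num_rejected: int, num_tasks: int) -> float:
    """Fraction of submitted tasks that were accepted."""
    return 1.0 - rejection_ratio(num_rejected, num_tasks)


def average_utilization(used: Sequence[float],
                        capacity: Sequence[float]) -> float:
    """Mean over servers of used CPU divided by CPU capacity."""
    if len(used) != len(capacity) or not used:
        raise ValueError("used and capacity must be non-empty and aligned")
    return sum(u / c for u, c in zip(used, capacity)) / len(used)
```

Every accepted task adds its CPU demand to the `used` value of some server, so with these definitions a scheme that rejects fewer tasks on the same workload accumulates more used capacity and therefore reports a higher average utilization.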
Figure 5 compares the cost ratios of the five schemes as the number of tasks increases. The cost ratio of every scheme grows with the number of tasks. However, DDQNEC shows a significantly smaller increase than the other four schemes in both the small and large environments, which indicates better performance in terms of cost ratio. DDQNEC also has a significantly lower task rejection rate than the three heuristics and DQNEC, which means that it accepts more tasks for offloading and increases the utilization of the edge-cloud system. The key factor is its ability to assign tasks to servers that are well matched in terms of resource requirements and availability, which lowers the overall cost and raises the utilization of the edge-cloud system.
Conclusion
Task offloading and resource allocation in dynamic edge-cloud environments is a difficult problem. We formulate it as an MDP optimization problem and use the DDQN algorithm to find an optimal task offloading solution. The DDQNEC model uses an agent to make better decisions for end devices and to offload their computation-intensive, low-latency tasks to edge or cloud servers. Compared with the other algorithms, this improves performance in terms of average cost, average resource utilization, and task rejection rate.
In the future, we aim to improve the DDQNEC scheme for task offloading and resource allocation by using advanced machine learning and AI algorithms. We will analyze the edge-cloud network by considering various factors, such as the characteristics and capabilities of end devices, to optimize task offloading. Furthermore, we will explore the use of reinforcement learning techniques for managing a large number of IoT devices with varying task requirements, with a focus on techniques suitable for continuous action spaces.
Availability of data and materials
The data used to support the findings of this study are available from the corresponding author upon request.
References
1. Singh A, Satapathy SC, Roy A, Gutub A (2022) AI-based mobile edge computing for IoT: applications, challenges, and future scope. Arabian J Sci Engin (AJSE) 47(8):9801–9831. https://doi.org/10.1007/s13369-021-06348-2
2. Dai B, Niu J, Ren T, Atiquzzaman M (2022) Towards mobility-aware computation offloading and resource allocation in end-edge-cloud orchestrated computing. IEEE Internet Things J 9(19):19450–19462
3. Dai Y, Xu D, Maharjan S, Qiao G, Zhang Y (2019) Artificial intelligence empowered edge computing and caching for internet of vehicles. IEEE Wirel Commun 26(3):12–18
4. Rodrigues TK, Suto K, Nishiyama H, Liu J, Kato N (2019) Machine learning meets computation and communication control in evolving edge and cloud: challenges and future perspective. IEEE Commun Surv Tutor 22(1):38–67
5. Rodrigues TG, Suto K, Nishiyama H, Kato N, Temma K (2018) Cloudlets activation scheme for scalable mobile edge computing with transmission power control and virtual machine migration. IEEE Trans Comput 67(9):1287–1300
6. Zhao J, Li Q, Gong Y, Zhang K (2019) Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans Veh Technol 68(8):7944–7956
7. Nguyen TT, Le LB, LeTrung Q (2019) Computation offloading in MIMO based mobile edge computing systems under perfect and imperfect CSI estimation. IEEE Trans Serv Comput 14(6):2011–2025
8. Dai Y, Xu D, Maharjan S, Zhang Y (2018) Joint computation offloading and user association in multi-task mobile edge computing. IEEE Trans Veh Technol 67(12):12313–12325
9. Ullah I, Lim HK, Seok YJ, Han YH (2022) Optimal task offloading with deep Q-network for edge-cloud computing environment. In: 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), IEEE, pp 406–411
10. Huang L, Feng X, Zhang C, Qian L, Wu Y (2019) Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing. Digit Commun Netw 5(1):10–17
11. Gu F, Niu J, Qi Z, Atiquzzaman M (2018) Partitioning and offloading in smart mobile devices for mobile cloud computing: state of the art and future directions. J Netw Comput Appl 119:83–96
12. Huang L, Bi S, Zhang YJA (2019) Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Trans Mob Comput 19(11):2581–2593
13. Li J, Gao H, Lv T, Lu Y (2018) Deep reinforcement learning based computation offloading and resource allocation for MEC. In: 2018 IEEE Wireless Communications and Networking Conference (WCNC), IEEE, pp 1–6
14. Xiong X, Zheng K, Lei L, Hou L (2020) Resource allocation based on deep reinforcement learning in IoT edge computing. IEEE J Sel Areas Commun 38(6):1133–1146
15. Chen X, Zhang H, Wu C, Mao S, Ji Y, Bennis M (2018) Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning. IEEE Internet Things J 6(3):4005–4018
16. Lu H, Gu C, Luo F, Ding W, Liu X (2020) Optimization of lightweight task offloading strategy for mobile edge computing based on deep reinforcement learning. Future Gener Comput Syst 102:847–861
17. Zhang K, Zhu Y, Leng S, He Y, Maharjan S, Zhang Y (2019) Deep learning empowered task offloading for mobile edge computing in urban informatics. IEEE Internet Things J 6(5):7635–7647
18. Liu Y, Yu H, Xie S, Zhang Y (2019) Deep reinforcement learning for offloading and resource allocation in vehicle edge computing and networks. IEEE Trans Veh Technol 68(11):11158–11168
19. Wang J, Zhao L, Liu J, Kato N (2019) Smart resource allocation for mobile edge computing: a deep reinforcement learning approach. IEEE Trans Emerg Top Comput 9(3):1529–1541
20. Liu C, Tang F, Hu Y, Li K, Tang Z, Li K (2020) Distributed task migration optimization in MEC by extending multi-agent deep reinforcement learning approach. IEEE Trans Parallel Distrib Syst 32(7):1603–1614
21. Chen X, Zhang H, Wu C, Mao S, Ji Y, Bennis M (2018) Performance optimization in mobile-edge computing via deep reinforcement learning. In: 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), IEEE, pp 1–6
22. Xie Y, Xu Z, Xu J, Gong S, Wang Y (2019) Backscatter-aided hybrid data offloading for mobile edge computing via deep reinforcement learning. In: International Conference on Machine Learning and Intelligent Communications, Springer, pp 525–537
23. Tian K, Chai H, Liu Y, Liu B (2022) Edge intelligence empowered dynamic offloading and resource management of MEC for smart city internet of things. Electronics 11(6):879
24. Alfakih T, Hassan MM, Gumaei A, Savaglio C, Fortino G (2020) Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA. IEEE Access 8:54074–54084
25. Chen M, Liu W, Wang T, Zhang S, Liu A (2022) A game-based deep reinforcement learning approach for energy-efficient computation in MEC systems. Knowl-Based Syst 235:107660
26. Lu H, He X, Du M, Ruan X, Sun Y, Wang K (2020) Edge QoE: computation offloading with deep reinforcement learning for Internet of Things. IEEE Internet Things J 7(10):9255–9265
27. Chen J, Wu Z (2021) Dynamic computation offloading with energy harvesting devices: a graph-based deep reinforcement learning approach. IEEE Commun Lett 25(9):2968–2972
28. Chen Z, Wang X (2020) Decentralized computation offloading for multi-user mobile edge computing: a deep reinforcement learning approach. EURASIP J Wirel Commun Netw 2020(1):1–21
29. Chen M, Yi M, Huang M, Huang G, Ren Y, Liu A (2023) A novel deep policy gradient action quantization for trusted collaborative computation in intelligent vehicle networks. Expert Syst Appl 221:119743
30. Chen M et al (2022) GPDS: a multi-agent deep reinforcement learning game for anti-jamming secure computing in MEC network. Expert Syst Appl 210:118394
31. Liu KH, Liao W (2020) Intelligent offloading for multi-access edge computing: a new actor-critic approach. In: ICC 2020 - 2020 IEEE International Conference on Communications (ICC), IEEE, pp 1–6
32. Qiu X, Zhang W, Chen W, Zheng Z (2020) Distributed and collective deep reinforcement learning for computation offloading: a practical perspective. IEEE Trans Parallel Distrib Syst 32(5):1085–1101
33. Li Y, Qi F, Wang Z, Yu X, Shao S (2020) Distributed edge computing offloading algorithm based on deep reinforcement learning. IEEE Access 8:85204–85215
34. Chai F, Zhang Q, Yao H, Xin X, Gao R, Guizani M (2023) Joint multi-task offloading and resource allocation for mobile edge computing systems in satellite IoT. IEEE Trans Veh Technol 1–15
35. Chen S, Ge X, Wang Q, Miao Y, Ruan X (2022) DDPG-based intelligent rechargeable fog computation offloading for IoT. Wirel Netw 28(7):3293–3304
36. Cheng M, Li J, Nazarian S (2018) DRL-cloud: deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers. In: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, pp 129–134
37. Nath S, Wu J (2020) Dynamic computation offloading and resource allocation for multi-user mobile edge computing. In: GLOBECOM 2020 - 2020 IEEE Global Communications Conference, IEEE, pp 1–6
38. Chen J, Xing H, Xiao Z, Xu L, Tao T (2021) A DRL agent for jointly optimizing computation offloading and resource allocation in MEC. IEEE Internet Things J 8(24):17508–17524
Acknowledgements
This research was supported by Basic Science Research Programs through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF2023R1A2C1003143 and NRF2018R1A6A1A03025526).
Funding
This study was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant No. NRF2023R1A2C1003143 and NRF2018R1A6A1A03025526.
Contributions
Methodology: Ihsan Ullah; Resources: Ihsan Ullah and HyunKyo Lim; Software: HyunKyo Lim, YeongJun Seok; Supervision: YounHee Han; Writing original draft: Ihsan Ullah; Writing review editing: Ihsan Ullah, HyunKyo Lim; All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ullah, I., Lim, HK., Seok, YJ. et al. Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach. J Cloud Comp 12, 112 (2023). https://doi.org/10.1186/s13677-023-00461-3
DOI: https://doi.org/10.1186/s13677-023-00461-3