
Advances, Systems and Applications

Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach


Edge-cloud computing is an emerging approach in which tasks are offloaded from mobile devices to edge or cloud servers. However, task offloading may increase energy consumption and delay, and the decision to offload a task depends on various factors such as time-varying radio channels, available computation resources, and the location of devices. Because edge-cloud computing is a dynamic and resource-constrained environment, making optimal offloading decisions is challenging. This paper aims to optimize offloading and resource allocation to minimize delay and meet computation and communication needs in edge-cloud computing. Optimizing task offloading in the edge-cloud computing environment is a multi-objective problem, which we address with deep reinforcement learning. To this end, we formulate the problem as a Markov decision process and use a Double Deep Q-Network (DDQN) algorithm. Our DDQN-edge-cloud (DDQNEC) scheme dynamically makes offloading decisions by analyzing resource utilization, task constraints, and the current status of the edge-cloud network. Simulation results demonstrate that DDQNEC outperforms heuristic approaches in terms of resource utilization, task offloading, and task rejection.


The advent of IoT and 5G has enabled new applications in areas such as surveillance, augmented/virtual reality, and facial recognition, which depend heavily on both computational resources and data storage for optimal performance. Because IoT and mobile devices have limited resources, it is difficult for them to support such intelligent, delay-sensitive applications [1]. Offloading computation-intensive tasks to the cloud can help these devices. However, the cloud can introduce communication and data-transfer delays due to network congestion and high usage, which limits its use in real-time applications such as autonomous driving, advanced navigation, augmented reality (AR), and virtual reality (VR). Edge computing can reduce latency, enhance performance, and reduce the costs associated with cloud computing, because edge servers can process computation-intensive tasks instead of sending them to the cloud [2]. Overall, edge-cloud computing and 5G can work together to enable new types of applications and services that require low-latency, real-time processing.

Edge-cloud computing is a distributed computing model that uses both cloud computing resources and edge devices, such as servers and IoT devices, to process and transmit data. This model aims to improve processing speed and decision-making and to reduce latency and bandwidth usage by bringing the cloud's processing power closer to the edge, where data is generated and collected. Edge and cloud computing can coexist and work together to provide the resources needed for task execution. Integrating AI with edge-cloud computing allows machine learning algorithms to be deployed on edge devices, giving the edge-cloud network a high level of intelligence and enabling it to understand its environment [3, 4]. Before tasks are offloaded to the cloud or edge, their requirements and the available resources must be carefully considered. However, edge-cloud computing is a dynamic and resource-constrained environment, so making optimal task offloading decisions based on the available resources is a critical issue.

Task offloading is the process of transferring a task or workload from a local device to a remote device, such as a server or cloud resource, to improve the local device's performance and efficiency. Offloading can increase delay and energy consumption, and edge servers may have limited computational capacity, which can increase computing latency. These trade-offs must be considered before an offloading decision is made. Dense 5G networks may also experience higher transmission delays. Cooperative task offloading is a technique used in edge-cloud networks to improve the performance of distributed systems: tasks are split among devices in the network, such as edge devices and cloud servers, to optimize resource usage and reduce the workload on individual devices. In an edge-cloud computing environment, determining the optimal location for task offloading can be challenging because many factors must be considered, including the computational capacity of edge servers, network transmission delays, and the diverse requirements of end devices. Numerous studies have addressed computation offloading in edge-cloud networks [5,6,7,8,9]. However, because of the diverse requirements of end devices and the limited information available about wireless channels, bandwidth, and computing resources in edge-cloud networks, designing an optimal offloading strategy remains challenging.

Deep reinforcement learning (DRL) is a subfield of machine learning that combines reinforcement learning with deep learning to handle high-dimensional state-action spaces and accelerate training for complex decision-making tasks. DRL has been applied successfully in a wide range of fields, including gaming [7], robotics [8], and networking. Applying DRL to edge-cloud-assisted networks can optimize system performance by perceiving users' mobility [10, 11]. Task offloading problems have been addressed with DRL in [9, 12, 13], and [14]. DRL can find effective solutions to optimization problems in time-varying, dynamic network environments. In the context of task offloading at the edge, Dueling-DQN could be used to learn an optimal decision-making policy for offloading tasks to different locations in a dynamic, resource-constrained edge-cloud environment. The Dueling-DQN architecture separates the estimator into two streams, value and advantage, and then aggregates them to produce the final Q-value estimate. Decoupling the estimation of a state's value from the estimation of the advantage of taking a specific action in that state allows the Q-value estimate to generalize better. Such a network could be trained to learn the trade-offs between factors such as task requirements, available resources, latency, and cost, and thus to make informed decisions about the best location for task offloading.
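The value/advantage aggregation described above is commonly written as Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'). The minimal sketch below shows only this aggregation step on plain Python numbers, assuming V and A have already been produced by the two streams of a network; the function name `dueling_q_values` and the example values are hypothetical.

```python
from typing import List


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q-values.

    Subtracting the mean advantage makes the V/A split identifiable: adding the
    same constant to every advantage leaves the resulting Q-values unchanged.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# Hypothetical example: one state, three candidate offloading locations.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q)  # [3.0, 2.0, 1.0]
```

The greedy action is then simply the index of the largest Q-value, exactly as with a single-stream Q-network.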

In this research, we propose an advanced edge-cloud computing scheme that leverages the Double Deep Q-Network (DDQN) to optimize computation offloading and the allocation of resources to offloaded tasks. The proposed scheme extends our previous work, which employed DQN [9], by using the DDQN algorithm to improve the decision-making process and achieve a more efficient solution to the edge-cloud task offloading and resource allocation problem. The goal of the proposed scheme is to minimize energy consumption while satisfying the computation and communication requirements of the offloaded tasks in edge-cloud computing. DDQN enables efficient and effective offloading decisions in dynamic, resource-constrained environments such as the edge-cloud by adapting to changes and learning from past experience. In terms of training, DDQN separates the Q-value estimator into two streams, value and advantage, and then aggregates them, which improves generalization. We formulate the task offloading and resource allocation problem as an MDP and use the DDQN algorithm to identify the optimal policy. Our simulation results show that the proposed method outperforms heuristic schemes and DQNEC in terms of resource utilization, maximum task offloading, and task rejection. The main contributions of our work are:

  • To optimize resource allocation for compute-intensive and delay-sensitive tasks in the edge-cloud computing environment, we developed a scheme based on Double DQN. Our scheme determines the best action to take in the current system state, ultimately improving the overall performance of the edge-cloud system.

  • Considering the dynamics and unpredictability of edge device environments, we modeled the task offloading problem as a Markov decision process (MDP) and applied Double DQN to maximize the long-term cumulative discounted reward. Our optimization objective was to maximize resource utilization, minimize task rejection, and minimize the round-trip time so that tasks complete within their deadlines.

  • Using the proposed DDQNEC scheme, we can reduce idle time and balance the workload of the edge-cloud computing system. The simulation results show that the DDQN-based algorithm significantly outperforms the heuristic algorithm.

The rest of the paper is organized as follows. Sect. 2 reviews related work, especially on task offloading and resource allocation in the edge-cloud computing environment; Sect. 3 describes the system model and the formulation of the proposed DDQNEC scheme in detail; Sect. 4 presents and discusses the simulation results and comparisons that validate our approach; and Sect. 5 concludes the paper.

Related work

DRL learns optimal policies and makes quick decisions by interacting with a time-varying environment, which makes it suitable for dynamic MEC systems. Existing work falls mainly into two research streams: value-based and policy-based methods.

Value-based DRL methods

Huang et al. [12] introduced DROO, an online offloading algorithm for wireless-powered mobile edge computing (MEC) networks. The algorithm splits the optimization problem into two sub-problems: resource allocation and offloading decision. Through evaluation and simulation, the authors observed that DROO surpassed existing methods in terms of energy efficiency, task completion time, and network congestion, making it a promising solution for improving the performance of wireless-powered MEC networks. Li et al. [13] applied the deep Q-network (DQN) to jointly handle computation offloading and resource allocation in multiuser MEC, minimizing the sum cost of delay and energy consumption. In [14], an improved resource allocation policy based on DQN was proposed for IoT edge computing. This policy aims to improve resource utilization efficiency and reduce task completion delays; DQN allows the system to learn from experience and make better resource allocation decisions in real time.

Chen et al. [15] proposed the Deep-SARL algorithm to address the computation offloading problem in a mobile edge computing system by formulating it as a Markov decision process. The algorithm uses deep reinforcement learning with a self-attention mechanism to learn an optimal offloading policy that maximizes long-term utility. The Deep-SARL algorithm was evaluated through simulations, and the results showed that it outperformed existing offloading methods in terms of average reward and energy efficiency. The proposed algorithm can help improve the performance of mobile edge computing systems, especially for applications that require real-time processing and low latency. Lu et al. [16] proposed a DQN algorithm for large-scale heterogeneous MEC that achieves good performance without prior knowledge of environment statistics. However, value-based methods are limited in handling continuous action spaces, such as in the dynamic JCORA problem. In [10], a DQN-based approach was introduced to jointly optimize task offloading and bandwidth allocation in MEC networks. The proposed solution effectively balances the trade-offs between quality of service, energy efficiency, and network congestion.

The review in [11] covers state-of-the-art techniques and computational resources used for partitioning and offloading in MEC networks, discussing the challenges and opportunities of various approaches and highlighting their pros and cons. In [17], a DQN-based task offloading scheme was proposed to select the optimal edge server and transmission mode to maximize task offloading utility; it achieved high performance in terms of task completion time, energy consumption, and network congestion. Liu et al. [18] proposed a novel approach for allocating resources in a mobile edge computing system in which vehicles act as edge devices that provide computational services to nearby users. Their algorithm is based on deep reinforcement learning (DRL) and aims to maximize overall system utility by efficiently allocating resources among the vehicles. In [19], an intelligent DRL-based resource allocation scheme for wireless networks was proposed to minimize service time and balance resources.

In [20], a migration algorithm was proposed to optimize task migration using a multi-agent reinforcement learning approach. The algorithm leveraged the collective intelligence of multiple agents to make optimal migration decisions, taking into account various factors such as network conditions, task requirements, and system resources.

In [21], the authors introduced a DQN-based computation offloading scheme for mobile edge computing (MEC) networks that aims to minimize the long-term system cost while ensuring a satisfactory quality of service for end users. By leveraging DQN, the scheme learns an optimal offloading decision policy in a data-driven manner that can adapt to the dynamic and uncertain network conditions of MEC environments. The authors of [22] introduced a DDQN-based, backscatter-aided hybrid data offloading scheme that significantly improved energy efficiency while maintaining transmission rate and reliability. In [23], an approach was proposed for making offloading decisions in an MEC environment by jointly optimizing CPU frequencies and transmit powers, with the aim of minimizing the energy consumption of mobile devices while maintaining the quality of service of offloaded tasks. The objective of [24] was to minimize the costs of energy consumption and computation delay, which are critical to the performance of mobile edge computing networks; to achieve this, the authors presented the RL-SARSA algorithm for resource management and optimal offloading decisions. In [25], the authors proposed DDQNL-IST, a game-learning algorithm that combines DDQN and distributed LSTM with intermediate state transition to reduce the complexity of computation offloading under time-varying conditions. DDQNL-IST uses distributed LSTM and double-Q learning to improve the processing and prediction of time intervals and delays, and devices can exploit information asymmetry to obtain a better game-learning outcome.

Policy-based DRL methods

Lu et al. [26] presented the double-dueling deterministic policy gradient (D3PG) algorithm for edge computing, capable of optimizing three critical performance metrics: service latency, energy consumption, and task success rate. The simultaneous optimization of these metrics can lead to efficient resource allocation in dynamic edge computing environments. The D3PG algorithm can improve the overall performance of edge computing systems by reducing service latency, decreasing energy consumption, and improving the task success rate.

Zhang et al. [27] proposed two DRL algorithms for dynamic computation offloading in edge computing to minimize service latency. The hybrid-AC algorithm optimized resource allocation in single-device scenarios using a decision-based actor-critic approach. The md-Hybrid-AC algorithm achieved efficient resource allocation in multi-device scenarios using the multidevice actor-critic approach. These algorithms are significant contributions to the field of edge computing, as they reduce service latency and energy consumption, and improve the task success rate. They have the potential to enhance the performance of various edge computing applications.

Chen and Wang [28] proposed a decentralized DDPG-based mechanism called JCORA, which addresses resource allocation in a decentralized MEC system where multiple users compete for limited computing resources. Using DDPG agents, JCORA learns the optimal computation offloading policy for each user without relying on centralized control or communication, which improves scalability and reduces overhead. The authors of [29] presented a trustworthy DRL strategy for computation offloading in IoT edge networks, designed to handle selfish or forgery attacks in intelligent vehicle networks. This strategy evaluates device trustworthiness in mobile edge vehicle networks and uses an intelligent trusted system model and a DPGAQ scheme to anticipate and prevent untrusted or malicious vehicle attacks during IoT offloading. It also uses a quantization algorithm to simplify offloading decisions in high-dimensional action spaces. In [30], a new policy-based multi-agent deep reinforcement learning algorithm based on the post-decision state (GPDS) is introduced to address malicious interference in wireless networks. By assessing communication quality, spectrum availability, and the jammer's strategy from the post-decision state, mobile users can optimize their transmission power and frequency to increase their SINR and channel throughput.

Liu and Liao [31] designed an actor-critic-based approach to optimize resource allocation and offloading decisions, using a hybrid policy that can handle both discrete and continuous actions. This approach enables efficient resource allocation and computation offloading decisions in edge computing systems. The DAC algorithm dynamically adjusts its decision-making to the system's needs, and it scales to various edge computing scenarios, making it a valuable tool for addressing resource allocation and computation offloading in edge environments.

Qiu et al. [32] developed a distributed and collective deep reinforcement learning (DRL) algorithm, called DC-DRL, to address the challenges of computation offloading in resource-intensive and deadline-sensitive applications. The algorithm is capable of optimizing resource allocation and task scheduling in a distributed manner, enabling efficient and scalable processing of complex tasks. By leveraging the collective intelligence of multiple agents, DC-DRL can achieve higher performance and better task completion rates compared to traditional centralized approaches.

In [33], a DDPG-based algorithm is proposed for collaborative computation offloading in heterogeneous edge computing. The proposed offloading algorithm is designed to work across all three edge networks despite their heterogeneity. It aims to improve the overall efficiency and performance of the network by dynamically routing computation tasks to the most appropriate network based on factors such as network load, latency, and available resources.

In [34], a joint multi-task offloading and resource allocation scheme is proposed for satellite IoT. It models tasks with dependencies as directed acyclic graphs (DAGs) and uses A-PPO to optimize the offloading strategy, with the aim of improving offloading efficiency and speeding up task completion.

In [35], a DDPG-based scheme was proposed that considers energy consumption and task completion in a multiuser scenario using simultaneous wireless information and power transfer technology. It formulates an optimization problem that jointly optimizes the task offloading ratio, uplink channel bandwidth, power split ratio, and computing resource allocation. The proposed algorithm achieved optimal energy consumption and delay, and it uses a dual actor-critic neural network design with inverting-gradient updates to improve the convergence and stability of training.

System model

In this section, we present the system model and the problem formulation for task offloading and resource allocation. The network model of the edge-cloud system in the DDQNEC scheme is shown in Fig. 1. End devices such as sensors, mobile devices, and IoT devices connect to base stations through wireless links. The edge computing system is connected to the core cloud via the backbone network, which allows tasks to be offloaded to the public cloud and its available resources to be used. Our scheme adopts a batch processing approach: it waits for a predefined number \((N)\) of task requests before determining the optimal location for each task, either the edge or the cloud, taking into account resource availability and task deadlines. Evaluating a batch of task requests at a time allows for better resource utilization and decision-making. Both bandwidth and computing resources are considered when making offloading decisions, to optimize resource usage, minimize delay, and reduce energy consumption. The following subsections describe the system model in detail, including the task, communication, and computation offloading models. Table 1 lists the notation used in our models.
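The batching step can be sketched as a generator that groups incoming requests into lists of N. This is a minimal sketch assuming requests arrive as a Python iterable; `batch_requests` is a hypothetical helper, and flushing a final partial batch (rather than waiting indefinitely for N requests) is an assumption the text does not specify.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_requests(requests: Iterable[T], n: int) -> Iterator[List[T]]:
    """Group incoming task requests into batches of N before offloading decisions.

    A trailing partial batch is also emitted so that no request is dropped.
    """
    if n <= 0:
        raise ValueError("batch size N must be positive")
    batch: List[T] = []
    for req in requests:
        batch.append(req)
        if len(batch) == n:
            yield batch          # hand a full batch to the decision agent
            batch = []
    if batch:
        yield batch              # flush the remainder (assumed policy)


print(list(batch_requests(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```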

Fig. 1 The system model and structure of edge-cloud computing

Table 1 Notation and description

Task model

A task \({t}_{n}\) is represented as a tuple of four variables, \(\left({\mathcal{z}}_{n},{\mathcal{y}}_{n}, {c}_{n}, {\tau }_{n}\right), (1\le n\le N)\), where \({\mathcal{z}}_{n}\) is the input data size in bytes, \({\mathcal{y}}_{n}\) is the resultant data size, \({c}_{n}\) is the required computational resource in CPU units, and \({\tau }_{n}\) is the task latency requirement. The binary variable \({x}_{n}\in \{0,1\}\) indicates whether the task is assigned to the edge or the cloud:

$${x}_{n}=\begin{cases}0, & \text{task } {t}_{n} \text{ is executed at the edge},\\ 1, & \text{task } {t}_{n} \text{ is executed in the cloud}.\end{cases}$$

Typically, offloaded tasks require multiple types of resources; however, our scheme considers only the CPU resources required by a task [36,37,38]:

$${c}_{n}=\varsigma \bullet {\mathcal{z}}_{n}$$

where \({c}_{n}\) represents the total CPU units required to process task \({t}_{n}\), \({\mathcal{z}}_{n}\) represents the total size of the input data, and \(\varsigma\) represents the computational resources required to process a single byte of data.
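A minimal sketch of the task tuple and of the CPU-demand relation, in which the CPU units of a task are the per-byte requirement ς multiplied by the input size z_n. The class name `Task`, its field names, the helper `make_task`, and the units used in the example (bytes, seconds) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Task tuple t_n = (z_n, y_n, c_n, tau_n) from the task model."""
    input_size: float    # z_n, input data size in bytes
    result_size: float   # y_n, resultant data size
    cpu_units: float     # c_n, required CPU units
    deadline: float      # tau_n, latency requirement (seconds assumed)


def make_task(input_size: float, result_size: float, deadline: float,
              cpu_per_byte: float) -> Task:
    """Build a task whose CPU demand is c_n = varsigma * z_n."""
    return Task(input_size, result_size, cpu_per_byte * input_size, deadline)


t = make_task(input_size=2_000, result_size=200, deadline=0.1, cpu_per_byte=0.5)
print(t.cpu_units)  # 1000.0
```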

Wireless bandwidth model

To offload a task from the end device to the edge or cloud, the device must be connected to the nearest base station through a wireless channel. Let \(\mathcal{B}=\{{b}_{1}, {b}_{2},\dots ,{b}_{W}\}\) denote the set of all base stations. Each base station \({b}_{w}\) has a set of wireless channels that provide different data rates, \({\beta }_{h}^{w}\in \left\{{\beta }_{1}^{w},{\beta }_{2}^{w},\dots ,{\beta }_{{H}_{w}}^{w}\right\}\). Each channel serves different tasks, and \({\sigma }_{h}^{w}\in \{{\sigma }_{1}^{w},{\sigma }_{2}^{w},\dots ,{\sigma }_{{H}_{w}}^{w}\}\) denotes the remaining bandwidth of channel \(h\). The bandwidth utilization \({\mathcal{U}}_{W}\left(t\right)\) of all base stations at time step \(t\) can then be formulated as

$${\mathcal{U}}_{W}\left(t\right)=\frac{\sum_{w=1}^{W}(\sum_{h=1}^{H}{\beta }_{h}^{w})}{B}$$

where \(B\) represents the total bandwidth of all base stations.
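A sketch of the bandwidth-utilization formula, assuming that β_h^w holds the bandwidth currently allocated on channel h of base station w and that all quantities share one unit (Mbit/s in the example). The function name `bandwidth_utilization` and the nested-list input layout (one inner list per base station) are hypothetical.

```python
from typing import Sequence


def bandwidth_utilization(channel_rates: Sequence[Sequence[float]],
                          total_bandwidth: float) -> float:
    """U_W(t): sum of beta_h^w over all base stations w and channels h, divided by B."""
    if total_bandwidth <= 0:
        raise ValueError("B must be positive")
    used = sum(sum(rates) for rates in channel_rates)
    return used / total_bandwidth


# Hypothetical values: two base stations with 2 and 3 channels, B = 100 Mbit/s.
print(bandwidth_utilization([[10, 20], [5, 5, 10]], 100))  # 0.5
```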

Computational model

  i. Edge computing:

    In our scheme, the set of edge servers is denoted as \(\mathcal{P}=\{1,2,3,\dots ,P\}\), and \({c}_{p}\) denotes the available computational capacity of edge server \(p\ (p\in \mathcal{P})\). The computation time \({T}_{n}^{{proc}_{e}}\) required to process task \({t}_{n}\) at edge server \(p\) is given by

    $${T}_{n}^{{proc}_{e}}= \frac{{c}_{n}}{{c}_{p}}$$

    The utilization of the computational resources of the edge server at time \(t\) is represented as


    where \({C}_{e}\) denotes the total available computing capacity of all servers at the edge.

  ii. Cloud computing:

    The set of cloud servers is denoted as \(\mathcal{M}=\{1,2,3,\dots ,M\}\), and \({c}_{m}\) denotes the available computational capacity of cloud server \(m\ (m\in \mathcal{M})\). The processing time \({T}_{n}^{{proc}_{c}}\) required to process task \({t}_{n}\) at cloud server \(m\) is given by

    $${T}_{n}^{{proc}_{c}}= \frac{{c}_{n}}{{c}_{m}}$$

    The utilization of the computational resources of the cloud server at time \(t\) is represented as


    where \({C}_{c}\) denotes the total available computing capacity of all servers in the cloud.
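The edge and cloud processing-time formulas have the same form (CPU demand divided by server capacity), so one helper covers both. This sketch assumes server capacity is expressed in CPU units per second, so the ratio is a time in seconds; the function name `processing_time` and the example capacities are hypothetical.

```python
def processing_time(cpu_units: float, capacity: float) -> float:
    """T_n^proc = c_n / c_p at edge server p, or c_n / c_m at cloud server m."""
    if capacity <= 0:
        raise ValueError("server capacity must be positive")
    return cpu_units / capacity


# Hypothetical capacities (CPU units per second): a small edge server and a larger cloud server.
edge_t = processing_time(1_000, 10_000)    # 0.1 s at the edge
cloud_t = processing_time(1_000, 50_000)   # 0.02 s in the cloud
print(edge_t, cloud_t)
```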

Delay model

In computation offloading, tasks are sent to an edge or cloud server for processing. The process involves three types of delays: transmission delay, propagation delay, and processing delay.

  i. Transmission time:

    For task \({t}_{n}\), data transmission is required in both directions: from the end device to the edge/cloud server with a data size of \({\mathcal{z}}_{n}\), and from the edge/cloud server back to the end device with a resultant data size of \({\mathcal{y}}_{n}\).

    Hence, a specific amount of bandwidth, \({\beta }_{h}^{w}\left(edge\right)\) or \(\beta (cloud)\), is required to meet the latency requirement \({\tau }_{n}\) of task \({t}_{n}\). The transmission times required to send the data of task \({t}_{n}\) to the edge, \({T}_{n}^{{trans}_{e}}\), and to the cloud, \({T}_{n}^{{trans}_{c}}\), can be formulated as

  ii. Propagation time:

    In the given model, the propagation delay is assumed to be constant, with a value of \({T}_{n}^{{prop}_{e}}= 5ms\) for edge server and \({T}_{n}^{{prop}_{c}}= 50ms\) for the cloud server. This simplifying assumption is made for ease of calculation and analysis. The actual propagation delay may vary depending on the location of the resource.

  iii. Processing delay:

    The processing delays of task \({t}_{n}\) at the edge server, \({T}_{n}^{{proc}_{e}}\), and at the cloud server, \({T}_{n}^{{proc}_{c}}\), are given by Eqs. (4) and (6), respectively.

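    Therefore, the overall time for a task to be completed at the edge, \({rtt}_{n}^{e}\), or in the cloud, \({rtt}_{n}^{c}\), is the sum of the transmission, propagation, and processing delays:

    $${rtt}_{n}^{e}={T}_{n}^{{trans}_{e}}+{T}_{n}^{{prop}_{e}}+{T}_{n}^{{proc}_{e}}$$
    $${rtt}_{n}^{c}={T}_{n}^{{trans}_{c}}+{T}_{n}^{{prop}_{c}}+{T}_{n}^{{proc}_{c}}$$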


    The total resource cost \({CO}_{total}\) is obtained by adding the weighted utilization costs of bandwidth, \({CO}_{W}\), edge server CPU, \({CO}_{P}\), and cloud server CPU, \({CO}_{M}\), over all offloaded tasks:

    $$\begin{array}{c}{CO}_{W}= {W}_{W}\bullet \mathcal{U}{}_{W}\\ {CO}_{P}= {W}_{P}\bullet \mathcal{U}{}_{P}\\ {CO}_{M}= {W}_{M}\bullet \mathcal{U}{}_{M}\end{array}$$
    $${CO}_{total}={CO}_{W}+ {CO}_{P}+ {CO}_{M}$$

    where each resource (bandwidth, edge computational resources, and cloud computational resources) is assigned a cost weight: \({W}_{W}=1\) for bandwidth, \({W}_{P}=5\) for edge computational resources, and \({W}_{M}=10\) for cloud computational resources. Based on these cost weights, the agent learns to pick the cheapest resource for a task and assigns the task to the best location (edge or cloud) accordingly. If both resources are available, the agent assigns the task to the edge because of its lower cost.
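The delay model and the weighted cost can be combined in a short sketch. The propagation delays (5 ms for the edge, 50 ms for the cloud) and the cost weights (1, 5, 10) come from the text; the transmission-time form, upload size plus download size divided by the allocated bandwidth, is an assumption because that equation is not reproduced in this section, and all function names are hypothetical.

```python
PROPAGATION_S = {"edge": 0.005, "cloud": 0.050}                        # 5 ms and 50 ms, as assumed in the text
COST_WEIGHTS = {"bandwidth": 1.0, "edge_cpu": 5.0, "cloud_cpu": 10.0}  # W_W, W_P, W_M


def transmission_time(input_size: float, result_size: float, bandwidth: float) -> float:
    """Assumed form: upload z_n plus download y_n over the allocated bandwidth (same units)."""
    return (input_size + result_size) / bandwidth


def round_trip_time(location: str, input_size: float, result_size: float,
                    bandwidth: float, cpu_units: float, capacity: float) -> float:
    """rtt_n = transmission + propagation + processing, for 'edge' or 'cloud'."""
    trans = transmission_time(input_size, result_size, bandwidth)
    proc = cpu_units / capacity
    return trans + PROPAGATION_S[location] + proc


def total_cost(u_bandwidth: float, u_edge: float, u_cloud: float) -> float:
    """CO_total = W_W * U_W + W_P * U_P + W_M * U_M."""
    return (COST_WEIGHTS["bandwidth"] * u_bandwidth
            + COST_WEIGHTS["edge_cpu"] * u_edge
            + COST_WEIGHTS["cloud_cpu"] * u_cloud)


# Hypothetical task: 1 Mbit up, 0.1 Mbit down over 20 Mbit/s, 1000 CPU units.
rtt_e = round_trip_time("edge", 1e6, 1e5, 20e6, 1_000, 10_000)
rtt_c = round_trip_time("cloud", 1e6, 1e5, 20e6, 1_000, 50_000)
print(round(rtt_e, 3), round(rtt_c, 3))  # 0.16 0.125
print(total_cost(0.5, 0.4, 0.2))         # 4.5
```

With these illustrative numbers the cloud finishes first because its processing time is much shorter, while the cloud CPU weight of 10 makes each unit of cloud utilization twice as expensive as edge utilization.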

Formal problem formulation

The multi-objective problem solved in this paper is described formally as follows:


$$minimize\,\left({CO}_{W}+ {CO}_{P}+ {CO}_{M}\right)$$

Equation (13) expresses the optimization objective of maximizing resource utilization at both the edge and the cloud, while Eq. (14) expresses the objective of minimizing the cost of task offloading. For details on resource utilization and cost, see Eqs. (3), (5), (7), and (12).

Subject to the constraints:

$$\sum_{w=1}^{W}\sum_{h=1}^{H}{ch}_{h}^{w}\bullet {\mu }_{h}^{w}\le B$$
$$\sum_{n=1}^{N}{c}_{n}\bullet \left(1-{x}_{n}\right)\le {C}_{e}$$
$$\sum_{n=1}^{N}{c}_{n}{\bullet x}_{n}\le {C}_{c}$$
$${rtt}_{n}^{e}\bullet \left(1-{x}_{n}\right)+{rtt}_{n}^{c}{\bullet x}_{n}\le {\tau }_{n}$$
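A sketch of a feasibility check for one batch decision x = (x_1, ..., x_N), covering the edge and cloud CPU-capacity constraints and the per-task deadline constraint. The bandwidth constraint is left out because its symbols are not defined in this section; `is_feasible` and the example values are hypothetical.

```python
from typing import Sequence


def is_feasible(cpu: Sequence[float], x: Sequence[int],
                edge_rtts: Sequence[float], cloud_rtts: Sequence[float],
                deadlines: Sequence[float], edge_capacity: float,
                cloud_capacity: float) -> bool:
    """Check CPU-capacity and deadline constraints; x[n] = 0 means edge, 1 means cloud."""
    edge_load = sum(c * (1 - xn) for c, xn in zip(cpu, x))   # sum_n c_n (1 - x_n) <= C_e
    cloud_load = sum(c * xn for c, xn in zip(cpu, x))        # sum_n c_n x_n <= C_c
    if edge_load > edge_capacity or cloud_load > cloud_capacity:
        return False
    # rtt^e_n (1 - x_n) + rtt^c_n x_n <= tau_n must hold for every task
    return all(re * (1 - xn) + rc * xn <= tau
               for re, rc, xn, tau in zip(edge_rtts, cloud_rtts, x, deadlines))


cpu = [400.0, 300.0, 500.0]
edge_rtts, cloud_rtts, taus = [0.05, 0.06, 0.05], [0.09, 0.08, 0.07], [0.1, 0.1, 0.1]
print(is_feasible(cpu, [0, 0, 1], edge_rtts, cloud_rtts, taus, 800.0, 1000.0))  # True
print(is_feasible(cpu, [0, 0, 0], edge_rtts, cloud_rtts, taus, 800.0, 1000.0))  # False: edge load 1200 > 800
```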

DDQN-based task offloading and resource allocation

In this section, we introduce DDQNEC, the proposed scheme that uses the DDQN algorithm to select the best location for task execution by analyzing the current state of the edge-cloud environment. It aims to improve resource utilization and balance the trade-off between delay and resource cost, so as to maximize the performance of edge-cloud computing systems. This is achieved by maximizing task offloading while minimizing delay and cost, as defined in Eqs. (12), (13), and (14). We formulate this multi-objective problem as a Markov decision process (MDP).

Markov decision process

A Markov decision process (MDP) models sequential decision-making problems in which an agent makes decisions to maximize reward. It consists of an agent, states, actions, a policy, and rewards. We formulate the task offloading and resource optimization problem as an MDP to find the optimal policy \({\pi }^{*}\). The policy maps states to action probabilities, represented by \(\pi (a|s)\) for every possible action \(a\) in each state \(s\). RL algorithms are often used to solve MDPs because they allow an agent to learn the optimal behavior through trial and error. In the DDQN-based framework, the agent observes the state \({s}_{t}\) by interacting with the edge-cloud environment, takes an action \({a}_{t}\) (the selection of a computing server) via a deterministic policy, and receives an immediate reward \({r}_{t}\). The agent uses the action-value function \(Q({s}_{t},{a}_{t})\) to update its policy. The goal of the agent is to maximize the long-term reward by finding an optimal resource allocation policy. The state, action, and reward of the proposed scheme are explained in more detail below.


The state \({s}_{t}\) contains full information about the edge-cloud network. It includes the number of remaining tasks (\({N}^{t}\)); the index \({I}^{t}\in \{1,\dots ,N\}\) (that is, \({I}^{t}=t\)) of the task the agent must currently place; the total remaining computational capacity of the edge and cloud servers (\({C}_{c}+{C}_{e}\)); the total remaining bandwidth at the edge and cloud (\({B}_{e}+{B}_{c}\)); the number of cloud servers (\({N}^{c}\)); and the remaining CPU of each cloud server \(({\alpha }_{m})\). It also includes edge information, namely the number of edge servers (\({N}^{e}\)) and the remaining CPU of each edge server \(({\alpha }_{{p}_{w}})\). The CPU (\({U}_{m}\), \({U}_{p}\)) and bandwidth information allocated to each cloud and edge server is added as well. Finally, the information of each task \({t}_{1}, {t}_{2},\dots ,{t}_{N}\) is included. The state \({s}_{t}\in \mathcal{S}\) can be defined as

$$\begin{array}{c}\begin{array}{c}{s}_{t}=\{{N}^{t},{I}^{t}, {C}_{c}+{C}_{e}, {B}_{c}+{B}_{e},\\ {N}^{c},{C}_{c}, {B}_{c}, {\alpha }_{1},{\alpha }_{2},\dots ,{\alpha }_{m},\dots , {\alpha }_{M}, {\mathcal{U}}_{1},{\mathcal{U}}_{2},\dots ,{\mathcal{U}}_{m},\dots {\mathcal{U}}_{M},\\ {N}^{e},{C}_{e}, {B}_{e}, {\alpha }_{1},{\alpha }_{2},\dots , {\alpha }_{p},\dots ,{\alpha }_{P}, {\mathcal{U}}_{1},{\mathcal{U}}_{2},\dots ,{\mathcal{U}}_{p},\dots {\mathcal{U}}_{P},\end{array}\\ {t}_{1}, {t}_{2},\dots ,{t}_{N}\}\end{array}$$
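One way to turn the state definition into the fixed-length input vector a Q-network expects is to flatten its components in the order given. This is a sketch under assumptions: `build_state` is hypothetical, C_c and C_e are taken as the sums of the per-server remaining CPU values, and each task is represented by its four-tuple (z_n, y_n, c_n, tau_n).

```python
from typing import List, Sequence, Tuple

TaskFeatures = Tuple[float, float, float, float]   # (z_n, y_n, c_n, tau_n)


def build_state(n_remaining: int, index: int,
                cloud_cpu: Sequence[float], cloud_util: Sequence[float], cloud_bw: float,
                edge_cpu: Sequence[float], edge_util: Sequence[float], edge_bw: float,
                tasks: Sequence[TaskFeatures]) -> List[float]:
    """Flatten the components of s_t into one feature vector, in definition order.

    Assumption: C_c and C_e are the sums of the per-server remaining CPU (alpha values).
    """
    c_c, c_e = sum(cloud_cpu), sum(edge_cpu)
    state = [n_remaining, index, c_c + c_e, cloud_bw + edge_bw]           # N^t, I^t, C_c+C_e, B_c+B_e
    state += [len(cloud_cpu), c_c, cloud_bw, *cloud_cpu, *cloud_util]     # N^c, C_c, B_c, alpha_m, U_m
    state += [len(edge_cpu), c_e, edge_bw, *edge_cpu, *edge_util]         # N^e, C_e, B_e, alpha_p, U_p
    for task in tasks:                                                    # t_1, ..., t_N
        state.extend(task)
    return [float(v) for v in state]


# Hypothetical network: 2 cloud servers, 1 edge server, 1 task in the batch.
s = build_state(1, 1, [100.0, 200.0], [0.1, 0.2], 50.0, [40.0], [0.5], 20.0,
                [(1e6, 1e5, 1000.0, 0.1)])
print(len(s), s[2])  # 20 340.0
```

Under this layout the vector length is 10 + 2M + 2P + 4N, so it stays fixed for a given network size and batch size N.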


In our model, the agent takes an action by observing the current state of the environment. The goal of the agent is to make the optimal decision that maximizes resource utilization and minimizes the overall average service delay while keeping task rejections to a minimum. Action \({a}_{t}\in \mathcal{A}\) at each time step \(t\) is the action of offloading the \(t\)-th task (\(1\le t\le N\)) and allocating resources (bandwidth and CPU) to it for execution within the task deadline. Action \({a}_{t}\) can be defined as

$${a}_{t}=\{\eta , {x}_{n}\}$$

where \(\eta\) denotes the computation server and \({x}_{n}\) selects the edge or cloud location for task \({s}_{n}\): \(\eta \in \{1,2,\dots ,P\}\) (an edge server) when \({x}_{n}=0\), and \(\eta \in \{1,2,\dots ,M\}\) (a cloud server) when \({x}_{n}=1\). At each time step, the agent takes an action according to its task offloading strategy and receives the reward from the environment at the following time step.
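A DDQN outputs one Q-value per discrete action, so the pair \((\eta, {x}_{n})\) has to be mapped to a single output index. The text does not specify this layout; the sketch below assumes edge servers occupy indices \(0,\dots ,P-1\) and cloud servers indices \(P,\dots ,P+M-1\), giving \(P+M\) actions in total. The helper names are hypothetical.

```python
from typing import Tuple


def decode_action(index: int, num_edge: int, num_cloud: int) -> Tuple[int, int]:
    """Map a flat action index to (eta, x).

    Indices 0..P-1 select edge server eta = index + 1 (x = 0);
    indices P..P+M-1 select cloud server eta = index - P + 1 (x = 1).
    """
    if not 0 <= index < num_edge + num_cloud:
        raise ValueError("action index out of range")
    if index < num_edge:
        return index + 1, 0
    return index - num_edge + 1, 1


def encode_action(eta: int, x: int, num_edge: int, num_cloud: int) -> int:
    """Inverse of decode_action: (eta, x) -> flat action index."""
    if x == 0 and 1 <= eta <= num_edge:
        return eta - 1
    if x == 1 and 1 <= eta <= num_cloud:
        return num_edge + eta - 1
    raise ValueError("invalid (eta, x) pair")


if __name__ == "__main__":
    # P = 3 edge servers, M = 2 cloud servers -> 5 actions
    print(decode_action(3, num_edge=3, num_cloud=2))  # (1, 1): first cloud server
```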


In RL, the agent's objective is to maximize the cumulative reward it receives over time. Our reward function is designed to increase resource utilization, reduce cost, and satisfy delay constraints. The reward \({r}_{t}\) is computed from the total resource utilization \(\rho \left(t\right)\) at time step \(t\) in Eq. (20), the total cost \(\sigma \left(t\right)\) at time step \(t\) in Eq. (21), and the satisfaction of the delay constraint for the task \({s}_{t}\) at time step \(t\) in Eq. (22).

$$\rho \left(t\right)={\mathcal{U}}_{W}(t)+{\mathcal{U}}_{P}(t)+{\mathcal{U}}_{M}(t)$$
$$\sigma \left(t\right)={CO}_{W}(t)+ {CO}_{P}(t)+ {CO}_{M}(t)$$
$${r}_{t}\left({s}_{t},{a}_{t}\right)=\frac{\rho \left(t\right)}{\sigma \left(t\right)}\left[{\tau }_{t}-\left({\mathit{rtt}}_{t}^{e}\cdot \left(1-{x}_{t}\right)+{\mathit{rtt}}_{t}^{c}\cdot {x}_{t}\right)\right]$$
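Eq. (22) scales the deadline slack of the chosen tier by the utilization-to-cost ratio, so when \(\rho (t)/\sigma (t)>0\) the reward is positive only if the round-trip time stays within the deadline \({\tau }_{t}\). The sketch below evaluates Eqs. (20), (21), and (22) for a single step. The argument names are hypothetical, units are whatever the simulator uses, and the check that \(\sigma (t)>0\) is an added assumption to avoid division by zero.

```python
def offloading_reward(util_w: float, util_p: float, util_m: float,
                      cost_w: float, cost_p: float, cost_m: float,
                      deadline: float, rtt_edge: float, rtt_cloud: float,
                      x: int) -> float:
    """r_t = rho(t) / sigma(t) * [tau_t - (rtt^e * (1 - x_t) + rtt^c * x_t)]."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (edge) or 1 (cloud)")
    rho = util_w + util_p + util_m      # total resource utilization, Eq. (20)
    sigma = cost_w + cost_p + cost_m    # total cost, Eq. (21)
    if sigma <= 0:
        raise ValueError("total cost must be positive")  # assumption, see lead-in
    rtt = rtt_edge * (1 - x) + rtt_cloud * x  # round-trip time of the chosen tier
    return rho / sigma * (deadline - rtt)     # Eq. (22)


if __name__ == "__main__":
    common = dict(deadline=10.0, rtt_edge=4.0, rtt_cloud=12.0)
    print(offloading_reward(0.5, 0.25, 0.25, 1.0, 1.0, 2.0, x=0, **common))  # 1.5
    print(offloading_reward(0.5, 0.25, 0.25, 1.0, 1.0, 2.0, x=1, **common))  # -0.5
```

In the example, offloading to the cloud misses the 10-unit deadline by 2 units, so the same utilization-to-cost ratio of 0.25 turns into a penalty.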

DDQN Framework for task offloading

In our model, we use the DDQN algorithm for the learning process. DDQN is an off-policy algorithm suited to environments with discrete action spaces. The learning process of DDQN is described in Algorithm 1 and depicted in Fig. 2.

Fig. 2

The architecture of the DDQNEC scheme for task offloading and resource allocation

As shown in Fig. 2, the proposed DDQN-based learning process uses a replay memory \(M\) that stores a set of recent experiences \(\left({s}_{i},{a}_{i}, {r}_{i}, {s}_{i+1}\right)\), which the agent gathers while interacting with the environment and then uses for DDQN learning. The system records an experience at every step. During training, a mini-batch of size \(b\) is sampled from the replay memory \(M\) so that the Q network learns from previous experience. DDQN uses two neural networks: i) the prediction network \({Q}_{\pi }(s,a|\theta )\), a function approximator for the action-value function in Eq. (15), where \(\theta\) denotes the network weights; and ii) the target network \({\overline{Q} }_{\pi }(s,a|\overline{\theta })\), which estimates the target value \({y}_{i}\). The target network has the same structure as the prediction network, but its weights \(\overline{\theta }\) are copied from \(\theta\) only once every fixed number of iterations \(K\) rather than at every training epoch. Eqs. (23), (24), and (25) are the main equations for calculating the loss value.

$${q}_{i}\approx {Q}_{\pi }({s}_{i},{a}_{i}|\theta )$$
$${y}_{i}={r}_{i}+\gamma \max_{a}{\overline{Q} }_{\pi }({s}_{i+1},a|\overline{\theta })$$
$$\mathrm{loss}={\left({y}_{i}-{q}_{i}\right)}^{2}$$
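As written, Eq. (24) takes the maximum over the target network, which is the target used by DQN with a target network. The Double DQN target of van Hasselt et al. instead selects the next action with the prediction network and evaluates it with the target network, which reduces overestimation of Q-values. The sketch below computes both so the difference is visible; which variant the DDQNEC implementation uses is not shown in the text. The function names and the `done` flag (no bootstrapping after the last task of an episode) are assumptions.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda a: values[a])


def target_as_written(r: float, gamma: float, q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Eq. (24): y_i = r_i + gamma * max_a Qbar(s_{i+1}, a | theta_bar)."""
    return r if done else r + gamma * max(q_target_next)


def double_dqn_target(r: float, gamma: float, q_online_next: Sequence[float],
                      q_target_next: Sequence[float], done: bool = False) -> float:
    """Double DQN: the prediction network picks a*, the target network scores it."""
    if done:
        return r
    a_star = argmax(q_online_next)            # action selection with theta
    return r + gamma * q_target_next[a_star]  # action evaluation with theta_bar


def squared_td_error(y: float, q: float) -> float:
    """Eq. (25): (y_i - q_i)^2."""
    return (y - q) ** 2


if __name__ == "__main__":
    q_online_next = [2.0, 1.0]  # Q(s_{i+1}, . | theta)
    q_target_next = [1.0, 3.0]  # Qbar(s_{i+1}, . | theta_bar)
    print(target_as_written(1.0, 0.5, q_target_next))                 # 2.5
    print(double_dqn_target(1.0, 0.5, q_online_next, q_target_next))  # 1.5
```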

DDQN updates the Q-function network's parameters, \(\theta\), using the loss value and stochastic gradient descent (SGD) with each mini-batch.

$$\theta \leftarrow \theta -\alpha {\nabla }_{\theta }\frac{1}{b}\sum_{i=1}^{b}{\left({q}_{i}-{y}_{i}\right)}^{2}$$

where \(\mathrm{\alpha }\) is the learning rate.
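The update above is a gradient step on the mean squared TD error over a mini-batch. To keep the sketch self-contained, it uses a linear Q-function \(Q(s,a)={w}_{a}^{\top }s\) in plain Python instead of the neural network and PyTorch optimizer used in the paper. The targets \({y}_{i}\) are treated as constants, as in DQN-style training, so no gradient flows into \(\overline{\theta }\). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # one weight row per action: Q(s, a) = w[a] . s


def q_value(w: Weights, s: Sequence[float], a: int) -> float:
    return sum(wj * sj for wj, sj in zip(w[a], s))


def sgd_step(w: Weights, batch: Sequence[Tuple[Sequence[float], int, float]],
             lr: float) -> float:
    """theta <- theta - lr * grad (1/b) sum_i (q_i - y_i)^2, updating w in place.

    Each batch item is (s_i, a_i, y_i), with y_i already computed from the
    target network. Returns the mini-batch loss before the update.
    """
    b = len(batch)
    grads = [[0.0] * len(row) for row in w]
    loss = 0.0
    for s, a, y in batch:
        err = q_value(w, s, a) - y
        loss += err * err / b
        for j, sj in enumerate(s):
            grads[a][j] += 2.0 * err * sj / b  # d/dw_a of (q_i - y_i)^2 / b
    for a, row in enumerate(w):
        for j in range(len(row)):
            row[j] -= lr * grads[a][j]
    return loss


if __name__ == "__main__":
    w = [[0.0, 0.0], [0.0, 0.0]]                         # 2 actions, 2 features
    batch = [((1.0, 0.0), 0, 1.0), ((0.0, 1.0), 1, 2.0)]  # (s_i, a_i, y_i)
    for _ in range(3):
        print(sgd_step(w, batch, lr=0.1))  # loss shrinks from step to step
```

Only the weights of the action actually taken receive a gradient, which matches the loss in Eq. (25) being defined on \(Q({s}_{i},{a}_{i})\) alone.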


Algorithm 1. Training Stage of the DDQNEC algorithm
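The replay memory \(M\) and the periodic copy of \(\theta\) into \(\overline{\theta }\) described for Fig. 2 can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the DDQNEC code: the class and function names are hypothetical, the `done` flag is an assumption (the text does not mention terminal states), and parameters are represented by any deep-copyable object rather than PyTorch modules.

```python
import copy
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool  # assumed episode-end flag, e.g. after the N-th task


class ReplayMemory:
    """Bounded store of recent experiences (s_i, a_i, r_i, s_{i+1})."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)  # one experience recorded per step

    def sample(self, b: int) -> List[Transition]:
        """Uniformly sample a mini-batch of b distinct experiences."""
        return self.rng.sample(list(self.buffer), b)

    def __len__(self) -> int:
        return len(self.buffer)


def sync_target(step: int, k: int, online_params: Any, target_params: Any) -> Any:
    """Every K steps return a copy of theta; otherwise keep theta_bar unchanged."""
    if step % k == 0:
        return copy.deepcopy(online_params)
    return target_params
```

With PyTorch, the same copy is usually done with `target.load_state_dict(online.state_dict())`; the deep copy above plays that role for plain Python objects.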

Performance evaluation

This section evaluates the performance of the DDQNEC scheme through computer simulation, focusing on resource utilization, task acceptance ratio, task rejection ratio, and cost ratio. The simulation environment runs on an Intel i9-10900K CPU with 64 GB of RAM and an RTX 3090 GPU, under Linux Ubuntu 20.04.02 LTS with Python 3.8 and PyTorch 1.9.0; it is designed to reflect real-world edge-cloud computing environments so that the results can be analyzed and compared with those of existing methods. To measure its efficiency, DDQNEC is compared with three heuristic methods (heuristic1, heuristic2, and heuristic3) and with DQNEC [9] in a simulated edge-cloud environment (as shown in Fig. 2). Heuristic1 processes tasks in FIFO order and prefers edge resources when they are available, Heuristic2 prioritizes tasks with high resource demands, and Heuristic3 uses the 0/1 knapsack algorithm, treating utilization as the profit to maximize. We conducted tests in two types of environments, small and large. In both, the number of tasks is varied over 50, 100, 150, 200, 250, and 300. However, the resources available at the edge and cloud, and the task requirements, differ between the two: the small environment has fewer resources, whereas the large environment has more.

  • In the small environment, the task parameters are CPU requirement 10~20, data size 10~20, bandwidth 100*15, and deadline 5~10 ms. The available resources are 30 edge servers with a CPU capacity of 40~60, 20 cloud servers with a CPU capacity of 60~80, and a bandwidth of 100 Mbps.

  • In the large environment, the task requirements are higher than in the small environment: CPU 20~30, data size 20~30, bandwidth 100*30, and deadline 10~15 ms. The edge-cloud resources are 50 edge servers with a CPU capacity of 50~80, 30 cloud servers with a CPU capacity of 60~100, and an available bandwidth of 100 Mbps.
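Heuristic3 is described only as applying the 0/1 knapsack algorithm with utilization as the profit. A minimal dynamic-programming sketch for a single server follows. It assumes integer CPU demands as item weights, the server's remaining CPU as the knapsack capacity, and a per-task profit supplied by the caller; how the baseline actually assigns profits and iterates over servers is not given in the text, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def knapsack_01(weights: Sequence[int], values: Sequence[float],
                capacity: int) -> Tuple[float, List[int]]:
    """Choose tasks maximizing total profit subject to sum(weights) <= capacity.

    Returns (best total profit, sorted indices of the chosen tasks).
    """
    n = len(weights)
    # best[i][c]: best profit using only the first i tasks with capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip task i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take task i-1
    # Walk back through the table to recover which tasks were taken
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


if __name__ == "__main__":
    demands = [12, 15, 18, 20]  # CPU demands in the small-environment range
    # Assumed profit = CPU demand, i.e. maximize the CPU used on a 40-unit server
    print(knapsack_01(demands, demands, 40))  # (38.0, [2, 3])
```

The table has \((n+1)(C+1)\) entries, which stays small for the integer CPU capacities listed above; tasks not chosen would be passed on to other servers or rejected.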

Table 2 presents the configuration of the simulation environment, and Table 3 lists the learning parameters of the DDQNEC algorithm. Our scheme uses DDQN to decide whether to offload or reject a task based on resource availability and the waiting tasks, and it considers network information when selecting the computing server for offloading in order to optimize network performance. The following subsection compares the performance of our scheme with that of the heuristic algorithms and DQNEC in both environments.

Table 2 Simulation environment configuration
Table 3 Learning parameters of the DDQNEC algorithm

Figure 3 compares the task rejection ratios of the five schemes. The rejection ratio of every scheme rises as the number of tasks increases, but it rises markedly less for DDQNEC than for the other four schemes, so DDQNEC achieves a higher task acceptance ratio and a lower rejection ratio. DDQNEC accepts more tasks because it assigns each task to a server whose available resources closely match the task's requirements, which preserves resources for later tasks and leaves room to accept more of them. A high acceptance rate in turn raises resource utilization and reduces idle time in the system, which is consistent with the utilization results in Fig. 4 and indicates the effectiveness of DDQNEC in the proposed edge-cloud system.

Fig. 3

Task rejection comparison: (a) small environment, (b) large environment

Figure 4 compares the average resource utilization of the proposed DDQNEC scheme with that of the three heuristic schemes (heuristic1, heuristic2, and heuristic3) and DQNEC. Average utilization increases with the number of tasks for all five schemes, but DDQNEC consistently achieves higher utilization than the other four in both environments, as shown in Fig. 4(a) and (b). The task rejection ratio directly affects resource utilization: a low rejection ratio means more accepted tasks and therefore higher utilization. DDQNEC selects servers according to task requirements and allocates resources so that more tasks are accepted, which improves the efficiency of the edge-cloud system; a high acceptance rate generally yields higher average utilization relative to cost. These results highlight the effectiveness of the DDQN approach in enhancing the performance of the edge-cloud system.

Fig. 4

Resource utilization comparison: (a) small environment, (b) large environment

Fig. 5

Average cost comparison: (a) small environment, (b) large environment

Figure 5 compares the cost ratios of the five schemes. The cost ratio of every scheme increases with the number of tasks, but it increases markedly less for DDQNEC than for the other four schemes in both the small and large environments. Because DDQNEC also has a lower task rejection rate than the three heuristics and DQNEC (Fig. 3), it accepts more tasks for offloading and raises the utilization of the edge-cloud system. The key factor is that DDQNEC assigns tasks to servers whose available resources closely match the task requirements, which keeps the overall cost low while keeping utilization high.


Task offloading and resource allocation in dynamic edge-cloud environments is a difficult problem. We address it by formulating it as an MDP and using the DDQN algorithm to find an optimal task offloading policy. In the DDQNEC scheme, an agent makes offloading decisions on behalf of end devices, sending their computation-intensive, latency-sensitive tasks to an edge or cloud server. In simulation, DDQNEC improves average cost, average resource utilization, and task rejection rate compared with the heuristic baselines and DQNEC.

In the future, we aim to improve the DDQNEC scheme for task offloading and resource allocation by using advanced machine learning and AI algorithms. We will analyze the edge-cloud network by considering various factors, such as the characteristics and capabilities of end devices, to optimize task offloading. Furthermore, we will explore the use of reinforcement learning techniques for managing a significant number of IoT devices with varying task requirements, with a focus on techniques suitable for continuous action spaces.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.


  1. Singh A, Satapathy SC, Roy A, Gutub A (2022) AI-based mobile edge computing for IoT: applications, challenges, and future scope. Arab J Sci Eng 47(8):9801–9831

  2. Dai B, Niu J, Ren T, Atiquzzaman M (2022) Towards mobility-aware computation offloading and resource allocation in end-edge-cloud orchestrated computing. IEEE Internet Things J 9(19):19450–19462

  3. Dai Y, Xu D, Maharjan S, Qiao G, Zhang Y (2019) Artificial intelligence empowered edge computing and caching for internet of vehicles. IEEE Wirel Commun 26(3):12–18

  4. Rodrigues TK, Suto K, Nishiyama H, Liu J, Kato N (2019) Machine learning meets computation and communication control in evolving edge and cloud: challenges and future perspective. IEEE Commun Surv Tutor 22(1):38–67

  5. Rodrigues TG, Suto K, Nishiyama H, Kato N, Temma K (2018) Cloudlets activation scheme for scalable mobile edge computing with transmission power control and virtual machine migration. IEEE Trans Comput 67(9):1287–1300

  6. Zhao J, Li Q, Gong Y, Zhang K (2019) Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans Veh Technol 68(8):7944–7956

  7. Nguyen TT, Le LB, Le-Trung Q (2019) Computation offloading in MIMO based mobile edge computing systems under perfect and imperfect CSI estimation. IEEE Trans Serv Comput 14(6):2011–2025

  8. Dai Y, Xu D, Maharjan S, Zhang Y (2018) Joint computation offloading and user association in multi-task mobile edge computing. IEEE Trans Veh Technol 67(12):12313–12325

  9. Ullah I, Lim H-K, Seok Y-J, Han Y-H (2022) Optimal task offloading with deep Q-network for edge-cloud computing environment. In: 2022 13th International Conference on Information and Communication Technology Convergence (ICTC). IEEE, pp. 406–411

  10. Huang L, Feng X, Zhang C, Qian L, Wu Y (2019) Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing. Digit Commun Netw 5(1):10–17

  11. Gu F, Niu J, Qi Z, Atiquzzaman M (2018) Partitioning and offloading in smart mobile devices for mobile cloud computing: State of the art and future directions. J Netw Comput Appl 119:83–96

  12. Huang L, Bi S, Zhang Y-JA (2019) Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Trans Mob Comput 19(11):2581–2593

  13. Li J, Gao H, Lv T, Lu Y (2018) Deep reinforcement learning based computation offloading and resource allocation for MEC. In: 2018 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, pp. 1–6

  14. Xiong X, Zheng K, Lei L, Hou L (2020) Resource allocation based on deep reinforcement learning in IoT edge computing. IEEE J Sel Areas Commun 38(6):1133–1146

  15. Chen X, Zhang H, Wu C, Mao S, Ji Y, Bennis M (2018) Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning. IEEE Internet Things J 6(3):4005–4018

  16. Lu H, Gu C, Luo F, Ding W, Liu X (2020) Optimization of lightweight task offloading strategy for mobile edge computing based on deep reinforcement learning. Future Gener Comput Syst 102:847–861

  17. Zhang K, Zhu Y, Leng S, He Y, Maharjan S, Zhang Y (2019) Deep learning empowered task offloading for mobile edge computing in urban informatics. IEEE Internet Things J 6(5):7635–7647

  18. Liu Y, Yu H, Xie S, Zhang Y (2019) Deep reinforcement learning for offloading and resource allocation in vehicle edge computing and networks. IEEE Trans Veh Technol 68(11):11158–11168

  19. Wang J, Zhao L, Liu J, Kato N (2019) Smart resource allocation for mobile edge computing: a deep reinforcement learning approach. IEEE Trans Emerg Top Comput 9(3):1529–1541

  20. Liu C, Tang F, Hu Y, Li K, Tang Z, Li K (2020) Distributed task migration optimization in MEC by extending multi-agent deep reinforcement learning approach. IEEE Trans Parallel Distrib Syst 32(7):1603–1614

  21. Chen X, Zhang H, Wu C, Mao S, Ji Y, Bennis M (2018) Performance optimization in mobile-edge computing via deep reinforcement learning. In: 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall). IEEE, pp. 1–6

  22. Xie Y, Xu Z, Xu J, Gong S, Wang Y (2019) Backscatter-aided hybrid data offloading for mobile edge computing via deep reinforcement learning. In: International Conference on Machine Learning and Intelligent Communications. Springer, pp. 525–537

  23. Tian K, Chai H, Liu Y, Liu B (2022) Edge Intelligence empowered dynamic offloading and resource management of MEC for Smart City internet of things. Electronics 11(6):879

  24. Alfakih T, Hassan MM, Gumaei A, Savaglio C, Fortino G (2020) Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA. IEEE Access 8:54074–54084

  25. Chen M, Liu W, Wang T, Zhang S, Liu A (2022) A game-based deep reinforcement learning approach for energy-efficient computation in MEC systems. Knowl-Based Syst 235:107660

  26. Lu H, He X, Du M, Ruan X, Sun Y, Wang K (2020) Edge QoE: Computation offloading with deep reinforcement learning for Internet of Things. IEEE Internet Things J 7(10):9255–9265

  27. Chen J, Wu Z (2021) Dynamic computation offloading with energy harvesting devices: a graph-based deep reinforcement learning approach. IEEE Commun Lett 25(9):2968–2972

  28. Chen Z, Wang X (2020) Decentralized computation offloading for multi-user mobile edge computing: a deep reinforcement learning approach. EURASIP J Wirel Commun Netw 2020(1):1–21

  29. Chen M, Yi M, Huang M, Huang G, Ren Y, Liu A (2023) A novel deep policy gradient action quantization for trusted collaborative computation in intelligent vehicle networks. Expert Syst Appl 221:119743

  30. Chen M et al (2022) GPDS: a multi-agent deep reinforcement learning game for anti-jamming secure computing in MEC network. Expert Syst Appl 210:118394

  31. Liu K-H, Liao W (2020) Intelligent offloading for multi-access edge computing: A new actor-critic approach. In: ICC 2020–2020 IEEE International Conference on Communications (ICC). IEEE, pp. 1–6

  32. Qiu X, Zhang W, Chen W, Zheng Z (2020) Distributed and collective deep reinforcement learning for computation offloading: a practical perspective. IEEE Trans Parallel Distrib Syst 32(5):1085–1101

  33. Li Y, Qi F, Wang Z, Yu X, Shao S (2020) Distributed edge computing offloading algorithm based on deep reinforcement learning. IEEE Access 8:85204–85215

  34. Chai F, Zhang Q, Yao H, Xin X, Gao R, Guizani M (2023) Joint multi-task offloading and resource allocation for mobile edge computing systems in satellite IoT. IEEE Trans Veh Technol 1–15

  35. Chen S, Ge X, Wang Q, Miao Y, Ruan X (2022) DDPG-based intelligent rechargeable fog computation offloading for IoT. Wirel Netw 28(7):3293–3304

  36. Cheng M, Li J, Nazarian S (2018) DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers. In: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, pp. 129–134

  37. Nath S, Wu J (2020) Dynamic computation offloading and resource allocation for multi-user mobile edge computing. In: GLOBECOM 2020–2020 IEEE Global Communications Conference. IEEE, pp. 1–6

  38. Chen J, Xing H, Xiao Z, Xu L, Tao T (2021) A DRL agent for jointly optimizing computation offloading and resource allocation in MEC. IEEE Internet Things J 8(24):17508–17524



This research was supported by Basic Science Research Programs through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2023R1A2C1003143 and NRF-2018R1A6A1A03025526).



Author information




Methodology: Ihsan Ullah; Resources: Ihsan Ullah and Hyun-Kyo Lim; Software: Hyun-Kyo Lim, Yeong-Jun Seok; Supervision: Youn-Hee Han; Writing original draft: Ihsan Ullah; Writing review editing: Ihsan Ullah, Hyun-Kyo Lim; All authors read and approved the final manuscript.

Corresponding author

Correspondence to Youn-Hee Han.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Ullah, I., Lim, HK., Seok, YJ. et al. Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach. J Cloud Comp 12, 112 (2023).

