Real-time scheduling of power grid digital twin tasks in cloud via deep reinforcement learning
Journal of Cloud Computing, volume 13, Article number: 121 (2024)
Abstract
As energy demand continues to grow, it is crucial to integrate advanced technologies into power grids for better reliability and efficiency. Digital Twin (DT) technology plays a key role in this by using data to monitor and predict real-time operations, significantly enhancing system efficiency. However, as the power grid expands and digitization accelerates, the data generated by the grid and the DT system grows exponentially. Effectively handling this massive data is crucial for leveraging DT technology. Traditional local computing faces challenges such as limited hardware resources and slow processing speeds. A viable solution is to offload tasks to the cloud, utilizing its powerful computational capabilities to support the stable operation of the power grid. To address this need, we propose GDDRL, a task scheduling method based on Deep Reinforcement Learning (DRL). GDDRL considers the characteristics of computational tasks from the power grid and DT system and uses a DRL agent to schedule tasks in real time across different computing nodes, optimizing for processing time and cost. We evaluate our method against several established real-time scheduling techniques, including Deep Q-Network (DQN). Our experimental results show that the GDDRL method outperforms existing strategies by reducing response time, lowering costs, and increasing success rates.
Introduction
With the rapid development of the global economy and the continuous improvement of people's living standards, energy demand shows a rapid growth trend [1]. This rapid growth in energy demand means the stable operation of the power grid faces unprecedented challenges. To improve the reliability and stability of the power grid, many cutting-edge technologies (e.g., smart grid, energy internet, and artificial intelligence) have been developed and integrated into power grid systems [2, 3]. Among these technologies, digital twin (DT) technology has received focused attention due to its unique digital modeling and simulation capabilities [4].
Digital Twin technology, by creating virtual replicas of real-world systems, offers unprecedented insights and tools for the design, operation, and maintenance of power grids. DT can be used for forecasting operations and maintenance, load management, equipment failure prediction, and real-time power analysis [5, 6]. The key to realizing these functions is the processing of the large amount of data generated during the operation of the power grid and the DT system. For example, by recording the average operating temperature of transformer equipment in a power grid project and analyzing relevant historical data, we can estimate the transformer's average lifespan under current conditions. This offers valuable insights for operational and maintenance strategies. However, as the amount of equipment in the power grid grows and data collection becomes more frequent, the volume of data increases exponentially. This renders traditional local data processing methods inadequate, posing a challenge in effectively managing the massive data.
The data produced by the power grid and its DT system necessitate extensive hardware resources for local computational processing, which incurs high computational costs and frequently falls short of real-time processing requirements. A new method for data management is essential, and the rapid advancements in cloud computing present a promising solution [7]. Specifically, tasks generated by the power grid and DT are offloaded to the cloud for processing. These computational tasks are executed using shared resources such as compute nodes, and the results are used for decision support. Figure 1 shows how tasks from both the physical and virtual entities are combined to form the DT, which is then uploaded to cloud computing nodes for processing. It also illustrates the typical architecture of a cloud computing environment used for power grid task processing. However, since the available computing nodes are limited, effectively scheduling these tasks to the appropriate nodes while ensuring quality of service (QoS) is a significant challenge that needs to be addressed.
In fact, task scheduling in cloud computing is a complex challenge that has received extensive attention in optimization research. A large number of tools and algorithms have been developed, and ongoing research aims to augment traditional approaches [8, 9]. However, popular strategies are primarily designed for batch tasks and cannot meet the demands of real-time workloads. With the advancement of artificial intelligence, learning-based methods have become a key research focus. Among these, reinforcement learning (RL) stands out, as it can make real-time adjustments by learning from historical data and observed environmental information [10, 11]. Specifically, RL is a method for learning optimal strategies by having an agent interact with an environment, continuously trying and learning to maximize accumulated rewards. In scheduling problems, the agent can learn to dynamically allocate tasks to different resources or processing units within given constraints and objectives.
With the advances in deep learning, researchers are increasingly turning to Deep Reinforcement Learning (DRL) techniques [12, 13] to address task scheduling problems in cloud computing. Currently, cloud task scheduling using DRL combined with DT technology has been applied in various fields such as vehicular networking [14] and unmanned aerial vehicles [15], but few studies focus on power grids. To address the strict real-time requirements of power grid tasks and their DT systems, we propose a new scheduling method called GDDRL. Our method can handle the massive number of tasks generated by the power grid and its DT system, providing real-time results to support subsequent decision-making. Specifically, given that the relevant tasks possess varying structural types and processing modes, GDDRL utilizes a DRL agent to allocate tasks to the most suitable computing nodes. It not only minimizes task response times but also reduces overall computational costs. In general, the main contributions of our work are summarized as follows:

To address the limited research on using cloud computing to support power grids and DT systems, we propose a DRL-based task scheduling method that aims to minimize computational task response time while reducing overall computational costs.

We provide a detailed mathematical model and an implementation of our approach using Double Deep Q-Network.

We compare our method with other commonly used real-time task scheduling approaches. Our experimental results show that our approach achieves better performance in terms of average task response time and success rate while significantly reducing execution costs.
The rest of this paper is organized as follows. First, we review the relevant literature. Subsequently, we provide an overview of our proposed system architecture. Thereafter, we present the details of our DRL approach and evaluate its performance through experiments, followed by a conclusion summarizing our work.
Related work
DT creates precise digital representations of physical entities for purposes of simulation, analysis, and optimization. Initially introduced to support smart manufacturing within Industry 4.0 frameworks, DT leverages the synergies between information technology and physical production systems [16]. DT has been effectively employed for resource scheduling across various sectors, including smart cities, telematics, and networks. For instance, the integration of DT techniques has been used to optimize offloading decisions and subchannel assignments, thereby improving computational rates and reducing task completion delays in these environments [17]. Meanwhile, DT has also been used to solve problems arising in modern power systems. For example, the work in [18] proposes a DT framework in smart grids to assess the remaining useful life of equipment. The successful application of DT across various domains also underscores its potential in revolutionizing cloud computing approaches within power grids. For example, DT-enabled cloud battery management systems have been proposed to augment computational and data storage capabilities in battery systems [19]. Additionally, a DT framework has been proposed for the management of electrical devices, which ensures reliable device collaboration and efficient communication at the cloud edge [20]. DT technology has immense potential for power grid infrastructure, but managing the large volume of tasks it generates is crucial. Cloud computing is a better solution than local processing for handling these tasks. With limited computational resources, an efficient task-scheduling mechanism is essential to optimize resource use and maintain operational efficiency.
Task scheduling in the cloud is a long-standing problem, and many solutions have been proposed. For example, researchers use the Whale Optimization Algorithm (WOA) to optimize task scheduling in cloud computing by improving traditional algorithms [21]. Another solution for task scheduling in cloud computing environments is the Enhanced Multi-Verse Optimizer (EMVO) algorithm [22], which can effectively reduce response time and improve resource utilization. Additionally, methods based on game theory [23, 24] and fuzzy logic [25] have been introduced to solve multi-objective optimization problems. However, most of these methods are designed for processing batch tasks and are not suitable for real-time workloads. For the power grid, the importance of real-time processing is self-evident, so the focus of this paper is real-time task scheduling in cloud environments.
With the advancement of artificial intelligence technology, learning-based techniques do not need to construct models explicitly; they simply train neural networks on historical data. The powerful perceptual ability of Deep Neural Networks (DNN) combined with the decision-making ability of Reinforcement Learning (RL) can effectively solve optimization problems [26]. Some researchers propose a collaborative MEC intelligent task offloading framework based on DT and develop a DRL-based Intelligent Task Offloading (INTO) scheme to jointly optimize peer-to-peer offloading and resource allocation decisions [27]. For DT-enabled edge computing vehicular networking, a DRL-based service offloading (SOL) method is proposed in [28] to compensate for the lack of vehicle computing resources. In [29], a DRL framework is used to optimize energy consumption and load balancing by leveraging DTs to aid the deployment of Service Function Chains (SFC) over a Computational Power Network (CPN). Moreover, real-time data is collected using DT techniques and DRL algorithms are used to reduce the latency and energy consumption of task scheduling on the in-vehicle edge cloud [30]; DRL is then used to minimize the overall task completion delay and enable resource allocation. Furthermore, some studies utilize DT to achieve smart distribution network resource scheduling using deep Q-learning [31]. However, Q-learning struggles to scale up to high-dimensional complex tasks, and Deep Q-Network (DQN) [32, 33] suffers from the problem of Q-value overestimation.
Based on the above discussion, we summarize the main features of typical task scheduling work, as shown in Table 1. Specifically, most existing work schedules tasks in batch form, while GDDRL is able to schedule real-time tasks using Double Deep Q-Network (DDQN). We apply DDQN to the characteristics of tasks generated by the digital twin technology of the power grid, which is a novel contribution not previously reported in the literature. Moreover, few studies have considered the computational processing of tasks generated by power grid engineering and its DT system through the cloud. By scheduling appropriate compute nodes for the tasks, we reduce both the response time and the computational cost. Furthermore, we have customized the DDQN algorithm based on the actual demands and characteristics of power grid scheduling, enhancing its suitability for handling the dynamic and real-time requirements of the power grid.
Architecture
In this section, we briefly describe the general system architecture in which tasks generated by the power grid and DT system are uploaded to cloud computing nodes for processing, along with the associated mathematical models.
Figure 2 shows the task scheduling performed on the cloud computing nodes. When a task arrives, it is first sent to the queue of the instance selected by the scheduler. In the queue of the computing node, the task waits for its turn to be executed; once execution is complete, a new task can be assigned to the computing node. To facilitate the description of the optimization problem studied in this paper, we provide mathematical definitions of the task model, computing node model, and task processing. The notations we use are shown in Table 2.
Task characteristics
Compared with other types of Digital Twin tasks in the cloud, the data generated by the power grid system comes from more varied sources, including sensors, measuring devices, monitoring systems, and others. Therefore, it is necessary to consider that the power grid and DT systems will generate various types of tasks, such as data analysis, log processing, and image processing. Additionally, due to the highly dynamic nature of the power grid system, there is a high requirement for real-time responsiveness. Tasks need to promptly capture and reflect changes in the status of the power grid to support real-time monitoring, prediction, and decision-making. Consequently, we define tasks as uploadable at any time. For each uploaded task, we define a QoS time requirement. Each task is characterized as follows.
\(D_{i} = \{D^{ID}_{i}, D^{AT}_{i}, D^{S}_{i}, D^{QoS}_{i}, D^{T}_{i}\}\)

where \(D^{ID}_{i}\) is the task ID, \(D^{AT}_{i}\) is the task uploading time, \(D^{S}_{i}\) is the size of the task, \(D^{QoS}_{i}\) is the QoS requirement for task processing, and \(D^{T}_{i}\) is the type of the task (i.e., data analysis, log processing, or image processing).
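As a minimal illustrative sketch (not the paper's code), the five-field task tuple described above could be represented as a Python dataclass; the field names and types here are our own:

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: int      # D^ID: task identifier
    arrival: float    # D^AT: upload time
    size: float       # D^S: task size
    qos: float        # D^QoS: allowed response time
    kind: str         # D^T: "data", "log", or "image"

t = Task(task_id=1, arrival=0.0, size=500.0, qos=3.0, kind="image")
```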
Computing nodes model
Computing nodes are the basic computing units in a cloud platform, and users have the flexibility to rent and manage these nodes according to their needs. For our problem, we assume a pay-as-you-go subscription for computing nodes. Each available computing node follows the definition below.
\(N_{j} = \{N^{ID}_{j}, N^{T}_{j}, N^{C}_{j}, N^{P}_{j}, N^{I}_{j}\}\)

where \(N^{ID}_{j}\) is the ID of the computing node, \(N^{T}_{j}\) is the type of the computing node (i.e., image processing, log processing, or data analysis), \(N^{C}_{j}\) is the processing capacity of the computing node, \(N^{P}_{j}\) is the price for task processing, which is related to the execution time, and \(N^{I}_{j}\) is the idle time of the computing node.
Task scheduling model
We consider that as soon as a task is generated by the power grid and DT system, the scheduler sends it to the desired computing node. After scheduling, the task is added to the processing queue of the computing node, which contains all the tasks assigned to that node. This queue follows the first-come-first-served (FCFS) discipline. We assume that once a task is being processed, no other task can interrupt its processing, and that there is no limit to the number of tasks that can be added to the computing node's queue.
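The FCFS queueing behavior described above can be sketched as follows; the function and variable names are illustrative, not from the paper:

```python
# Minimal FCFS bookkeeping for one node: each assignment pushes the node's
# idle time forward by the task's execution time.
def assign_fcfs(node_idle, arrival, exe_time):
    start = max(node_idle, arrival)   # wait if the node is still busy
    finish = start + exe_time
    return finish                     # the node's new idle time

idle = 0.0
idle = assign_fcfs(idle, arrival=1.0, exe_time=2.0)  # runs from 1.0 to 3.0
idle = assign_fcfs(idle, arrival=2.0, exe_time=1.0)  # waits until 3.0, runs to 4.0
# idle == 4.0
```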
In our problem, we define the response time of a task as the sum of the time required to process the task and the time the task spends in the computing node's waiting queue. Therefore, the task processing response time \(T_{i}\) is calculated as follows.

\(T_{i} = T^{exe}_{i} + T^{wait}_{i}\)

where \(T^{exe}_{i}\) is the time to process the task and \(T^{wait}_{i}\) is the waiting time for the task before processing. We further define the execution time \(T^{exe}_{i}\).
\(T^{exe}_{i} = \frac{D^{S}_{i}}{\alpha \cdot N^{C}_{j}}\)

where \(D^{S}_{i}\) is the size of the task, \(N^{C}_{j}\) is the processing capacity of the computing node, and \(\alpha\) is a constant parameter denoting the speed-up ratio of different types of tasks processed on different types of computing nodes. The specific values are shown in Table 3.
When a task arrives and no task is being processed on the assigned computing node, it is executed immediately. Otherwise, it must wait for the current task to finish before proceeding. We therefore define the waiting time \(T^{wait}_{i}\) as follows.

\(T^{wait}_{i} = \max (N^{I}_{j} - D^{AT}_{i},\ 0)\)

where \(N^{I}_{j}\) is the idle time of the computing node and \(D^{AT}_{i}\) is the task uploading time. Therefore, if the computing node is not idle, the waiting time for task processing is the computing node's idle time minus the task uploading time. Conversely, if the computing node is idle, no wait is required.
Task processing is considered successful if the QoS requirement is satisfied, so the condition for successful task processing can be defined as below.

\(T_{i} \le D^{QoS}_{i}\)

where \(T_{i}\) is the task processing response time and \(D^{QoS}_{i}\) is the QoS requirement for task processing.
The cost of task processing depends on the execution time: the shorter a task's execution time, the lower its cost. The cost of processing each task can be defined as follows.

\(cost_{i} = N^{P}_{j} \cdot T^{exe}_{i}\)

where \(cost_{i}\) is the processing cost of the current task, \(N^{P}_{j}\) is the price for task processing on the computing node, and \(T^{exe}_{i}\) is the time to process the task.
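Putting the timing, success, and cost definitions together, a minimal Python sketch (our own naming; the paper itself gives only the formulas) might look like:

```python
def execution_time(size, capacity, alpha):
    """T_exe = D^S / (alpha * N^C); alpha is the type-dependent speed-up ratio."""
    return size / (alpha * capacity)

def waiting_time(node_idle, arrival):
    """A task waits only if the node is still busy when it arrives."""
    return max(node_idle - arrival, 0.0)

def schedule_metrics(size, capacity, alpha, node_idle, arrival, qos, price):
    exe = execution_time(size, capacity, alpha)
    wait = waiting_time(node_idle, arrival)
    response = exe + wait                 # T_i = T_exe + T_wait
    return {
        "response": response,
        "success": response <= qos,       # QoS condition: T_i <= D^QoS
        "cost": price * exe,              # cost_i = N^P * T_exe
    }

m = schedule_metrics(size=400, capacity=100, alpha=2.0,
                     node_idle=5.0, arrival=3.0, qos=5.0, price=0.1)
# exe = 400/(2*100) = 2.0, wait = 5-3 = 2.0, response = 4.0 <= 5.0 -> success
```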
Approach
We focus on utilizing the DRL approach to solve the task scheduling problem of uploading tasks to cloud nodes for processing. This section describes the fundamentals of DRL and the framework of GDDRL.
Basics of deep reinforcement learning
Markov Decision Process and Q-learning: The Markov Decision Process (MDP) is widely used to provide a mathematical framework [37] for modeling decision making in stochastic sequential decision problems, where outcomes are partly random and partly under the control of the decision maker. MDPs have been a useful approach for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. A Markov decision process is a 5-tuple \((S, A, P_a(s, s^{\prime }), R_a(s, s^{\prime }), \gamma )\). The specific definitions are as follows.

S is a finite set of states;

A is a finite set of actions;

\(P_a(s, s^{\prime })=\Pr (s_{t+1}=s^{\prime } \mid s_t=s, a_t=a)\) is the probability that action \(a\) in state \(s_t\) at time \(t\) will lead to state \(s_{t+1}\) at time \(t+1\);

\(R_a(s, s^{\prime })\) is the immediate reward (or expected immediate reward) received after transitioning from state \(s_t\) to state \(s_{t+1}\), due to action a;

\(\gamma \in [0, 1]\) is the discount factor, which represents the difference in importance between future rewards and present rewards.
The MDP's goal is to find an optimal policy that maximizes the expected return from a sequence of actions leading to a sequence of states. The expected return can be defined as a policy function that calculates the sum of the discounted rewards. In reinforcement learning, states that satisfy the Markov property give rise to a state transition matrix consisting of, for each state, the probability of transitioning to every possible next state.
Q-learning [38] is a model-free reinforcement learning technique, and its implementation is similar to the MDP strategy. Specifically, Q-learning can be used to find the best action choice for any given (finite) MDP. Its working principle is to learn an action-value function that gives the final expected result. The Q-value is the cumulative return obtained after executing action \(a\) in state \(s\). The algorithm constructs a Q-table to store the Q-values, facilitating the selection of actions that optimize returns. Decision-making relies solely on comparing the Q-values for each possible action in a given state \(s\), without considering the subsequent states of \(s\). The Q-value is updated based on observed rewards and states, influenced by a learning rate \(\alpha\) and a discount factor \(\gamma\). The iterative process ultimately leads to the convergence of the Q-value to \(Q(s, a)\), the optimal action-value function. The values in the Q-function are updated using the following expression.

\(Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max _{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]\)
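The tabular Q-value update described above can be sketched as follows (an illustrative implementation, not the paper's code):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)  # the Q-table, defaulting to 0 for unseen pairs
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
# Q[(0, 1)] = 0 + 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```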
DQN: The traditional Q-learning algorithm works well with finite discrete state spaces, but encounters challenges in high-dimensional continuous state spaces. This is because it relies on a Q-table to store the value of each state-action pair, and in high-dimensional spaces this table can become extremely large and difficult to manage and update. DQN [39] overcomes this limitation by approximating the Q-value function with a deep neural network whose weights are denoted by \(\theta\), allowing it to handle complicated decision-making with large and continuous state spaces. DQN takes state features as raw input and outputs the Q-function value of each state-action pair. It can learn mappings from states to action values without the need for a huge Q-table.
DQN uses target networks and experience replay to improve stability and convergence. Experience replay is a key feature of DQN that allows the algorithm to store past transitions \((s_t, a_t, r_t, s_{t+1})\) and reuse this data multiple times during training. This approach breaks the temporal correlation between data and makes more efficient use of it. The fixed Q-target [40] is another improvement, which uses a fixed network to generate the Q-target value during the update step instead of the main network. This helps stabilize the learning process, as it reduces the correlation between targets and predictions. In the DQN algorithm, the objective function is defined as below.

\(L(\theta ) = \mathbb {E}\left[ \left( r_t + \gamma \max _{a^{\prime }} Q(s_{t+1}, a^{\prime }; \theta ^{-}) - Q(s_t, a_t; \theta ) \right) ^{2} \right]\)

where \(\theta ^{-}\) denotes the parameters of the target network.
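The target value that DQN trains the main network toward can be sketched as below; this is the generic DQN target computation, not code from the paper:

```python
def dqn_target(reward, q_next_target, gamma=0.99, done=False):
    """y = r for terminal states, else r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)

# q_next_target holds the target network's Q-values for each action in s'
y = dqn_target(reward=1.0, q_next_target=[0.5, 2.0, 1.0])
# y = 1.0 + 0.99 * 2.0 = 2.98
```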
DDQN: Although the greedy maximization in DQN helps the Q-value approach the optimization goal quickly, it is also prone to overestimation. In the DQN model, the target network adopts the action selection strategy of maximizing the Q-value via \(\underset{a}{\max }Q(s_{t+1}, a;\theta _{t})\). Therefore, the model obtained by DQN may carry a large bias, which results in overestimation. To tackle the overestimation problem, we need to change the self-fitting training method of the network. Therefore, this study adopts a learning model based on DDQN [41] to modify the learning method for task scheduling. By placing the calculation of the target Q-value and the action selection in two separate networks, the propagation of the maximization bias is cut off.
In DDQN, to minimize the overestimation problem, the computation of the target Q-value is divided into two steps. First, the online network is used to select the best action, and then the target network is used to compute the Q-value of this action. The objective function in DDQN is adapted as below.

\(y_t = r_t + \gamma \, Q\left( s_{t+1}, \underset{a}{\mathrm {argmax}}\, Q(s_{t+1}, a; \theta ); \theta ^{-}\right)\)
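The decoupled action selection and evaluation can be sketched as below (generic DDQN logic with illustrative numbers showing how it differs from the naive DQN maximum):

```python
def ddqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Online net picks the action, target net evaluates it:
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    a_star = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[a_star]

# The online net prefers action 0, which the target net values lower than
# the target net's own maximum (the value plain DQN would use).
y = ddqn_target(1.0, q_next_online=[3.0, 1.0], q_next_target=[0.5, 2.0])
# y = 1.0 + 0.99 * 0.5 = 1.495 (a plain DQN max over the target net gives 2.98)
```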
The agent receives the state of the computing nodes and tasks from the cloud scheduling environment. Appropriate actions, i.e., task-to-computing-node mappings, are then generated by the evaluation network. The agent then assesses whether all the tasks have been scheduled. If scheduling is complete, the target value is \(r_{t+1}\); if not, the target network is used to calculate the target value. Gradient descent is then performed to update the evaluation network.
The proposed GDDRL framework
Algorithm 1 gives the details of the GDDRL-based task scheduling cost optimization algorithm. The algorithm determines where tasks should be queued for execution. Specifically, it checks the current state of all available nodes and uses Q-values to determine the choice of computing node. In parallel with this task scheduler, we also keep track of the task queues of the computing nodes, updating them when a new task arrives or when task processing is complete. A more detailed mathematical model defining states and rewards is given below.
Action Space: We assume that a fixed number of cloud computing nodes are available. Each computing node has a task queue to which tasks are allocated by the task scheduler. Once tasks are added to the queue, they are executed in FCFS fashion. Therefore, we define the action space as the set of all available computing nodes. Thus, our action space is defined as:
\(A = \{N_1, N_2, \ldots , N_m\}\)

where \(m\) is the number of computing nodes available to process the task.
State Space: We assume that the new task is ready to be scheduled at time t, and we can define the state space of DDQN at time t as:
\(s_t = \{S_{node}, S_{data}\}\)

where \(S_{node}\) and \(S_{data}\) are the states of the computing nodes and the current task at time \(t\), respectively. More specifically, the entire state space can be described as:
\(s_t = \{AT_t, T_t, N_1^t, N_2^t, \ldots , N_m^t\}\)

where \(AT_t\) and \(T_t\) denote the uploading time and type of the current task, and \(N_j^t\) signifies the waiting time a task would incur on the \(j\)-th computing node.
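A minimal sketch of assembling this state vector (encoding the task type as an integer code is our assumption; the paper does not specify the encoding):

```python
# State = (upload time of the current task, task-type code,
#          pending queue wait on each of the m computing nodes).
def build_state(upload_time, task_type_code, node_wait_times):
    return [upload_time, task_type_code] + list(node_wait_times)

s = build_state(upload_time=12.0, task_type_code=2,
                node_wait_times=[0.0, 3.5, 1.2])  # m = 3 nodes
```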
Reward function: Since task processing has to satisfy the QoS requirements while minimizing response time and cost, we let the cost of each task and its response time jointly determine the reward. Therefore, the reward function of DDQN is defined as:
\(r_t = \lambda _3 - (\lambda _1 \cdot cost_i + \lambda _2 \cdot T_i)\)

where \(T_i\) and \(cost_i\) indicate the response time and cost of the current task, with smaller values yielding higher rewards. Moreover, \(\lambda _1\) and \(\lambda _2\) are trade-off parameters used to coordinate the effect of cost and response time. Increasing \(\lambda _2\) emphasizes response time, while a higher \(\lambda _1\) value suits cost prioritization. Additionally, it is worth noting that \(\lambda _1 + \lambda _2 = 1\), and \(\lambda _3\) is set to a constant.
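A Python sketch of a reward of this shape; the functional form is our reading of the description, and the specific \(\lambda\) values are illustrative rather than taken from the paper:

```python
def reward(response_time, cost, lam1=0.5, lam2=0.5, lam3=1.0):
    """Smaller response time and cost yield a larger reward;
    lam1 + lam2 = 1 and lam3 is a constant offset."""
    assert abs(lam1 + lam2 - 1.0) < 1e-9
    return lam3 - (lam1 * cost + lam2 * response_time)

r = reward(response_time=4.0, cost=0.2)
# r = 1.0 - (0.5 * 0.2 + 0.5 * 4.0) = -1.1
```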
Model Training Period: During the model training period, decisions and outcomes from historical scheduling are leveraged to guide the underlying DNN toward a more accurate value function. The training procedure of the DDQN scheduling model is illustrated in Algorithm 1. Before training begins, we first initialize the parameters of the algorithm, including the exploration rate \(\epsilon\), the learning rate \(\alpha\), the discount factor \(\gamma\), the target network update frequency \(C\), the learning start time \(\tau\), the mini-batch size \(M\), and the replay memory \(D\). Then, we randomly initialize the weights of the action-value function \(Q\) and the target value function \(Q^{\prime }\).
In each step of every episode, the agent selects an action based on the current state. Specifically, the agent selects a random action with probability \(\epsilon\) which means randomly choosing one among the available computing nodes. Otherwise, it selects the action that maximizes \(Q(s, a; \theta )\). After selecting an action, the agent adds the task to the corresponding node queue, executes the selected action, observes the reward and the new state, and stores the transition information in the replay memory. If the number of steps after learning starts exceeds \(\tau\), the agent samples a minibatch of transitions from the replay memory and utilizes these transitions to perform gradient descent for updating the weights of the actionvalue function. Additionally, we periodically update the parameters of the target network to stabilize the training process and improve the performance of the algorithm.
In the algorithm, we utilize an \(\epsilon\)-greedy strategy to balance exploration and exploitation. Specifically, with probability \(\epsilon\), a random action is chosen to explore the environment, while with the remaining probability \((1-\epsilon )\), the action that maximizes the action-value function is selected to exploit the known information. This balance allows the algorithm to continuously explore new task scheduling strategies during learning and gradually converge toward the optimal strategy. Additionally, we employ experience replay to train the agent, using past experiences to smooth the training process, reduce correlations between samples, and enhance the efficiency and stability of training.
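The \(\epsilon\)-greedy selection and replay memory described above can be sketched as below (illustrative only; the paper's implementation uses PyTorch networks in place of the raw Q-value list):

```python
import random
from collections import deque

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, else exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

class ReplayMemory:
    """Stores (s, a, r, s') transitions and samples uncorrelated mini-batches."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted first
    def push(self, transition):
        self.buf.append(transition)
    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

mem = ReplayMemory(capacity=1000)
mem.push((0, 1, 1.0, 2))
a = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # greedy: picks action 1
```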
Evaluation
In this section, we present the experimental evaluation of our proposed method. Firstly, the experimental setup is given, including the configuration of the experimental environment and the hyperparameters of the DRL algorithm. Then, the results for different workload scenarios are presented.
Experimental settings
We configured three groups of different types of cloud computing nodes. Table 4 contains detailed information about these computing nodes. These nodes of different types have varying computational capabilities and costs.
In the experiment, we utilized the following hyperparameter settings for our method: the total number of training episodes was set to 1,000, with a target network update frequency of 20. The size of the replay buffer was set to 100,000, and the batch size was chosen as 256. The learning rate and discount factor were set to 0.001 and 0.999, respectively. We compared our method against four common scheduling methods (Random, Round-Robin, Earliest, and DQN). We implemented the proposed GDDRL method using the PyTorch framework and conducted training and inference on a GPU. The hyperparameters for GDDRL in this experiment are presented in Table 5.
Experimental results
In this part, we carried out experiments varying task sizes, task arrival rates, task ratios, and the number of computing nodes to cover a wide range of scenarios.
Varying the Size of Tasks: In this experiment, we compared the performance of different methods in handling various average task sizes. The experiment involved setting different average task sizes from 200 to 1000. The proportions of data analysis, log processing, and image processing in the workload were set to 0.5, 0.3, and 0.2, respectively. According to the results shown in Fig. 3, GDDRL outperformed other algorithms in task scheduling. Specifically, regardless of task size, GDDRL exhibited lower average response times and costs, while also significantly surpassing other algorithms in success rate.
It is worth noting that although the response times of DQN and DDQN increased with task size, they still maintained relatively low response times when dealing with large-scale tasks. This highlights the potential of deep reinforcement learning algorithms in handling large datasets. In particular, DDQN's lower response time indicates effective mitigation of the overestimation problem, giving it an advantage in handling large-scale tasks and further reinforcing the performance of GDDRL.
Overall, the increase in task size leading to higher costs is expected, as handling larger tasks naturally requires more resources. Simultaneously, the decrease in success rates is also anticipated, as larger data may cause resource bottlenecks, affecting task completion rates. In this regard, GDDRL performed well, with its stable success rate surpassing other algorithms, demonstrating its reliability and efficiency in task scheduling.
Varying the Arrival Rate of Tasks: In this experiment, we compared the performance of different methods under varying task arrival rates. The experiment set different average data arrival rates ranging from 10 to 30. According to the results shown in Fig. 4, GDDRL outperformed other algorithms in task processing. Specifically, regardless of the average data arrival rate, GDDRL consistently exhibited lower average response time and cost, while also demonstrating significantly higher success rates compared to other algorithms.
Regarding average response time, Random and Round-Robin showed higher response times, while the DQN and DDQN methods maintained relatively lower levels. In particular, the DDQN strategy maintained lower response times even at higher arrival rates, demonstrating stable performance. In the overall cost evaluation, the costs of all methods increased with the arrival rate. However, DDQN exhibited better cost control, with a smaller increase in costs even at high arrival rates, reflecting its advantage in resource utilization efficiency. As for success rate, DDQN maintained higher success rates under all arrival rate conditions, while the other methods degraded gradually with increasing arrival rates. This result suggests that DDQN has stronger stability and reliability in ensuring successful task completion. From these results, we can see that DDQN performed excellently on all performance metrics, particularly when handling tasks with a high arrival rate, effectively balancing response time, cost, and success rate. This demonstrates the superiority of our proposed GDDRL in complex data processing environments.
Varying the Ratio of Tasks: In this experiment, we evaluated the performance of different methods in handling varying task ratios. The experiment set the ratios of image processing, log processing, and data analysis to 2:1:1, 1:2:1, and 1:1:2. According to the results depicted in Fig. 5, the GDDRL algorithm excelled in task scheduling, maintaining lower average response times and costs regardless of the task ratios, and also significantly outperformed other methods in terms of success rate.
Regarding average response time, when the task ratio was 2:1:1, all methods exhibited relatively high average response times; with the 1:1:2 task ratio, the average response time increased slightly. The latter is attributed to the relatively slow processing speed of data analysis tasks: as their proportion grew, response times rose accordingly. As for overall cost, all methods showed relatively minor variations across the different task ratios, indicating that our approach maintains a relatively balanced resource allocation without significant cost increases for specific task types. These results mean that task ratios influence algorithm performance, and that our GDDRL method delivers efficient and stable performance across different task ratios, maintaining low response times and costs.
Varying the Number of Computing Nodes: In this experiment, we compared the performance of different methods in handling various tasks across different numbers of computing nodes. We set different numbers of computing nodes ranging from 3 to 24. According to the results depicted in Fig. 6, GDDRL excelled in task scheduling, significantly outperforming other algorithms. Specifically, GDDRL exhibited lower average response times and costs across different numbers of computing nodes, while also leading in success rate.
Response times declined as the number of computing nodes rose, with DDQN outperforming the other baselines, a gap particularly noticeable at higher node counts. This suggests that the influence of the scheduling algorithm on response time becomes somewhat marginal when resources are scarce. Nevertheless, GDDRL consistently achieved the lowest average response time, signifying its superior resource utilization efficiency. As the number of computing nodes increased, we observed lower response times and higher success rates: more nodes mean more resources available for selection, which improves both the efficiency and the success rate of task scheduling. These results show that, across different resource configurations, GDDRL consistently outperforms the other algorithms in terms of response time, cost, and success rate.
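The qualitative trend of response time falling as nodes are added matches basic queueing intuition. An illustrative M/M/c calculation (with hypothetical arrival rate λ and service rate μ, not the paper's workload) makes the diminishing-returns shape concrete:

```python
from math import factorial

def mmc_wait(lam, mu, c):
    """Mean queueing delay in an M/M/c system via the Erlang-C formula."""
    rho = lam / (c * mu)              # per-server utilization, must be < 1
    assert rho < 1, "queue is unstable"
    a = lam / mu                      # offered load in Erlangs
    blocked = a**c / (factorial(c) * (1 - rho))
    p_wait = blocked / (sum(a**k / factorial(k) for k in range(c)) + blocked)
    return p_wait / (c * mu - lam)    # mean time spent waiting for a server

# Mean response time = queueing delay + mean service time (1/mu)
lam, mu = 10, 5
for c in (3, 6, 12, 24):
    print(c, mmc_wait(lam, mu, c) + 1 / mu)
```

The delay term shrinks rapidly between 3 and 12 servers and is nearly zero by 24, which mirrors the observation that scheduling-policy differences matter most when resources are tight.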
Conclusion
In this work, we proposed a cloud computing architecture for processing power grid tasks, leveraging the DT technology used in power grid operations. We also introduced GDDRL, a novel real-time task scheduling model designed to improve response time and reduce cost at cloud computing nodes. By carefully considering the characteristics of tasks generated by the power grid and the DT system, GDDRL provides a robust solution for task scheduling, and our comprehensive design and empirical evaluations demonstrate that it consistently outperforms the compared methods. In future work, we plan to extend GDDRL to edge-cloud environments: decentralizing task processing closer to the data source can further reduce latency, which will be crucial for enhancing the resilience and efficiency of power grid operations, especially in dynamic environments.
Availability of data and materials
No datasets were generated or analysed during the current study.
References
Liu J, Wang Q, Song Z, Fang F (2021) Bottlenecks and countermeasures of high-penetration renewable energy development in China. Engineering 7(11):1611–1622
Wang W, Liu J, Zeng D, Fang F, Niu Y (2020) Modeling and flexible load control of combined heat and power units. Appl Therm Eng 166:114624
Fang F, Zhu Z, Jin S, Hu S (2020) Two-layer game theoretic microgrid capacity optimization considering uncertainty of renewable energy. IEEE Syst J 15(3):4260–4271
Pan H, Dou Z, Cai Y, Li W, Lei X, Han D (2020) Digital twin and its application in power system. In: 2020 5th International Conference on Power and Renewable Energy (ICPRE). Shanghai, IEEE, pp 21–26
Liu J, Song D, Li Q, Yang J, Hu Y, Fang F, Joo YH (2023) Life cycle cost modelling and economic analysis of wind power: A state of art review. Energy Convers Manag 277:116628
Lv Y, Lv X, Fang F, Yang T, Romero CE (2020) Adaptive selective catalytic reduction model development using typical operating data in coal-fired power plants. Energy 192:116589
Cheng L, Kotoulas S (2015) Efficient skew handling for outer joins in a cloud computing environment. IEEE Trans Cloud Comput 6(2):558–571
Mao Y, Yan W, Song Y, Zeng Y, Chen M, Cheng L, Liu Q (2022) Differentiate quality of experience scheduling for deep learning inferences with docker containers in the cloud. IEEE Trans Cloud Comput 11(2):1667–1677
Mao Y, Fu Y, Zheng W, Cheng L, Liu Q, Tao D (2021) Speculative container scheduling for deep learning applications in a kubernetes cluster. IEEE Syst J 16(3):3770–3781
Liu Q, Xia T, Cheng L, Van Eijk M, Ozcelebi T, Mao Y (2021) Deep reinforcement learning for load-balancing aware network control in IoT edge systems. IEEE Trans Parallel Distrib Syst 33(6):1491–1502
Liu Q, Cheng L, Jia AL, Liu C (2021) Deep reinforcement learning for communication flow control in wireless mesh networks. IEEE Netw 35(2):112–119
Cheng L, Wang Y, Cheng F, Liu C, Zhao Z, Wang Y (2023) A deep reinforcement learning-based preemptive approach for cost-aware cloud job scheduling. IEEE Trans Sustain Comput 9(3):422–432
Zhang J, Cheng L, Liu C, Zhao Z, Mao Y (2023) Cost-aware scheduling systems for real-time workflows in cloud: An approach based on genetic algorithm and deep reinforcement learning. Expert Syst Appl 234:120972
Chen Y, Gu W, Xu J, Zhang Y, Min G (2023) Dynamic task offloading for digital twin-empowered mobile edge computing via deep reinforcement learning. China Commun 20(11):164–175
Consul P, Budhiraja I, Garg D, Kumar N, Singh R, Almogren AS (2024) A hybrid task offloading and resource allocation approach for digital twin-empowered UAV-assisted MEC network using federated reinforcement learning for future wireless network. IEEE Trans Consum Electron
Durão LFC, Haag S, Anderl R, Schützer K, Zancul E (2018) Digital twin requirements in the context of industry 4.0. In: Product Lifecycle Management to Support Industry 4.0: 15th IFIP WG 5.1 International Conference, PLM 2018, Turin, Italy, July 2–4, 2018, Proceedings 15. Turin, Springer, pp 204–214
Jeremiah SR, Yang LT, Park JH (2024) Digital twin-assisted resource allocation framework based on edge collaboration for vehicular edge computing. Futur Gener Comput Syst 150:243–254
Khan SA, Rehman HZU, Waqar A, Khan ZH, Hussain M, Masud U (2023) Digital twin for advanced automation of future smart grid. In: 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC). Jeddah, IEEE, pp 1–6
Li W, Rentemeister M, Badeda J, Jöst D, Schulte D, Sauer DU (2020) Digital twin for battery systems: Cloud battery management system with online state-of-charge and state-of-health estimation. J Energy Storage 30:101557
Liao H, Zhou Z, Liu N, Zhang Y, Xu G, Wang Z, Mumtaz S (2022) Cloud-edge-device collaborative reliable and communication-efficient digital twin for low-carbon electrical equipment management. IEEE Trans Ind Inform 19(2):1715–1724
Chen X, Cheng L, Liu C, Liu Q, Liu J, Mao Y, Murphy J (2020) A WOA-based optimization approach for task scheduling in cloud computing systems. IEEE Syst J 14(3):3117–3128
Shukri SE, Al-Sayyed R, Hudaib A, Mirjalili S (2021) Enhanced multi-verse optimizer for task scheduling in cloud computing environments. Expert Syst Appl 168:114230
Fang F, Wu X (2020) A win-win mode: The complementary and coexistence of 5G networks and edge computing. IEEE Internet Things J 8(6):3983–4003
Jin S, Wang S, Fang F (2021) Game theoretical analysis on capacity configuration for microgrid based on multi-agent system. Int J Electr Power Energy Syst 125:106485
Zade BMH, Mansouri N, Javidi MM (2021) SAEA: A security-aware and energy-aware task scheduling strategy by Parallel Squirrel Search Algorithm in cloud environment. Expert Syst Appl 176:114915
Cho C, Shin S, Jeon H, Yoon S (2020) QoS-aware workload distribution in hierarchical edge clouds: A reinforcement learning approach. IEEE Access 8:193297–193313
Zhang Y, Hu J, Min G (2023) Digital twin-driven intelligent task offloading for collaborative mobile edge computing. IEEE J Sel Areas Commun 41(10):3034–3045. https://doi.org/10.1109/JSAC.2023.3310058
Xu X, Shen B, Ding S, Srivastava G, Bilal M, Khosravi MR, Menon VG, Jan MA, Wang M (2020) Service offloading with deep Q-network for digital twinning-empowered internet of vehicles in edge computing. IEEE Trans Ind Inform 18(2):1414–1423
Wang K, Yuan P, Jan MA, Khan F, Gadekallu TR, Kumari S, Pan H, Liu L (2024) Digital twin-assisted service function chaining in multi-domain computing power networks with multi-agent reinforcement learning. Futur Gener Comput Syst 158:294–307
Zhu L, Tan L (2024) Task offloading scheme of vehicular cloud edge computing based on digital twin and improved A3C. Internet Things 26:101192
Zhou Z, Jia Z, Liao H, Lu W, Mumtaz S, Guizani M, Tariq M (2021) Secure and latency-aware digital twin assisted resource scheduling for 5G edge computing-empowered distribution grids. IEEE Trans Ind Inform 18(7):4933–4943
Gu Y, Cheng F, Yang L, Xu J, Chen X, Cheng L (2024) Cost-aware cloud workflow scheduling using DRL and simulated annealing. Digit Commun Netw
Chen X, Yu Q, Dai S, Sun P, Tang H, Cheng L (2023) Deep reinforcement learning for efficient IoT data compression in smart railroad management. IEEE Internet Things J
Abd Elaziz M, Attiya I (2021) An improved henry gas solubility optimization algorithm for task scheduling in cloud computing. Artif Intell Rev 54(5):3599–3637
Guo H, Zhou X, Wang J, Liu J, Benslimane A (2023) Intelligent task offloading and resource allocation in digital twin based aerial computing networks. IEEE J Sel Areas Commun 41(10):3095–3110. https://doi.org/10.1109/JSAC.2023.3310067
Ragazzini L, Negri E, Macchi M (2021) A digital twin-based predictive strategy for workload control. IFAC-PapersOnLine 54(1):743–748
Altman E (2021) Constrained Markov decision processes. Boca Raton, Routledge
Tong Z, Chen H, Deng X, Li K, Li K (2020) A scheduling scheme in the cloud computing environment using deep Q-learning. Inf Sci 512:1170–1191
Shyalika C, Silva T, Karunananda A (2020) Reinforcement learning in dynamic task scheduling: A review. SN Comput Sci 1(6):306
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 30. Arizona, AAAI Press
Acknowledgements
Not applicable.
Funding
This work was funded by the State Grid Henan Electric Power Company under the grant 5217L0230009.
Author information
Contributions
Daokun Qi: Conceptualization, Writing - original draft. Xiaojuan Xi: Methodology, Writing - review & editing. Yake Tang: Conceptualization, Methodology, Writing - review & editing. Yuesong Zhen: Methodology, Writing - review & editing. Zhenwei Guo: Methodology, Writing - review & editing.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Qi, D., Xi, X., Tang, Y. et al. Real-time scheduling of power grid digital twin tasks in cloud via deep reinforcement learning. J Cloud Comp 13, 121 (2024). https://doi.org/10.1186/s13677-024-00683-z