 Research
 Open Access
 Published:
COCAM: a cooperative video edge caching and multicasting approach based on multiagent deep reinforcement learning in multiclouds environment
Journal of Cloud Computing volumeÂ 12, ArticleÂ number:Â 123 (2023)
Abstract
The evolution of the Internet of Things technology (IoT) has boosted the drastic increase in network traffic demand. Caching and multicasting in the multiclouds scenario are effective approaches to alleviate the backhaul burden of networks and reduce service latency. However, existing works do not jointly exploit the advantages of these two approaches. In this paper, we propose COCAM, a cooperative video edge caching and multicasting approach based on multiagent deep reinforcement learning to minimize the transmission number in the multiclouds scenario with limited storage capacity in each edge cloud. Specifically, by integrating a cooperative transmission model with the caching model, we provide a concrete formulation of the joint problem. Then, we cast this decisionmaking problem as a multiagent extension of the Markov decision process and propose a multiagent actorcritic algorithm in which each agent learns a local caching strategy and further encompasses the observations of neighboring agents as constituents of the overall state. Finally, to validate the COCAM algorithm, we conduct extensive experiments on a realworld dataset. The results show that our proposed algorithm outperforms other baseline algorithms in terms of the number of video transmissions.
Graphical Abstract
Introduction
As the Internet of Things (IoT) technology evolves, users are becoming increasingly interconnected with their electronic devices [1]. The advent of new wireless networks such as the fifthgeneration (5G) network, the proliferation of smart devices, and usersâ€™ high usage of diverse applications such as video streaming, online gaming, and virtual reality have resulted in a profound surge in video traffic. According to the Cisco report [2], studies predicted that the traffic of video types would account for 79% of all Internet traffic worldwide by 2022. The extensive prevalence of video traffic and the stringent quality of experience (QoE) requirements have put tremendous backhaul pressure on networks [3]. Therefore, the issue of minimizing the network resource consumption during transmission while simultaneously satisfying user demand has become one of the most critical concerns of network operators [4].
In traditional cloud environments, the service process requires moving data to remote data centers for centralized computing and storage. This leads to high network transmission latency, which can negatively impact the performance of mobile applications. To address this problem and provide reliable services for latencysensitive applications, researchers have explored deploying smallscale cloud servers at the edge so that these edge cloud servers can provide resources closer to edge IoT devices [5,6,7]. Edge cloud servers are equipped with finite resources and can be utilized to deliver bandwidthoptimized services at the edge, thus enabling the provision of fast and immediate services [8, 9]. The multiclouds architecture, including the remote cloud and edge clouds, is a promising paradigm to improve the QoE of users and reduce energy consumption [10, 11]. This potential stems from its ability to facilitate ubiquitous caching and efficient content delivery for end users, as highlighted by several studies.
During the content request phase, the network engages in content searching upon receiving a userâ€™s request. To alleviate traffic congestion, edge caching is an efficient manner of caching popular files on edge cloud servers closer to their requesters. It tackles the problem of which content to be cached in the edge cloud [12]. Recent scholarly investigations have substantiated the effectiveness of collaborative caching, which has attracted considerable scholarly attention. Collaborative caching works by allowing edge clouds to collectively distribute content through internal connections. Song et al. [13] presented an adaptive cooperative caching scheme that incorporates an enhanced quantum genetic algorithm to address the energydelay tradeoff problem. Zhang et al. [14] proposed a spatially cooperative caching strategy for a twotier heterogeneous network. The objective of this strategy is to minimize storage usage for duplicated content with caching while maximizing the likelihood of successful content retrieval (hit probability).
During the content delivery phase, traditional unicast mechanisms for distributing content from remote cloud to edge clouds and user ends (UEs) result in inefficient delivery. Multicasting, on the other hand, can leverage the available network bandwidth to deliver the same content to multiple receivers, benefiting from the similarity of usersâ€™ preferences for content in close geographic locations. This mechanism reduces traffic generated during delivery by delivering the requested file through a single multicast rather than multiple unicasts [15]. Significant efforts have been devoted to video coding and multicast transmission [16,17,18,19]. For instance, Guo et al. [18] proposed a layerbased multiquality multicast beamforming scheme based on scalable video coding. Wu et al. [19] designed an adaptive video streaming scheme using named data network multicast. However, these algorithms, while addressing video coding and multicast transmission, did not consider the integration of coded multicasting with caching in a cooperative environment.
Intuitively, Caching reduces latency and network bandwidth consumption by serving frequently requested content locally at the edge clouds [10, 20]. Multicasting further reduces bandwidth usage by efficiently delivering popular content to multiple users simultaneously, especially in scenarios with concurrent requests for the same content. Joint consideration of caching and multicasting can enhance the overall network performance and resource utilization by dynamically allocating caching and multicasting resources based on realtime user demand and network conditions. This adaptive strategy optimizes the content availability and delivery efficiency, leading to an improved user experience. Notably, it facilitates the deployment of various latencysensitive applications and services [21, 22]. In the context of largescale cacheenabled wireless networks, Jiang et al. [23] applied an iterative numerical algorithm to analyze and optimize caching and multicasting. Various coding multicasting mechanisms have been proposed in different scenarios [24,25,26,27]. Nevertheless, in largescale cooperative caching scenarios, finding a balance between edge caching and multicasting to improve resource efficiency remains a challenging task.
In this paper, we exploit the benefits of mobile edge caching with multicasting in the multiclouds environment to reduce network transmission consumption. We investigate the collaborative caching among different edge clouds to effectively adapt to dynamic edge environments. We propose a multiagent DRLbased approach for COoperative video CAching and Multicasting named COCAM to minimize the average transmission number, thereby enhancing video delivery efficiency. Our main contributions are summarized as follows:

We investigate the cooperative video edge caching and multicasting issue to reduce the transmission number in the multiclouds scenario. Moreover, we present the problem formulation as a multiagent Markov decision process (MDP).

A novel multiagent actorcritic algorithm is designed to address the formulated MDP. Specifically, each agent learns a local caching strategy and further encompasses the observations of neighboring agents as constituents of the overall state. Multiple agents work in collaboration to efficiently adapt to the dynamic network environment.

Extensive tracedriven simulations demonstrate that our proposed algorithm outperforms other baselines in terms of video transmission number.
The rest of this paper is organized as follows. In Related work section, we introduce the related works. System model and problem formulation section presents the system model and problem formulation. The details of the COCAM approach are presented in The COCAM approach section. We compare the experimental performance and analyze the results in Performance evaluation section. Conclusion section concludes the paper.
Related work
Caching algorithms
Edge caching stores popular content locally on edge clouds, allowing them to deliver the requested content directly to users. It significantly reduces network latency and network consumption. Li et al. [28] investigated a costeffective greedy algorithm with consideration for different video characteristics. It optimized the mobile edge cache placement problem for QoEdriven dynamic adaptive video streams. Tran et al. [29] proposed a federated collaborative caching and processing framework based on integer linear programming to accommodate adaptive bitrate video streams in mobile edge computing networks. The caching decision process in wireless communication networks can be represented as an MDP, and reinforcement learning has been commonly employed in this domain. Based on a multiagent framework, Wang et al. [30] proposed a deep actorcritic reinforcement learning algorithm to address the dynamic control of caching decisions by enabling each edge to learn an optimal policy through selfadaptation. However, the existing research primarily focused on content caching policies and did not incorporate consideration for the content delivery process.
Multicasting algorithms
Multicast transmission is extensively utilized in edge networks, demonstrating its efficacy in enhancing network performance by reducing bandwidth, routing, and cost [31]. Damera et al. [32] constructed a new feasible architectural model to transmit the required content to the user using the multicell transmission. The Signal Noise Ratio was improved using the multicell transmission. The optimized MEC scheduling algorithm showed better performance compared to the existing model. Zahoor et al. [33] proposed a suggested enhanced eMBMS network architecture to address the significant limitations of the standard eMBMS architecture, i.e., a network architecture using network function virtualization (NFV) and MEC. The proposed architecture allows the multicasting of crowdsourcing live streams. Ren et al. [15] considered the fundamental issues of NFVenabled multicast in mobile edge clouds and designed a heuristic algorithm. Qin et al. [34] studied the multicast traffic for IoT applications in edge networks under the delayoriented network slicing problem. Nevertheless, these works focused on the network architecture and multicast protocols without integration with the practical applications of the edge cloud servers.
Joint caching and multicasting algorithms
The utilization of multicast transmission at the base station, enabling concurrent servicing of distinct user requests for the identical file, is recognized as a highly efficacious approach for supporting the delivery of extensive content over wireless networks. This approach is regarded as an effective strategy in wireless communications to meet the constantly increasing demand for content transmission. MaddahAli et al. [35] used the joint encoding of multiple files and the multicasting feature of downlink channels to optimize content placement and delivery under encoded multicast. They also evaluated the caching gain and demonstrated that the joint optimization problem could improve the caching gain. Liao et al. [36] used the benefits of multicast content delivery and collaborative content sharing jointly to develop a compound caching technique (multicastaware cooperative caching). He et al. [37] designed partial caching bulk transmission and partial caching pipelined transmission to reduce the delivery latency of cacheenabled multigroup multicast networks. Somuyiwa et al. [38] combined active caching and multicast transmission to model the singleuser multirequest problem as an MDP and used a DRL approach to solve the problem. Since traditional approaches are difficult to adapt to this highly diverse and dynamic environment under multiclouds cooperative caching, we propose a COCAMbased framework to maximize the traffic consumption during the video delivery phase.
System model and problem formulation
In this section, we introduce the cooperative video edge caching and multicasting model and give concrete definitions. Then, we state the corresponding cache decisionmaking problem. For convenience, we summarize some key modeling parameters and notations in Table 1.
Network model
We consider the multiclouds system, which consists of three types of layers: the remote cloud layer, the edge cloud layer, and the UE layer. Assuming that the remote cloud provides all the requested video files \(\mathcal F=\left\{ 1,2,\cdots ,F\right\}\). Since video service generally fragments a video into equally sized chunks, we assume all files are unitsized. The set of edge cloud servers can be denoted as \(\mathcal N=\left\{ 1,2,\cdots ,N\right\}\). We denote the time slot of requests as \(\mathcal T=\left\{ 1,2,\cdots ,T\right\}\). The edge clouds receive the requests and make the caching decision at each time slot t. At each time slot t, the edge clouds receive requests and determine caching decisions. The request received by edge cloud n for file f is denoted as \(q_{t,n}^f \in \{0,1\}\), where \(q_{t,n}^f = 1\) represents a request for file f, and \(q_{t,n}^f = 0\) signifies no request for file f. A variable \(x_{t,n}^f\) is used to denote the transmission decision, i.e., whether the requested video f is transmitted from the remote cloud to the edge cloud n at time t. If no, we have \(x_{t,n}^f=0\), and \(x_{t,n}^f\in (0,1]\) otherwise. \(x_{t,n}^f=1\) means a transmission channel is fully used by edge cloud n and it occurs only under unicast conditions. Otherwise, if multiple edge cloud servers share a channel under one of the multicast conditions, we assume these edge clouds share the channel equally.
Caching model
At each time t, we assume that only one UE under the edge cloud server n will request the video. For each UE, if the requested video has been cached in the upper edge cloud, the edge cloud server can deliver it to the UE directly. Else the edge cloud server requests the file from the remote cloud.
Each edge cloud has the same maximum capacity C. We use a binary variable \(y_{t,n}^f\) to indicate whether the requested video f has been stored in edge cloud n at time t. If yes, we have \(y_{t,n}^f=1\), and 0 otherwise. Each server stores content limited to its maximum storage capacity:
After the edge cloud gets the requested video, the edge cloud will decide whether to cache the content or not. If the edge cloud storage capacity is not fully filled, we store the video directly. Otherwise, we update our caching space based on the policy.
Transmission model
The remote cloud delivers the videos to the requested edge clouds. FigureÂ 1 gives four schemes in our cooperative transmission model which are described as follows:

Localcast (LC): If the requested video has been cached in the local edge cloud server at time t, the UE can fetch it from the edge cloud directly without requesting from the remote cloud. We use \(\mathcal {N}_{LC}=\{ny_{t,n}^f=1,\forall n \in \mathcal N,\forall f \in \mathcal F\}\) to denote the set of edge clouds from which UEs can get videos at time t through LC schema without fetching from the remote cloud. We have:
$$\begin{aligned} x_{t,n}^f=0,\forall n \in \mathcal {N}_{LC}, \end{aligned}$$(2)as shown in the LC part of Fig.Â 1, \(N_1\) requests \(f_1\), \(f_1\) has been stored in \(N_1\).

Multicast (MC): If the requested video has not been cached in the edge cloud, then the edge cloud requests the file from the remote cloud. If there are other different edge clouds requesting the same video f at time t, then these edge clouds can obtain the requested video f through MC schema. We use the \(\mathcal {N}_{MC}^f =\{nq_{t,n}^{f}=1, y_{t,n}^f=0, \forall n \in \mathcal N \setminus \mathcal {N}_{LC} \}\) to denote the set of edge clouds that can use multicast transmission to obtain the requested video f. We have:
$$\begin{aligned} \sum _{n \in {\mathcal {N}_{MC}^f}}{x_{t,n}^f}=1, \end{aligned}$$(3)as shown in the MC part of Fig.Â 1, \(N_2\) and \(N_3\) simultaneously request \(f_2\) that have not been cached.

XORcast (XC): We form a special edge cloud set named exclusive OR (XOR) set where each edge cloud in the set stores the video files requested by the other edge clouds. We denote this set as:
$$\begin{aligned} \begin{array}{r} \mathcal {N}_{X C}^G=\left\{ n \mid q_{t, n}^f=1, y_{t, n}^f=0, y_{t, n^{\prime }}^f=1,\right. \\ \left. \forall n \in \mathcal {N} \backslash \left( \mathcal {N}_{L C} \cup \mathcal {N}_{M C}^f\right) , \forall n^{\prime } \in \mathcal {N}_{X C}^G \setminus n, \forall f \in G\right\} , \end{array} \end{aligned}$$(4)where the video set through the XC scheme can be denoted as:
$$\begin{aligned} \begin{array}{r} G=\left\{ f \mid q_{t, n}^f=1, y_{t, n}^f=0, y_{t, n}^{f^{\prime }}=1,\right. \\ \left. \forall n \in \mathcal {N} \backslash \left( \mathcal {N}_{L C} \cup \mathcal {N}_{M C}^f\right) , \forall f^{\prime } \in G\setminus f\right\} . \end{array} \end{aligned}$$(5)The XOR set receives the XORencoded bit stream by one transmission. Then, each edge cloud restores its video by decoding the received bit stream with the contents stored in its cache. We have:
$$\begin{aligned} \sum _{n \in \mathcal {N}_{X C}^F} \sum _{f \in F} x_{t, n}^f=1, \end{aligned}$$(6)as shown in the XC part of Fig.Â 1, \(N_4\) and \(N_5\) simultaneously request \(f_5\) and \(f_4\) that have been cached not by themselves but by each other. We denote the coded XOR information as f. If there are multiple XC combinations, we choose the combination that will generate the smallest number of XC sets with the participation of the same number of edge clouds. This preference is based on the effectiveness of our proposed XC approach in significantly reducing internal energy consumption during unicast transmission. While this paper does not explicitly consider the energy consumption associated with XOR operations, it is important to acknowledge that such operations still entail a nonnegligible energy overhead. Considering a fixed number of edge clouds, our objective is to minimize the number of XC combinations to mitigate the impact of XOR energy consumption.

Unicast (UC): When the relationship between the requests from edge clouds and the cache list does not satisfy any of the above cases, edge clouds fetch videos directly from the remote cloud by establishing a transmission channel. We denote the UC set as \(\mathcal {N}_{U C}=\left\{ N \backslash \left( \mathcal {N}_{L C} \cup \mathcal {N}_{M C}^f \cup \mathcal {N}_{X C}^F\right) \right\}\). We have:
$$\begin{aligned} {x_{t,n}^f}=1,\forall n \in \mathcal {N}_{UC}. \end{aligned}$$(7)as shown in the UC part of Fig.Â 1, the edge cloud gets content from the remote cloud.
To use fewer transmissions to deliver all the data during the delivery process, we use the network coding technique. The transmitted content is encoded at the network nodes and then decoded at the destination. We use XOR coding techniques. These edge clouds have not cached the requested video but have cached the video requested by other edge clouds. The caching policy determines what will be cached in the edge cloud, and then the remote cloud classifies the transmission based on the cache state in the edge clouds. According to the above four cases, we can formulate the joint multicast transmission and cache replacement problem that aims to minimize the total number of transmissions from the remote cloud to the edge cloud as:
The COCAM approach
Our modeling problem is a mixed integer programming (MIP) problem [22], which is strictly NPhard. Solving MIP problems with traditional computational methods has been proven challenging in natural caching systems with low computational efficiency. Thus we consider a learning approach. We explore the collaboration between different edge cloud servers with a multiagent reinforcement learningbased algorithm to better adapt to dynamic edge environments.
In this section, each edge cloud operates as an independent agent, while maintaining a cooperative relationship with other edge clouds. We model the cache decisionmaking problem as a multiagent extension of the Markov Decision Process (MDP) and introduce a novel multiagent actorcriticbased caching approach. Our proposed approach aims to minimize the average number of transmissions during the request transmission process. Multiagent reinforcement learning consists of agents and the environment. Based on the state and the reward from the environment, each agent executes an action according to its certain strategy. Then the environment changes to a new state. An MDP is a mathematical framework for modeling sequential decisionmaking consisting of state, action, transition probability, and reward. Each agent learns the optimal decisionmaking sequence through continuous interaction with the environment. We define the basic elements of a multiagent MDP as follows:
State
The state of agent n at time t be denoted as \({{s}_{t,n}}=\{{y_{t,n}},q_{t,n}^{f}\}\), where \(q_{t,n}^{f}\) indicates the current request demands and \(y_{t,n}=\{y_{t,n}^f\}_{\forall f \in \mathcal {F}}\) denotes the caching state of edge cloud n. We define the neighborhoods that can be observed by the agent n as \(\mathcal N_n\). We use \(\pi _{t,n}\) to denote the policy of agent n. Thus, the adjacent agent policy of agent n can be denoted as \(\pi _{t,\mathcal N_n}\). Each agent can observe the states and policies of the neighborhoods. Therefore, the joint state of an agent n to be fed into the input network is \(\hat{s}_{t,n} =\{{s}_{t,m}\}_{\forall m \in \{n,\mathcal N_n\} }\).
Action
An agent decides which video should be replaced from the cache list based on its policy. We denote the action of agent n as \({a}_{t,n}=v\), where \(v\in \{0,1,2,\cdots ,C\}\). If \(v=0\), the requested video will not be cached. Else, the vth content in the cache space of edge cloud n will be replaced by the current requested video.
Reward
The goal is to minimize the average transmission number. We define the negative value of the transmission number as the reward:
So the global reward is calculated as:
Network architecture
As shown in Fig.Â 2, each agent consists of two parts: actor network (as \(\theta\)) and critic network (as \(\omega\)). The actor network and critic network are essential components of a policy network. The actor network receives environmental states as input and generates corresponding action outputs, aiming to learn an optimal policy \(\pi _{\theta _n}\) that maximizes the expected return or value function associated with accumulated rewards. On the other hand, the critic network serves as a value function estimation network, evaluating the quality of actions chosen by the actor network within a given state. Its primary objective is to learn a value function \(V_{\omega _n}\) capable of estimating the expected return or value based on the current state and the actions selected by the actor network. The actor network consists of two fully connected hidden layers with ReLU activation functions, where the dimensions are determined by the variable state size of the cache. Its output layer is a fully connected layer utilizing a hyperbolic tangent (tanh) activation function. Similarly, the critic network shares the same architecture as the actor network, comprising two fully connected hidden layers with ReLU activation functions. The critic networkâ€™s output layer consists of a single unit activated by a linear function. Each network contains a target network and a primary network of the same network structure. We use the target network to improve the stability and convergence of training. After the primary network learns a certain number of times, the parameters of the primary network are used to update the parameters of the target network.
Agents get the policies \(\pi\) based on their actor networks. The actor network is denoted as a function to seek optimal policy \(\pi _{t,n}=\pi _{\theta _n}(a_{t,n}\hat{s}_{t,n},\pi _{t1,\mathcal {N}_n})\), where \(\theta _{n}\) denotes the actor network parameter of agent n. An agent gets the action by random sampling with the policy distribution. We denote the parameter of the critic network for agent n as \({\omega }_{n}\). Thus, \(V_{{\omega }_{n}}\) denotes the value function of the critic network trained as an estimate of the expected reward.
We formulate the expected value equation for edge cloud n as:
where \(\gamma\) denotes the discount reward factor. At each time t, the agent stores the experience \((\hat{s}_{t,n},{a}_{t,n},{r}_{t,n}, \hat{s}_{t+1,n})\) in replay memory B.
We use the temporal difference (TD) algorithm to update the critic network. The loss function of the critic network can be calculated as:
The actor network is updated by the policy gradient (PG) algorithm. The loss function of the actor network can be defined as:
where the \(\beta\) denotes a hyperparameter to control the entropy term, and the advantage function \(\tilde{A}_{t, n}= R_{t,n}V_{w_{n}}\left( \hat{s}_{t,n}, \pi _{t1,\mathcal {N}_n}\right)\) is the discounted reward minus a baseline.
Then we update the target network parameters for each agent n as:
where \(\zeta\) denotes the target network update parameter. The target network is updated every \(\mathcal {\tau }\) step.
After the training is completed, each agent can get the most effective action strategy in the current state according to its own state in each execution step.
The COCAM algorithm is given in Algorithm 1. Each local agent collects the experience tuple by following the current policy until enough samples are collected for batch updating (lines 8 to 15). Then a batch will be sampled randomly to update the actor and the critic network (lines 17 to 20). For every \(\mathcal {\tau }\) step, the target network is updated (lines 21 to 23).
Performance evaluation
Experiment setup
We conduct experiments on a realworld dataset from iQIYI which contains 300,000 individual videos watched by 2 million users over two weeks. We randomly select 10,000 records from it. FigureÂ 3 illustrates a descending order trend in video request preferences observed in the iQIYI dataset. The popularity distribution of videos exhibits notable skewness, adhering to a Zipf distribution. This implies that a small subset of highly popular videos significantly contributes to the majority of access volume, while a large number of other videos receive minimal attention. Popular videos are frequently accessed, necessitating regular updates to their cached content. Conversely, a substantial proportion of less popular videos are rarely accessed, rendering them ineffective for caching purposes. However, despite their limited popularity, these less popular videos still contribute to usersâ€™ demand. Therefore, it becomes imperative to design an adaptive cooperative caching and multicasting strategy that captures the distribution and dynamics in video popularity. We divide the dataset into 30 edge areas based on geographic information with the Kmeans algorithm [39]. We select 20 to deploy edge cloud servers (i.e., agents) to provide the video service for users. By default, we set the cache size to 50. We assume that each agent can observe the states of all the other agents from the environment. The key experimental parameters are listed in Table 2.
Comparisons and results
The contents in different edge cloud servers are related to each other in multicast delivery, leading to the tendency of multicast caches to store similar contents. In contrast, for cooperative caching, the cached contents in different units should be mutually exclusive for better utilization of the limited storage space. The combination is balanced by using multiagent reinforcement learning in the combination.
FigureÂ 4 shows the variation of transmission number of COCAM with the increasing training episode. We can see that the transmission number decreases linearly at the beginning and steadily converges at around 150 episodes.
According to Eq. (8), we measure the performance of our proposed algorithm using the average number of transmissions during the entire process of requesting. The average number of transmissions can show the efficiency of multicast transmission, which is affected by the caching decision. A lower average number of transmissions means fewer channel resources and higher multicast transmission efficiency for the same request, which can effectively relieve network transmission pressure.
To evaluate the performance of the COCAM algorithm, we compare it with other algorithms in different cases.
Comparison with noncooperative caching algorithms
In Fig.Â 5, we compare the COCAM algorithm with noncooperative caching algorithms under cooperative transmission in terms of the number of transmissions. LRU [40]: The new content will replace the cached content which has been least recently requested. LFU [40]: The new content will replace the cached content which has been least frequently requested. FIFO [41]: The new content will replace the cached content which has been stored earliest. Lecar [42]: It adopts LRU or LFU algorithm to update the cache according to the weight adaption by regret minimization technique. Arc [43]: It dynamically adjusts the size of the two queues and performs cache updates based on the LRU algorithm.
In these caching algorithms, each edge cloud server individually caches the content based on its caching decision without combining the cooperative caching among the edge clouds.
FigureÂ 5a shows the comparison result under different numbers of requests. The request numbers are set ranging from 300 to 1500. Our COCAM algorithm performs better than the other baselines, with an average improvement of 2% to 15% in global benefits. Besides, the variations in the number of requests hardly affect the performance except for the LRU algorithm. It is because LRU works better for popular content and tends to lead to cache pollution in smooth datasets. FigureÂ 5b shows the performance comparison under different edge cloud cache sizes. The cache size ranges from 30 to 90.
From Fig.Â 5b, it is observed that the transmission number decrease as the edge cloud cache sizes increase for all methods. Since the requested videos are more likely to be hit locally or built a multicast transmission as the cache capacity increases, our COCAM algorithm performs better than the other baselines.
FigureÂ 5c shows the comparison under different amounts of edge clouds. We set the edge cloud server numbers ranging from 5 to 25 with a cache size of 50. We compare the results after 1500 requests. We can see that COCAM achieves the minimum transmission number. The performance of our algorithm is significantly better than other algorithms and stabilizes when there are fewer edge clouds.
Comparison with cooperative caching algorithms
In Fig.Â 6, we compare the COCAM algorithm with A2C algorithms that apply cooperative caching under cooperative transmission. A2C [44]: This algorithm uses a singleagent advantage actorcritic algorithm to select the action with the best reward.
As seen in the figure, the two cooperative caching algorithm curves converge, with the COCAM significantly outperforming the A2C algorithm. Compared to the A2C algorithm, our proposed algorithm results in an average improvement of 4\(\%\). It is mainly because COCAM yields more intelligent decisionmaking that learns the dynamic request pattern based on the global state. The performance of the learningbased algorithm can adapt well to the multicast environment and is not significantly affected by the variations in the number of edge clouds. With the cooperation of different agents, COCAM shows better and more stable performance.
Comparison of algorithms with different multicasting schemes
FigureÂ 7 shows the performance of multicast transmission and coding transmission during the delivery phase in the cooperative caching scenario. COCAMw/oMC &XC: We design the COCAMw/oMC &XC by using COCAM without using the part of MC and XC. COCAMw/oXC: We design the COCAMw/oXC by using COCAM without using the part of XC.
The experimental results illustrate that our proposed COCAM algorithm works better than the designaltered COCAM algorithms. It shows that our proposed MC and XC schemes effectively reduce the transmission number. As shown in the figure, the two altered algorithm curves are closer in results, indicating MC scheme is less effective on this dataset. This phenomenon can be attributed to the observation that users within the same region tend to have similar request preferences, while their activities of accessing the same content may vary across different time slots. It indicates that our XC scheme can effectively leverage this insight to achieve superior performance in the integrated caching and multicasting scenario.
Conclusion
In this paper, we have proposed a joint cache replacement and multicast transmission strategy in the multiclouds scenario. This strategy could reduce the transmission number efficiently for video delivery. We have designed a multiagent actorcritic algorithm named COCAM, enabling multiple edge clouds to cooperate to achieve intelligent caching decisions. In addition, we have conducted experiments on a realworld dataset. The evaluation results have shown that our COCAM algorithm could reduce the average transmission number by cooperation between different agents compared to other baselines. In our future work, we will further enhance reinforcement learning algorithms to achieve improved adaptation in resourceconstrained and bandwidthlimited multiclouds environment at a large scale.
Availability of data and materials
We are actively disclosing data with data providers due to signed confidentiality agreements. The motion model parameters we used are listed in the manuscript.
Abbreviations
 IoT:

Internet of Things
 QoE:

Quality of experience
 UE:

User end
 MEC:

Mobile edge computing
 NFV:

Network function virtualization
 MDP:

Markov decision process
 DRL:

Deep reinforcement learning
 MIP:

Mixed integer programming
 TD:

Temporal difference
 PG:

Policy gradient
References
Chen Y, Hu J, Zhao J, Min G (2023) QoSaware computation offloading in leo satellite edge computing for IoT: a gametheoretical approach. Chin J Electron. https://doi.org/10.23919/cje.2022.00.412
Cisco (2022) Cisco Annual Internet Report (20182023). https://www.cisco.com/c/en/us/solutions/collateral/executiveperspectives/annualinternetreport/whitepaperc11741490.html. Accessed 10 Mar 2020
Llorca J, Tulino AM, Guan K, Esteban J, Varvello M, Choi N et al (2013) Dynamic in network caching for energy efficient content delivery. In: 2013 Proceedings IEEE INFOCOM. IEEE, Turin, pp 245â€“249
Huang J, Gao H, Wan S et al (2023) AoIaware energy control and computation offloading for industrial IoT. Futur Gener Comput Syst 139:29â€“37
Chen Y, Zhao J, Hu J, etÂ al (2023a) Distributed task offloading and resource purchasing in NOMAenabled mobile edge computing: Hierarchical game theoretical approaches. ACM Trans Embed Comput Syst. https://doi.org/10.1145/3597023
Chen Y, Zhao J, Zhou X, etÂ al (2023b) A distributed game theoretical approach for credibilityguaranteed multimedia data offloading in MEC. Inf Sci. https://doi.org/10.1016/j.ins.2023.119306
Pan J, McElhannon J (2017) Future edge cloud and edge computing for internet of things applications. IEEE Internet Things J 5(1):439â€“449
Chen Y, Zhao J, Wu Y et al (2022) QoEaware decentralized task offloading and resource allocation for endedgecloud systems: a gametheoretical approach. IEEE Trans Mob Comput. https://doi.org/10.1109/TMC.2022.3223119
Huang J, Wan J, Lv B, Ye Q et al (2023) Joint computation offloading and resource allocation for edgecloud collaboration in internet of vehicles via deep reinforcement learning. IEEE Syst J 17(2):2500â€“2511
Zhong C, Gursoy MC, Velipasalar S (2019) Deep multiagent reinforcement learning based cooperative edge caching in wireless networks. In: ICC 20192019 IEEE International Conference on Communications (ICC). IEEE,Â Shanghai, pp 1â€“6
Chen Y, Gu W, Xu J, etÂ al (2022) Dynamic task offloading for digital twinempowered mobile edge computing via deep reinforcement learning. China Commun. https://doi.org/10.23919/JCC.ea.20220372.202302
Fan Q, Li X, Li J, He Q, Wang K, Wen J (2021) PACache: evolving learningbased popularityaware content caching in edge networks. IEEE Trans Netw Serv Manag 18(2):1746â€“1757
Song J, Song Q, Wang Y, Lin P (2021) Energydelay tradeoff in adaptive cooperative caching for energyharvesting ultradense networks. IEEE Trans Comput Soc Syst 9(1):218â€“229
Zhang S, Sun W, Liu J (2019) Spatially cooperative caching and optimization for heterogeneous network. IEEE Trans Veh Technol 68(11):11260â€“11270
Ren H, Xu Z, Liang W, Xia Q, Zhou P, Rana OF, Galis A, Wu G (2020) Efficient algorithms for delayaware NFVenabled multicasting in mobile edge clouds with resource sharing. IEEE Trans Parallel Distrib Syst 31(9):2050â€“2066
Araniti G, Rinaldi F, Scopelliti P, Molinaro A, Iera A (2019) A dynamic MBSFN area formation algorithm for multicast service delivery in 5G NR networks. IEEE Trans Wirel Commun 19(2):808â€“821
Condoluci M, Araniti G, Molinaro A, Iera A (2015) Multicast resource allocation enhanced by channel state feedbacks for multiple scalable video coding streams in lte networks. IEEE Trans Veh Technol 65(5):2907â€“2921
Guo C, Cui Y, Ng DWK, Liu Z (2018) Multiquality multicast beamforming with scalable video coding. IEEE Trans Commun 66(11):5662â€“5677
Wu F, Yang W, Ren J, Lyu F, Ding X, Zhang Y (2020) Adaptive video streaming using dynamic ndn multicast in wlan. In: IEEE INFOCOM 2020IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE,Â Toronto, pp 97â€“102
Chen S, Yao Z, Jiang X, Yang J, Hanzo L (2020) Multiagent deep reinforcement learningbased cooperative edge caching for ultradense nextgeneration networks. IEEE Trans Commun 69(4):2441â€“2456
Poularakis K, Iosifidis G, Sourlas V, Tassiulas L (2016) Exploiting caching and multicast for 5G wireless networks. IEEE Trans Wirel Commun 15(4):2995â€“3007
Sun Y, Chen Z, Tao M, Liu H (2020) Bandwidth gain from mobile edge computing and caching in wireless multicast systems. IEEE Trans Wirel Commun 19(6):3992â€“4007
Jiang D, Cui Y (2019) Analysis and optimization of caching and multicasting for multiquality videos in largescale wireless networks. IEEE Trans Commun 67(7):4913â€“4927
Huang W, Huang Y, He S, Yang L (2020) Cloud and edge multicast beamforming for cacheenabled ultradense networks. IEEE Trans Veh Technol 69(3):3481â€“3485
Dani MN, So DK, Tang J, Ding Z (2021) NOMA and coded multicasting in cacheaided wireless networks. IEEE Trans Wirel Commun 21(4):2506â€“2520
Bilal K, Shuja J, Erbad A, Alasmary W, Alanazi E, Alourani A (2022) Addressing challenges of distance learning in the pandemic with edge intelligence enabled multicast and caching solution. Sensors 22(3):1092
Hassanzadeh P, Tulino A, Llorca J, Erkip E (2016) Cacheaided coded multicast for correlated sources. In: 2016 9th International Symposium on Turbo Codes and Iterative Information Processing (ISTC). IEEE,Â Brest, pp 360â€“364
Li C, Toni L, Zou J, Xiong H, Frossard P (2017) Qoedriven mobile edge caching placement for adaptive video streaming. IEEE Trans Multimedia 20(4):965â€“984
Tran TX, Pompili D (2018) Adaptive bitrate video caching and processing in mobileedge computing networks. IEEE Trans Mob Comput 18(9):1965â€“1978
Wang F, Wang F, Liu J, Shea R, Sun L (2020) Intelligent video caching at network edge: a multiagent deep reinforcement learning approach. In: IEEE INFOCOM 2020IEEE Conference on Computer Communications. IEEE,Â Toronto, pp 2499â€“2508
Singal G, Laxmi V, Gaur MS, Rao DV, Kushwaha R, Garg D, Kumar N (2021) Qosaware meshbased multicast routing protocols in edge ad hoc networks: Concepts and challenges. ACM Trans Internet Technol (TOIT) 22(1):1â€“27
Damera B, Babu PC (2014) Broadcast the user chosen content over LTE in a simplified approach using evolved multimedia broadcast multicast service (eMBMS) with optimized MCE scheduling algorithm. In: 2014 International Conference on Intelligent Computing Applications. IEEE, pp 90â€“94
Zahoor K, Bilal K, Erbad A, Mohamed A (2020) Serviceless video multicast in 5G: Enablers and challenges. IEEE Netw 34(3):270â€“276
Qin Y, Xia Q, Xu Z, Zhou P, Galis A, Rana OF, Ren J, Wu G (2020) Enabling multicast slices in edge networks. IEEE Internet Things J 7(9):8485â€“8501
MaddahAli MA, Niesen U (2014) Fundamental limits of caching. IEEE Trans Inf Theory 60(5):2856â€“2867
Liao J, Wong KK, Zhang Y, Zheng Z, Yang K (2017) Coding, multicast, and cooperation for cacheenabled heterogeneous small cell networks. IEEE Trans Wirel Commun 16(10):6838â€“6853
He S, Ren J, Wang J, Huang Y, Zhang Y, Zhuang W, Shen S (2019) Cloudedge coordinated processing: Lowlatency multicasting transmission. IEEE J Sel Areas Commun 37(5):1144â€“1158
Somuyiwa SO, GyÃ¶rgy A, GÃ¼ndÃ¼z D (2019) Multicastaware proactive caching in wireless networks with deep reinforcement learning. In: 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE,Â Cannes, pp 1â€“5
Hartigan JA, Wong MA (1979) Algorithm as 136: A kmeans clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100â€“108
Lee D, Choi J, Kim JH, Noh SH, Min SL, Cho Y et al (1999) On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies. In: Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. ACM, Atlanta, pp 134â€“143
Bramson M (1994) Instability of FIFO queueing networks. Ann Appl Probab 4(2):414â€“431
Vietri G, Rodriguez LV, Martinez WA, Lyons S, Liu J, Rangaswami R et al (2018) Driving cache replacement with MLbased LeCaR. In: 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 18).Â USENIX, Boston, pp 928â€“936
Megiddo N, Modha DS (2003) ARC: A SelfTuning, low overhead replacement cache. In: 2nd USENIX Conference on File and Storage Technologies (FAST 03), volÂ 3.Â USENIX, San Francisco, pp 115â€“130
Ban TW, Lee W, Ryu J (2020) An efficient coded streaming using clientsâ€™ cache. Sensors 20(21):6220
Acknowledgements
The authors would like to thank to anonymous reviewers for their valuable comments on the manuscript.
Funding
This work is supported in part by the National NSFC (Grant No. 62102053), the Natural Science Foundation of Chongqing, China (Grant No. CSTB2022NSCQMSX1104), the General Program of Chongqing Science & Technology Commission (Grants No. CSTB2022TIADGPX0017 and CSTB2022TIADSTX0006), the Science and Technology Plan Project of Chongqing Economic and Information Commission (Grant No. 2211R49R03), the Key Research Program of Chongqing Science & Technology Commission (Grant No. cstc2021jscxdxwtBX0019), the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 2023YFQ0028), the Regional Science and Technology Innovation Cooperation Project of Chengdu City (Grant No. 2023YF1100023HZ), and the EU Horizon 2020 research and innovation program under the Marie SkÅ‚odowskaCurie grant agreement No. 898588, and the Chongqing Key Laboratory of Digital Cinema Art Theory and Technology (Grant No. 2021KF01). This article reflects only the authorsâ€™ view. The European Union Commission is not responsible for any use that may be made of the information it contains.
Author information
Authors and Affiliations
Contributions
Ruohan Shi carried out the experiment and wrote the manuscript with support from Qilin Fan, Shu Fu, Xu Zhang, Xiuhua Li and Meng Chen. Qilin Fan supervised the whole project. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisherâ€™s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shi, R., Fan, Q., Fu, S. et al. COCAM: a cooperative video edge caching and multicasting approach based on multiagent deep reinforcement learning in multiclouds environment. J Cloud Comp 12, 123 (2023). https://doi.org/10.1186/s1367702300510x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1367702300510x
Keywords
 Multiclouds
 Edge caching
 Multicasting
 Deep reinforcement learning