

Stateless Q-learning algorithm for service caching in resource constrained edge environment

Abstract

In a resource-constrained edge environment, multiple service providers compete to rent the limited resources of edge servers close to end users to cache their service instances, thereby significantly reducing service delay and improving quality of service (QoS). However, renting the resources of different edge servers to deploy service instances incurs different resource usage costs and service delays. To make full use of the limited resources of edge servers and further reduce resource usage costs, multiple service providers on an edge server can form a coalition and share the server's limited resources. In this paper, we investigate the service caching problem of multiple service providers in a resource-constrained edge environment, and propose an independent learners-based service caching scheme (ILSCS) which adopts stateless Q-learning to learn an optimal service caching policy. To verify the effectiveness of the ILSCS scheme, we implement four baseline algorithms (COALITION, RANDOM, MDU, and MCS) and compare the total collaboration cost and service latency of the ILSCS scheme with those of these four baselines under different experimental parameter settings. The extensive experimental results show that the ILSCS scheme achieves lower total collaboration cost and service latency.

Introduction

With the explosive growth of smart end devices, various latency-sensitive network services offered by different service providers [1], such as virtual reality (VR), real-time navigation, and interactive online games [2], have emerged and bring great convenience to people's lives. Traditionally, the service instances corresponding to these latency-sensitive services are deployed in remote cloud datacenters. When a large number of end users frequently access these service instances, this imposes long service latency and a huge traffic burden on the core networks [3, 4]. To address this problem, edge computing, a new computing paradigm that sinks computation, bandwidth, and storage resources from the remote cloud to edge servers close to end users, provides a promising solution. In an edge computing environment, service providers can rent Virtual Machines (VMs) encapsulating the computation and bandwidth resources of edge servers to deploy their service instances, thereby greatly reducing service latency and improving the quality of service. However, the resources of edge servers are limited, and renting the limited resources of different edge servers to cache service instances incurs different resource usage costs and service latencies [5]. To reduce service latency and make full use of the limited resources to further reduce resource usage costs, multiple service providers can share their leased VMs with one another.

There are several existing studies on the service caching problem in edge environments [6,7,8,9]. In particular, Xia et al. [6] formulate the edge data caching problem as a constrained optimization problem and adopt integer programming and an approximation algorithm to solve it, aiming to minimize the data caching cost and maximize the reduction in service latency. However, this work only considers collaborative service caching between adjacent edge servers in static scenarios. To address these problems, Xia et al. [7] propose a Lyapunov optimization based online algorithm to solve the dynamic collaborative edge data caching problem, aiming to minimize the overall system cost. However, this work mainly focuses on service cost optimization without considering service latency reduction. To optimize the long-term utility, defined as the weighted sum of the service cost and the service latency reduction, Huang et al. [8] propose a utility-aware collaborative service caching scheme that coordinates multiple edge servers to cache service instances. However, all of the above studies focus on edge servers cooperating with each other to quickly retrieve the required service instances in a resource-constrained edge environment. They do not consider that multiple service providers can share leased VMs with one another to make full use of the limited resources of edge servers and reduce the collaboration cost.

In this paper, we investigate the service caching problem with resource sharing among multiple service providers in a resource-constrained edge environment. To address this problem, we construct the resource sharing model for multiple service providers, the cost model for a service provider, and the service latency model, respectively. Based on these models, we formulate the service caching problem. To solve this problem, we propose an independent learners-based service caching scheme (ILSCS) that minimizes the collaboration cost and the service latency. The ILSCS scheme adopts a stateless Q-learning algorithm, in which each edge server is treated as an agent, the caching decision of each service instance as a base action, and the inverse of the collaboration cost, which is a function of the service latency and the usage cost of shared resources, as the immediate reward, to learn an optimal service caching policy. To verify the effectiveness of the ILSCS scheme, we implement four baseline algorithms (COALITION, RANDOM, MDU, and MCS). We compare the total collaboration cost and the service latency of the ILSCS scheme with those of these four baselines under different environmental parameters such as service size, number of services, number of edge servers, and storage capacity of edge servers. The related experimental results demonstrate that the ILSCS scheme achieves lower total collaboration cost and service latency. Our main contributions can be summarized as follows:

  1. We formulate the service caching problem with resource sharing among edge service providers in a resource-constrained edge environment.

  2. We propose an independent learners-based service caching scheme (ILSCS) to minimize the total collaboration cost and the service latency. The ILSCS scheme adopts a stateless Q-learning algorithm to learn an optimal service caching policy.

  3. We implement four baseline algorithms and conduct extensive experiments to compare the total collaboration cost and the service latency of ILSCS with those of the four baselines. The related experimental results demonstrate that the ILSCS scheme can reduce the collaboration cost and the service latency.

We organize the remainder of this paper as follows. We summarize the state of the art on this topic in Related works. We formulate the service caching problem of multiple service providers in a resource-constrained edge environment in System model and problem formulation. We describe the proposed ILSCS scheme in The independent learners-based service caching scheme. We conduct extensive experiments and analyze the related experimental results in Experimental evaluation. Finally, we conclude this paper in Conclusions and future work.

Related works

The service caching problem in a resource-constrained edge environment is a very popular research topic, and there are a large number of related studies [10,11,12,13,14,15,16,17,18,19,20]. According to whether edge servers cooperate with each other, these studies can be classified into two types: service caching without cooperation and service caching with cooperation.

For service caching without cooperation, various approaches, including popularity prediction and heuristics, are adopted to make service caching decisions [10,11,12,13]. For example, Du et al. [10] adopted a reduced support vector regression (rSVR) model to predict the popularity of cached files and improve their hit rate. Compared to the original SVR model, the rSVR model learns only on a smaller reserved subset and requires less storage space. Rim et al. [11] suggested updating cached content based on individual users' short-term content preferences and proposed a content caching strategy based on joint mobility prediction and user prefetching (MPJUP). This strategy reduces the average latency and backhaul load of data fetching by predicting the user's location and the required data. Qi et al. [12] designed a neural network to predict the popularity of content, based on which a heuristic approach optimizes hybrid proactive and reactive caching policies; its main goal is to improve the overall successful offloading ratio of the mobile edge network. Wang et al. [13] modeled the caching problem as a Markov decision process and proposed a distributed cache replacement strategy based on Q-learning to minimize the transmission cost. However, that work mainly optimizes traffic and does not address the service cost or the service latency. In addition, it does not consider multiple service providers sharing the limited resources of edge servers.

For service caching with cooperation, related studies design various cooperation schemes to coordinate multiple edge servers to cache service instances [14,15,16,17,18,19,20]. For example, Ahani et al. [14] proposed an optimal content caching scheme in a time-slotted system with delivery deadline and cache capacity constraints, whose objective is to minimize the cost of the backhaul link load. Kim et al. [15] proposed a distributed edge caching scheme to reduce the content delivery delay in edge networks with limited storage and access capacity, taking content popularity and placement into account. Gu et al. [16] formulated the cooperative edge caching problem as a non-cooperative game and proposed a cooperative edge caching framework, aiming to reduce data transfer latency, relieve data traffic on the backbone network, and reduce the workload of cloud servers. Kim et al. [17] proposed a cooperative edge caching approach based on deep reinforcement learning to promote cooperation among edge servers and improve the hit ratio of the system. Ren et al. [18] proposed a cooperative caching scheme based on game theory to make caching decisions, whose main goal is to minimize the average latency of acquiring content. Song et al. [19] proposed a distributed algorithm based on the alternating direction method of multipliers to jointly optimize content caching across cooperative base stations, aiming to reduce the cost of content retrieval; however, it does not address the service latency. Lu et al. [20] formulated the service placement problem as a mixed-integer linear programming problem and proposed a decentralized dynamic placement framework based on deep reinforcement learning (DSP-DRL) to minimize the latency; however, it does not consider the cost. All of the above studies consider service providers with exclusive resources; none of them considers multiple service providers sharing leased VMs with one another to minimize the resource usage cost of all service providers.

System model and problem formulation

In this section, we first introduce the system model. Then we present the resource sharing model for multiple service providers, the cost model for a service provider, and the service latency and utility models in a resource-constrained edge environment, respectively. Finally, we formulate the service caching problem of multiple service providers. Each service provider is allowed to share its VM with others when the VM is idle. The key notations used throughout this paper are listed in Table 1.

Table 1 Key notation

System model

As shown in Fig. 1, we consider an edge environment consisting of \(n\) edge servers \(eNB=\{eN{B}_{1},\dots ,eN{B}_{i},\dots ,eN{B}_{n}\}\) and a central cloud \(CL\). These edge servers are deployed near the end users. Each edge server \(eN{B}_{i}\) is represented by a three-tuple \(eN{B}_{i}=<{C}_{i},{B}_{i},{S}_{i}>\), in which \({C}_{i}\), \({B}_{i}\) and \({S}_{i}\) denote the computational capacity, bandwidth capacity, and storage capacity of the edge server \(eN{B}_{i}\), respectively. The resources of edge server \(eN{B}_{i}\) can be encapsulated into \(m\) VMs, denoted by \(V{M}_{i}=\{V{M}_{i,1},\dots ,V{M}_{i,j},\dots ,V{M}_{i,m}\}\), in which \(V{M}_{i,j}\) denotes the \(j\)th VM on edge server \(eN{B}_{i}\). Each VM \(V{M}_{i,j}\) is denoted by a two-tuple \(V{M}_{i,j}=<{C}_{i,j},{B}_{i,j}>\), in which \({C}_{i,j}\) and \({B}_{i,j}\) denote the computational capacity and the bandwidth capacity of \(V{M}_{i,j}\), respectively. The central cloud \(CL\) hosts the set of original service instances that are to be cached on the VMs of edge servers. Due to the limited resources of edge servers, multiple service providers may compete to rent these resources to deploy their service instances [21].

Fig. 1 An example of service caching in edge environment
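To make the notation concrete, the following Python sketch mirrors the entities defined above. It is illustrative only; the class and field names (`VM`, `EdgeServer`, `ServiceInstance`, `comp`, `bw`, `storage`, and so on) are our own shorthand for the symbols \({C}_{i}\), \({B}_{i}\), \({S}_{i}\), \({C}_{i,j}\), \({B}_{i,j}\), \({W}_{k}\) and \({D}_{k}\).

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    comp: float  # C_{i,j}: computational capacity (MHz)
    bw: float    # B_{i,j}: bandwidth capacity (Mbps)

@dataclass
class EdgeServer:
    comp: float     # C_i: computational capacity (MHz)
    bw: float       # B_i: bandwidth capacity (Mbps)
    storage: float  # S_i: storage capacity (GB)
    vms: list = field(default_factory=list)  # the m VMs encapsulating C_i and B_i

@dataclass
class ServiceInstance:
    workload: float   # W_k: workload of a computation request (MHz)
    data_size: float  # D_k: size of the required input data (GB)
```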

Resource sharing model for multiple service providers

In our edge environment, there are \(K\) service providers \(SP=\{S{P}_{1},\dots ,S{P}_{k},\dots ,S{P}_{K}\}\), and each service provider has one service instance. The set of these service instances is denoted by \(SE=\{S{E}_{1},\dots ,S{E}_{k},\dots ,S{E}_{K}\}\). The service instance \(S{E}_{k}\) of the \(k\)th service provider \(S{P}_{k}\) is denoted by a two-tuple \(S{E}_{k}=<{W}_{k},{D}_{k}>\), in which \({W}_{k}\) denotes the workload of a computation request processed by service instance \(S{E}_{k}\), and \({D}_{k}\) denotes the size of the input data required by the computation request. Each service instance \(S{E}_{k}\) has a set of user requests to process. If the \(k\)th service provider \(S{P}_{k}\) caches its service instance \(S{E}_{k}\) on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\), the user requests are redirected to edge server \(eN{B}_{i}\) for processing. Otherwise, the user requests are fulfilled by the original service instance in the central cloud \(CL\). Each service provider offers services with relatively stable performance and has a stable user base, whose users will not move to other service providers in the short term. To improve the quality of service (QoS) and keep the user base, service providers cache their service instances on edge servers near end users [22]. However, caching service instances on edge servers greatly increases the service cost of service providers. To reduce this cost, different service providers can cache their service instances on different VMs of the same edge server for resource sharing. Moreover, when the VM occupied by a service provider is idle, it can also be shared with other service providers, thereby greatly reducing the service cost of service providers [23].

All service providers on the same edge server are referred to as a coalition. The coalition on edge server \(eN{B}_{i}\) is denoted by \({g}_{i}\). Since the storage resource of each edge server is limited, the total size of the input data required by the computation requests of the service instances in the coalition cannot exceed the storage capacity of the edge server, i.e., \(\sum_{S{P}_{k}\in {g}_{i}}{D}_{k}\le {S}_{i}\). Each service provider can apply to join a coalition. Each coalition has an agent that decides whether a service provider's application is accepted. If it is accepted, the agent further assigns the service instance to an optimal VM according to the resources of the different VMs.

Cost model for service provider

Caching service providers' service instances on edge servers incurs additional service costs [24]. When a VM caches only one service instance, the cost of renting the VM is borne by the corresponding service provider. When a VM is shared by multiple service instances, the cost is shared among the cached instances [25]. We model the cost of a service provider occupying a VM alone and the cost of a service provider in a coalition sharing a VM, respectively.

When a service instance \(S{E}_{k}\) is cached on a VM \(V{M}_{i,j}\), the computation and bandwidth resources of the VM \(V{M}_{i,j}\) are exclusive to the service instance \(S{E}_{k}\). Let \({c}_{i}^{p}\) denote the cost per unit of computation resource of edge server \(eN{B}_{i}\), and \({b}_{i}^{p}\) the cost per unit of bandwidth resource. The exclusive resource usage cost incurred by service provider \(S{P}_{k}\) for exclusive ownership of VM \(V{M}_{i,j}\) can be calculated by Eq. (1):

$${c}_{i,j}={c}_{i}^{p}\cdot {C}_{i,j}+{b}_{i}^{p}\cdot {B}_{i,j}$$
(1)

Note that the usage cost of an exclusively occupied VM is referred to as the provider's default cost.

When multiple service providers on an edge server form a coalition, the computation and bandwidth resources of the edge server are shared among them. Since each service provider is self-interested, we adopt the cost policy proposed in [26] to ensure the stability of the formed coalition. When service provider \(S{P}_{k}\) joins coalition \({g}_{i}\) and caches its service instance \(S{E}_{k}\) on \(V{M}_{i,j}\), it shares the computation and bandwidth resources of edge server \(eN{B}_{i}\) with the other service providers in \({g}_{i}\), and its usage cost \({p}_{k,j}\left({g}_{i}\right)\) of the shared resources can be calculated by Eq. (2):

$${p}_{k,j}\left({g}_{i}\right)= \frac{{c}_{i,j}}{\sum_{S{P}_{k^{\prime}}\in {g}_{i}}{c}_{i,j^{\prime}}}c({g}_{i})$$
(2)

where \(\sum_{S{P}_{k^{\prime}}\in {g}_{i}}{c}_{i,j^{\prime}}\) denotes the sum of the default costs of the service providers in coalition \({g}_{i}\), and \(c\left({g}_{i}\right)={c}_{i}^{p}\cdot {C}_{i}+{b}_{i}^{p}\cdot {B}_{i}\) denotes the cost of the whole edge server \(eN{B}_{i}\).
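As a worked example of Eqs. (1) and (2), the sketch below computes a provider's default cost and its cost share in a coalition. The function and parameter names are ours; `coalition_defaults` stands for the default costs \({c}_{i,j^{\prime}}\) of all members of \({g}_{i}\), including the provider itself.

```python
def default_cost(c_p, b_p, vm):
    # Eq. (1): c_{i,j} = c_i^p * C_{i,j} + b_i^p * B_{i,j}
    return c_p * vm.comp + b_p * vm.bw

def shared_cost(c_p, b_p, server, my_default, coalition_defaults):
    # Eq. (2): each member pays a fraction of the whole server's cost
    # c(g_i) = c_i^p * C_i + b_i^p * B_i, proportional to its default cost.
    server_cost = c_p * server.comp + b_p * server.bw
    return my_default / sum(coalition_defaults) * server_cost
```

Because Eq. (2) splits the whole server cost \(c\left({g}_{i}\right)\) in proportion to the members' default costs, every member's share shrinks as the coalition grows, which is what makes joining a coalition attractive.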

Service latency model

The service latency is defined as the sum of the service execution time and the data transfer time. The service latency of a computation request processed by a service instance cached on an edge server differs greatly from that of one processed by the service instance in the central cloud \(CL\) [27]. The service latency of the computation request processed by service instance \(S{E}_{k}\) in the central cloud \(CL\) is denoted by \({d}_{k}^{CL}\), and that of the computation request processed by \(S{E}_{k}\) cached on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) by \({d}_{i,j,k}\). The service execution latency of a computation request processed by service instance \(S{E}_{k}\) cached on VM \(V{M}_{i,j}\) can be calculated as \({d}_{i,j,k}^{exe}={W}_{k}/{C}_{i,j}\), where \({C}_{i,j}\) denotes the computing capacity of \(V{M}_{i,j}\). The transfer time of the input data required by service instance \(S{E}_{k}\) is denoted by \({d}_{i,k}^{tran}\). Therefore, the service latency \({d}_{i,j,k}\) can be calculated by Eq. (3).

$${d}_{i,j,\mathrm{k}}={d}_{i,k}^{tran}+{d}_{i,j,k}^{exe}$$
(3)

Since the edge servers are closer to the end users than the central cloud, the service latency of a computation request processed by a service instance cached on an edge server is usually much smaller than that of one processed in the central cloud, i.e., \({d}_{i,j,k}\ll {d}_{k}^{CL}\).
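A one-line sketch of Eq. (3) with a small worked example; the names are ours, and the numbers are only meant to illustrate the relation \({d}_{i,j,k}\ll {d}_{k}^{CL}\).

```python
def service_latency(workload, vm_comp, d_tran):
    # Eq. (3): d_{i,j,k} = d^{tran}_{i,k} + W_k / C_{i,j}
    return d_tran + workload / vm_comp

# e.g. a 100 MHz request on a 5000 MHz VM with a 10 ms transfer delay
d_edge = service_latency(100.0, 5000.0, 0.010)  # = 0.030 s
assert d_edge < 0.050  # the cloud latency d_k^CL is typically 50-100 ms
```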

Utility model

To minimize the service cost and the service latency, the utility function is defined as the weighted service latency reduction minus the service cost. The utility obtained by service instance \(S{E}_{k}\) occupying VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) alone is defined as the default utility, denoted by \({u}_{i,j,k}^{dft}\). The utility obtained by \(S{E}_{k}\) sharing the resources of edge server \(eN{B}_{i}\) with the other service providers in coalition \({g}_{i}\) is defined as the collaboration utility, denoted by \({u}_{i,j,k}^{coll}\). Different service instances have different delay sensitivities, and the importance of service instance \(S{E}_{k}\) can be adjusted by the weight \({v}_{k}\). Therefore, the default utility and the collaboration utility can be calculated by Eqs. (4) and (5), respectively.

$${u}_{i,j,k}^{dft}={v}_{\mathrm{k}}\cdot \left({d}_{k}^{CL}-{d}_{i,j,\mathrm{k}}\right)-{c}_{i,j}$$
(4)
$${u}_{i,j,k}^{coll}={v}_{\mathrm{k}}\cdot \left({d}_{k}^{CL}-{d}_{i,j,\mathrm{k}}\right)-{p}_{k,j}\left({g}_{i}\right)$$
(5)

A service provider can cache its service instance on a VM \(V{M}_{i,j}\) that maximizes its default utility. To further reduce the service cost and obtain greater utility, the service provider can share the resources of VM \(V{M}_{i,j}\) with other service providers in coalition \({g}_{i}\). The additional utility obtained by service provider \(S{P}_{k}\) joining coalition \({g}_{i}\) is denoted by \({u}_{i,j,k}\) and can be calculated by Eq. (6):

$${u}_{i,j,k}={u}_{i,j,k}^{coll}-{u}_{i^{\prime},j^{\prime},k}^{dft}={c}_{i^{\prime},j^{\prime}}-({p}_{k,j}\left({g}_{i}\right)-{v}_{\mathrm{k}}\cdot {d}_{i^{\prime},j^{\prime},\mathrm{k}}+{v}_{\mathrm{k}}\cdot {d}_{i,j,\mathrm{k}})$$
(6)

where \({u}_{i^{\prime},j^{\prime},k}^{dft}\) is the maximum default utility that service provider \(S{P}_{k}\) can obtain, i.e., by occupying VM \(V{M}_{i^{\prime},j^{\prime}}\) of edge server \(eN{B}_{i^{\prime}}\) alone at the default service cost \({c}_{i^{\prime},j^{\prime}}\). The term \({p}_{k,j}\left({g}_{i}\right)-{v}_{k}\cdot {d}_{i^{\prime},j^{\prime},k}+{v}_{k}\cdot {d}_{i,j,k}\) reflects the collaboration cost, denoted by \({c}_{i,j,k}^{coll}\).
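The following sketch puts Eqs. (4), (5) and (6) together; `d_best` is our name for the latency \({d}_{i^{\prime},j^{\prime},k}\) of the provider's best standalone placement, whose default cost is \({c}_{i^{\prime},j^{\prime}}\).

```python
def default_utility(v_k, d_cloud, d_edge, c_default):
    # Eq. (4): u^{dft} = v_k * (d_k^CL - d_{i,j,k}) - c_{i,j}
    return v_k * (d_cloud - d_edge) - c_default

def collaboration_utility(v_k, d_cloud, d_edge, p_share):
    # Eq. (5): u^{coll} = v_k * (d_k^CL - d_{i,j,k}) - p_{k,j}(g_i)
    return v_k * (d_cloud - d_edge) - p_share

def collaboration_cost(p_share, v_k, d_best, d_edge):
    # The bracketed term of Eq. (6):
    # c^{coll} = p_{k,j}(g_i) - v_k * d_{i',j',k} + v_k * d_{i,j,k},
    # so the additional utility is u_{i,j,k} = c_{i',j'} - c^{coll}.
    return p_share - v_k * d_best + v_k * d_edge
```

By Eq. (6), minimizing the collaboration cost \({c}_{i,j,k}^{coll}\) is equivalent to maximizing the additional utility \({u}_{i,j,k}\), which is why the problem below is stated purely in terms of \({c}_{i,j,k}^{coll}\).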

Problem formulation

Different edge servers have different computing and bandwidth resources [28]. Therefore, the service latency of a computation request processed by different edge servers varies greatly. Moreover, which edge servers a service provider chooses to form a coalition with also greatly affects its service cost. Therefore, given the limited computing and bandwidth resources of edge servers, the main goal of service caching is to minimize the sum of the collaboration costs of all service providers [29]. Here, we formulate the service caching problem as follows:

$$\mathrm{Minimize}: \sum\nolimits_{i=1}^{n}\sum\nolimits_{j=1}^{m}\sum\nolimits_{k=1}^{K}{a}_{i,j,k}\cdot {c}_{i,j,k}^{coll}$$
(7)
$$\mathrm{Subject\,to}: \sum\nolimits_{S{P}_{\mathrm{k}}\in {g}_{i}}{D}_{\mathrm{k}}\le {S}_{i}$$
(8)
$$\sum\nolimits_{i=1}^{n}\sum\nolimits_{j=1}^{m}{a}_{i,j,k}=1,\quad \forall k\in \{1,\dots ,K\}$$
(9)
$${a}_{i,j,k}\in \{0,1\}$$
(10)

where \({a}_{i,j,k}\) denotes whether the service instance \(S{E}_{k}\) of service provider \(S{P}_{k}\) is cached on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\): \({a}_{i,j,k}=1\) if it is, and \({a}_{i,j,k}=0\) otherwise. Equation (8) ensures that the total size of the input data required by the computation requests of the service instances cached on edge server \(eN{B}_{i}\) does not exceed its maximum storage capacity \({S}_{i}\). Equation (9) ensures that the service instance \(S{E}_{k}\) of service provider \(S{P}_{k}\) is cached on exactly one VM; caching it on additional edge servers would only increase the resource cost.
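For illustration, the sketch below evaluates the objective (7) and checks constraints (8)-(10) for a candidate decision tensor; the array layout (an \(n\times m\times K\) binary tensor) and the helper names are our own.

```python
import numpy as np

def total_collaboration_cost(a, c_coll):
    # Objective (7): sum_{i,j,k} a_{i,j,k} * c^{coll}_{i,j,k}
    # a and c_coll are n x m x K arrays; a is binary.
    return float((a * c_coll).sum())

def feasible(a, data_sizes, storage):
    # Constraint (10): a_{i,j,k} in {0, 1}
    if not np.isin(a, (0, 1)).all():
        return False
    # Constraint (9): each service instance is cached on exactly one VM
    if not (a.sum(axis=(0, 1)) == 1).all():
        return False
    # Constraint (8): data cached on each server fits in its storage S_i
    used = (a.sum(axis=1) * data_sizes).sum(axis=1)  # shape (n,)
    return bool((used <= storage).all())
```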

The independent learners-based service caching scheme

To solve the service caching problem in a resource-constrained edge environment, we adopt a stateless Q-learning algorithm and design an independent learners-based service caching scheme. In this section, we first introduce the stateless Q-learning algorithm. Then we define the action space and reward function of the service caching problem. Finally, we describe the independent learners-based service caching scheme in detail.

Stateless Q-learning algorithm

Q-learning is a simple and easy-to-understand reinforcement learning algorithm [30]. The Q-value is the expected reward obtained after taking a specific action in a specific state, and can be used to measure the effectiveness of actions. The Q-learning algorithm learns an estimated Q-value for taking each action in each state. However, the reward is sometimes related only to the actions and not to the states [31]. In that case, the Q-learning algorithm can be further simplified to a stateless Q-learning algorithm [32]. In our problem model, the reward is mainly related to the caching actions of service instances. Therefore, a stateless Q-learning algorithm is adopted to solve the service caching problem. Each edge server \(eN{B}_{i}\) is treated as an agent \(i\), its caching decision as an action \({a}_{i}\), and the inverse of the sum of the collaboration costs of all service instances cached on edge server \(eN{B}_{i}\) as the immediate reward \({R}_{i}\). The expected reward obtained by agent \(i\) performing action \({a}_{i}\) in the next time period is denoted by \({Q}_{i}({a}_{i})\). For the stateless Q-learning algorithm, the expected reward \({Q}_{i}({a}_{i})\) can be updated by Eq. (11) [32]:

$${Q}_{i}\left({a}_{i}\right)\leftarrow {Q}_{i}\left({a}_{i}\right)+{\lambda }_{T}\cdot ({R}_{i}\left(\tau \right)-{Q}_{i}\left({a}_{i}\right))$$
(11)

where \({\lambda }_{T}\) denotes the learning rate.
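A minimal sketch of a stateless Q-learner implementing the update in Eq. (11); `actions` and `lr` are our placeholders for the agent's action set and the learning rate \({\lambda }_{T}\).

```python
class StatelessQAgent:
    """One agent per edge server: no state, just one Q-value per action."""

    def __init__(self, actions, lr=0.1):
        self.q = {a: 0.0 for a in actions}  # Q_i(a_i), initialized to 0
        self.lr = lr                        # learning rate lambda_T

    def update(self, action, reward):
        # Eq. (11): Q_i(a_i) <- Q_i(a_i) + lambda_T * (R_i(tau) - Q_i(a_i))
        self.q[action] += self.lr * (reward - self.q[action])

    def best_action(self):
        return max(self.q, key=self.q.get)
```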

Action space and reward function

The caching action \(a\left(\tau \right)\) at time step \(\tau\) is defined by Eq. (12):

$$a\left(\tau \right)=({a}_{1}\left(\tau \right),\dots ,{a}_{i}\left(\tau \right),\dots ,{a}_{n}\left(\tau \right))$$
(12)

where \({a}_{i}\left(\tau \right)=({a}_{i,1,1}\left(\tau \right),\dots ,{a}_{i,j,k}\left(\tau \right),\dots ,{a}_{i,m,K}\left(\tau \right))\) is a vector that indicates whether the \(K\) service instances are cached on edge server \(eN{B}_{i}\), and \({a}_{i,j,k}\left(\tau \right)\) indicates whether service instance \(S{E}_{k}\) is cached on \(V{M}_{i,j}\) at time step \(\tau\): \({a}_{i,j,k}\left(\tau \right)=1\) if \(S{E}_{k}\) is cached on \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) at time step \(\tau\), and \({a}_{i,j,k}\left(\tau \right)=0\) otherwise.

The actions \(a\left(\tau \right)\), \({a}_{i}\left(\tau \right)\), and \({a}_{i,j,k}\left(\tau \right)\) are called the super action, joint action and base action, respectively, and their action spaces have sizes \(n\cdot {2}^{mK}\), \({2}^{mK}\) and 2. Since the action spaces of the super action and the joint action are exponential, the Q-learning algorithm would need an exponential number of iterations to visit all actions and learn their Q-values, which is clearly infeasible. To address this problem, following [32], the Q-value \({Q}_{i,j,k}\left({a}_{i,j,k}\right)\) of each base action \({a}_{i,j,k}\) is learned first. Then, from the Q-values of all base actions, the Q-value \({Q}_{i}\left({a}_{i}\right)\) of the super action \({a}_{i}\) can be obtained. The action space is thereby greatly reduced from that of the super action to that of the base actions. After the agent \(i\) of edge server \(eN{B}_{i}\) performs the base action \({a}_{i,j,k}\left(\tau \right)\), the expected reward \({Q}_{i,j,k}\left({a}_{i,j,k}\right)\) can be updated by Eq. (13):

$${Q}_{i,j,k}\left({a}_{i,j,k}\right)\leftarrow {Q}_{i,j,k}\left({a}_{i,j,k}\right)+\frac{1}{{C}_{i,j,k}\left({a}_{i,j,k}\right)+1}\cdot ({R}_{i,j,k}\left(\tau \right)-{Q}_{i,j,k}\left({a}_{i,j,k}\right))$$
(13)

where \({C}_{i,j,k}\left({a}_{i,j,k}\right)\) denotes the number of times that service instance \(S{E}_{k}\) has been cached on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) up to time step \(\tau\), and \({R}_{i,j,k}\left(\tau \right)\) denotes the immediate reward obtained by caching service instance \(S{E}_{k}\) on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) at time step \(\tau\). Since the immediate reward is the inverse of the collaboration cost of caching \(S{E}_{k}\) on \(V{M}_{i,j}\), it is given by \({R}_{i,j,k}\left(\tau \right)=-{a}_{i,j,k}\cdot {c}_{i,j,k}^{coll}\). The estimated reward of caching service instance \(S{E}_{k}\) on VM \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\) can be calculated as \({Q}_{i,j,k}={Q}_{i,j,k}\left(1\right)-{Q}_{i,j,k}\left(0\right)\). When \({a}_{i,j,k}=0\), \({R}_{i,j,k}\left(\tau \right)=0\); it follows that \({Q}_{i,j,k}\left(0\right)=0\) and \({Q}_{i,j,k}={Q}_{i,j,k}\left(1\right)\).
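A sketch of the count-averaged update in Eq. (13). With learning rate \(1/({C}_{i,j,k}+1)\), each Q-value is exactly the running mean of the rewards observed for that base action; the array layout is our assumption.

```python
def update_base_action(Q, C, i, j, k, reward):
    # Eq. (13): Q <- Q + (R - Q) / (C + 1); then increment the visit count.
    # Q and C are n x m x K arrays of Q-values and visit counts.
    Q[i, j, k] += (reward - Q[i, j, k]) / (C[i, j, k] + 1)
    C[i, j, k] += 1
    # Since R_{i,j,k}(tau) = 0 whenever a_{i,j,k} = 0, Q_{i,j,k}(0) stays 0,
    # so Q can store Q_{i,j,k}(1) directly.
```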

Algorithm implementation

To solve the above problem, we propose an independent learners-based service caching scheme (ILSCS). The ILSCS scheme adopts a stateless Q-learning algorithm to learn an optimal service caching policy with resource sharing among multiple service providers. The detailed process of the ILSCS scheme is presented in Algorithm 1. We first initialize all \({C}_{i,j,k}\) and \({Q}_{i,j,k}\) to 0 (line 1). Then we calculate the immediate reward \({R}_{i,j,k}\left(\tau \right)\) obtained by taking base action \({a}_{i,j,k}\left(\tau \right)\) (line 4). According to the immediate reward \({R}_{i,j,k}\left(\tau \right)\), we update the corresponding \({Q}_{i,j,k}\) and \({C}_{i,j,k}\) (lines 5-6). Based on all base actions \({a}_{i,j,k}\left(\tau \right)\), we then find an optimal super action \({a}^{*}\left(\tau \right)\). Following [32, 33], finding the optimal super action \({a}^{*}\) can be converted into a 0-1 knapsack problem, formulated as follows:

$$\mathrm{Maximize}:\sum\nolimits_{i=1}^{n}\sum\nolimits_{j=1}^{m}\sum\nolimits_{k=1}^{K}{a}_{i,j,k}\cdot {Q}_{i,j,k}$$
(14)

Subject to: constraints (8), (9) and (10).

The 0-1 knapsack problem is a classical NP-hard problem [34], so it is very difficult to find the optimal super action \({a}^{*}\). In this paper, we adopt a greedy algorithm to find an approximate optimal solution. We first calculate \({Q}_{i,j,k}/{D}_{k}\) for \(i=1,2,\dots ,n\), \(j=1,2,\dots ,m\), \(k=1,2,\dots ,K\), and sort these ratios in non-increasing order (line 9). Following this order, we sequentially perform the corresponding service caching actions, where the action \({a}_{i,j,k}\left(\tau \right)\) corresponding to \({Q}_{i,j,k}/{D}_{k}\) caches service instance \(S{E}_{k}\) on \(V{M}_{i,j}\) of edge server \(eN{B}_{i}\). For the caching actions corresponding to the last 10% of the ratios \({Q}_{i,j,k}/{D}_{k}\), we adopt an epsilon-greedy strategy (line 14): with probability \(\varepsilon\) we choose the caching action corresponding to the ranked ratio, and with probability \(1-\varepsilon\) we randomly select a caching action. Otherwise, we perform the caching action \({a}_{i,j,k}\left(\tau \right)\) corresponding to \({Q}_{i,j,k}/{D}_{k}\) as long as constraints (8), (9) and (10) remain satisfied (lines 15-16). Finally, to make service providers join coalitions and reduce the service cost, any service provider that has not joined a coalition is required to cache its service instance on the edge server that minimizes its collaboration cost.

Algorithm 1. Independent learners-based service caching scheme (ILSCS)
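The sketch below illustrates the greedy super-action selection described above (lines 9-16 of Algorithm 1): rank the base actions by \({Q}_{i,j,k}/{D}_{k}\) and cache greedily under constraints (8) and (9). The epsilon-greedy handling of the bottom 10% follows our reading of the text, and the final fallback step that places leftover providers at their cheapest edge server is omitted.

```python
import numpy as np

def greedy_super_action(Q, D, S, eps=0.1, rng=None):
    """Q: n x m x K learned Q-values; D: K data sizes; S: n storage capacities."""
    rng = rng or np.random.default_rng()
    n, m, K = Q.shape
    a = np.zeros((n, m, K), dtype=int)
    used = np.zeros(n)                     # data already cached per server
    ratio = Q / D[None, None, :]           # Q_{i,j,k} / D_k
    order = np.argsort(-ratio, axis=None)  # non-increasing order (line 9)
    tail = set(order[int(0.9 * order.size):].tolist())  # bottom 10% of ratios
    cached = set()
    for flat in order:
        i, j, k = np.unravel_index(flat, Q.shape)
        if k in cached:
            continue                       # Eq. (9): cache each SE_k only once
        if used[i] + D[k] > S[i]:
            continue                       # Eq. (8): storage capacity of eNB_i
        if int(flat) in tail and rng.random() > eps:
            continue                       # epsilon-greedy on tail candidates
        a[i, j, k] = 1
        cached.add(k)
        used[i] += D[k]
    return a
```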

Experimental evaluation

To evaluate the effectiveness of our proposed ILSCS scheme, we conduct extensive experiments comparing it against four baseline algorithms (COALITION, RANDOM, MDU, and MCS) under different experimental settings. In this section, we first present the experimental parameter settings. Then we analyze the related experimental results.

Experimental parameter settings

In this paper, the edge environment consists of \(K\) service providers, \(n\) edge servers and a central cloud \(CL\). Each service provider has one service instance. We set the related experimental parameters with reference to [26]. These parameters are described in detail as follows.

  1. The parameter settings for the system model: the number \(n\) of edge servers is 50 by default. The computation capacity \({C}_{i}\) of each edge server \(eN{B}_{i}\) varies within [8000, 16000] MHz, and its bandwidth capacity \({B}_{i}\) within [100, 1000] Mbps. The storage capacity \({S}_{i}\) of each edge server varies within [200, 300] GB. The computation capacity \({C}_{i,j}\) of VM \(V{M}_{i,j}\) varies within [4000, 8000] MHz and its bandwidth capacity \({B}_{i,j}\) within [10, 100] Mbps. The usage cost per unit of computation resource of each edge server varies within [$0.15, $0.22], and the usage cost per unit of bandwidth resource within [$0.05, $0.12]. The transmission delay \({d}_{i,k}^{tran}\) between an end user and the edge server \(eN{B}_{i}\) that caches the required service instance \(S{E}_{k}\) varies within [5, 20] ms. The service latency of a computation request processed by a service instance in the central cloud varies within [50, 100] ms.

  2. The parameter settings for the resource sharing model: the number \(K\) of service providers is 80 by default. The size \({D}_{k}\) of the input data required by service instance \(S{E}_{k}\) of service provider \(S{P}_{k}\) varies within [30, 50] GB. The workload \({W}_{k}\) of a computation request processed by service instance \(S{E}_{k}\) varies within [50, 100] MHz. The weight \({v}_{k}\) varies within [100, 150].

Experimental analysis

To verify the effectiveness of the ILSCS scheme, we implement four baseline algorithms: COALITION, RANDOM, MDU and MCS. We compare the performance of the ILSCS scheme with that of these baselines under different experimental parameter settings and analyze the experimental results.

  • ILSCS: This abbreviation stands for the independent learners-based service caching scheme. It treats each edge server as an agent and adopts stateless Q-learning to learn an optimal service caching policy.

  • COALITION [26]: It adopts a distributed and stable game-theoretic mechanism to solve the service caching problem with resource sharing among multiple service providers, aiming at minimizing the social cost of all service providers.

  • RANDOM: It randomly selects virtual machines to cache service instances of service providers.

  • MDU (max default utility): It caches the service instances of service providers on the virtual machines that maximize their default utility.

  • MCS (max coalition size): It caches the service instances of service providers on the edge servers whose coalitions have the most members.

Convergence of ILSCS

Figure 2 shows the learning curve of the ILSCS scheme. We can observe from Fig. 2 that the total collaboration cost gradually decreases and stabilizes as the learning time (i.e., the number of episodes) increases. This indicates that the ILSCS scheme can learn an optimal service caching strategy that minimizes the total collaboration cost of all service providers. In the resource-constrained edge environment, each edge server is treated as an agent, and each agent learns a collaborative caching scheme, that is, how the multiple service providers on its edge server share the server's limited resources to cache their service instances, which greatly reduces the resource usage cost.

Fig. 2 The learning curve of the ILSCS scheme

Impact of the size of the input data required by computation request

To investigate the impact of the size of the input data required by a computation request on the total collaboration cost and the average service delay, we vary the input data size from 30 to 70 GB in increments of 10 GB. Figure 3 shows the impact of different input data sizes on the total collaboration cost and the average latency. We can observe from Fig. 3 that the total collaboration cost and the average service latency of the ILSCS scheme are lower than those of the COALITION, MDU, MCS and RANDOM algorithms as the size of the service instances gradually increases. This is because the ILSCS scheme can learn an optimal service caching policy with resource sharing among the service providers in a coalition of an edge server, which greatly reduces the collaboration cost and the service latency. Moreover, we can observe from Fig. 3(a) that the total collaboration costs of the ILSCS, COALITION, MDU and MCS algorithms gradually increase with the input data size. The main reason is that when the input data size increases, the number of service instances cached on an edge server decreases, and thus the number of service providers in the coalition of the edge server decreases. The fewer the service providers in the coalition, the higher the collaboration cost paid by each member, leading to a higher social cost.

Fig. 3 The impact of the size of input data required by computation request

The impact of the number of service providers

To examine the impact of the number of service providers on the total collaboration cost and the service delay, we vary the number of service providers from 40 to 120 in steps of 20. Figure 4 plots the related experimental results. We can see from Fig. 4(a) that the total collaboration costs of the ILSCS, COALITION, MDU, MCS and RANDOM algorithms gradually increase with the number of service providers. The main reason for this phenomenon is that the total collaboration cost is the sum of the collaboration costs of all service providers; when the number of service providers increases, this sum increases as well. Moreover, we can also see from Fig. 4 that the total collaboration cost and the average service latency of the ILSCS scheme are lower than those of the COALITION, MDU, MCS and RANDOM baselines. This is because the ILSCS scheme can learn an optimal service caching policy once the number of service providers is fixed. On the one hand, the optimal policy caches service instances on optimal edge servers, which achieves a lower average service latency. On the other hand, it enables multiple service providers to share the limited resources of edge servers to cache more service instances, thereby incurring a lower collaboration cost and a lower average service delay.

Fig. 4 The impact of the number of service providers

The impact of the number of edge servers

Figure 5 illustrates the impact of the number of edge servers on the total collaboration cost and the average service latency. In Fig. 5, we can see that as the number of edge servers varies from 20, 35, 50, 65 to 80, the total collaboration cost and the average service latency of the ILSCS scheme gradually decrease. This is because, with more edge servers, there are more candidates on which to cache the service instances of service providers. The more available edge servers, the higher the probability that service providers can select more cost-effective edge servers to cache their service instances, thereby reducing the collaboration cost, alleviating resource contention and reducing the average service latency. The MCS algorithm selects the edge servers whose coalitions have the most members, rather than the most cost-effective edge servers, to cache service instances; therefore, its total collaboration cost and service latency are not correlated with the number of edge servers. The RANDOM algorithm randomly selects edge servers to cache service instances; the more available edge servers, the more scattered the cached service instances, leading to a higher collaboration cost. Therefore, the collaboration cost of the RANDOM algorithm gradually increases with the number of edge servers. In addition, we can observe that the collaboration cost and the service latency of the ILSCS scheme are lower than those of the COALITION, MDU, MCS and RANDOM algorithms. The main reason is that the ILSCS scheme selects the most cost-effective edge servers to cache service instances and shares the resources of each edge server among the service instances cached on it.

Fig. 5 The impact of the number of edge servers

The impact of the storage capacities of edge servers

To investigate the impact of the storage capacity of edge servers on the total collaboration cost and the service latency, we vary the storage capacity of each edge server from 200 GB to 400 GB in steps of 50 GB. Figure 6 plots the related experimental results. We can observe from Fig. 6(a) that the total collaboration costs of the MCS, COALITION and ILSCS algorithms decrease as the storage capacities of the edge servers increase. This is because the larger the storage capacity of an edge server, the more service providers can join its coalition, thereby decreasing the collaboration cost of the coalition members. The total collaboration costs of the RANDOM and MDU algorithms are not affected by the storage capacities of the edge servers. In addition, the total collaboration costs of the ILSCS and COALITION algorithms decrease faster than those of the RANDOM, MCS and MDU algorithms. This is because when the storage capacity of edge servers increases, the probability of resource contention among service providers decreases; with lower resource contention, service providers can select highly cost-effective edge servers to cache their service instances, thereby greatly decreasing the social cost. Finally, we can further observe from Fig. 6 that the total collaboration cost and the service latency of the ILSCS scheme are lower than those of the COALITION, MDU, MCS and RANDOM algorithms; the main reason is the same as discussed in The impact of the number of edge servers.

Fig. 6 The impact of the storage capacities of edge servers (a) The total collaboration cost. (b) The average service latency

Conclusions and future work

In this paper, we investigate the service caching problem with resource sharing among multiple service providers in a resource-constrained edge environment. To address this problem, we first construct the system model, the resource sharing model for multiple service providers, the cost model for a service provider, the service latency model and the utility model, respectively. Then we formulate the service caching problem with resource sharing among multiple service providers. Next, we adopt a stateless Q-learning algorithm to learn an optimal service caching policy. Finally, to validate the effectiveness of our proposed ILSCS scheme, we implement four baseline algorithms (COALITION, RANDOM, MDU and MCS) and compare the total collaboration cost and the service latency of ILSCS with those of the baselines under different experimental parameter settings, such as the size of service instances, the number of service instances, the number of edge servers, and the storage capacity of edge servers. The extensive experimental results demonstrate that the ILSCS scheme achieves lower service cost and service latency.

In future work, we will further investigate the caching of service instances with fault tolerance when some edge servers fail.

Availability of data and materials

The experimental data supporting this analysis are from previously reported studies, which have been cited. The experimental data used to support the findings of this study are included within the article and are described in detail in Experimental evaluation.

References

  1. Wang T, Mei Y, Jia W, Zheng X, Wang G, Xie M (2020) Edge-based differential privacy computing for sensor–cloud systems. J Parallel Distrib Comput 136:75–85. https://doi.org/10.1016/j.jpdc.2019.10.009


  2. Xu J, Chen L, Zhou P (2018) Joint service caching and task offloading for mobile edge computing in dense networks. arXiv preprint

  3. Li S, Da Xu L, Zhao S (2018) 5G Internet of Things: a survey. J Ind Inf Integr 10:1–9. https://doi.org/10.1016/j.jii.2018.01.005

  4. Zhang J, Chen J, et al (2023) Dependent task offloading mechanism for cloud–edge-device collaboration. J Netw Comput Appl 216:103656. https://doi.org/10.1016/j.jnca.2023.103656

  5. Zhang J, Zhao X, et al (2022) A composite service provisioning mechanism in edge computing. Mobile Inf Syst 2022:1–16. https://doi.org/10.1155/2022/9031201

  6. Xia X, Chen F, He Q, Cui G, Lai P, Abdelrazek M, Grundy J, Jin H (2020) Graph-based data caching optimization for edge computing. Future Gener Comput Syst 113:228–239. https://doi.org/10.1016/j.future.2020.07.016


  7. Xia X, Chen F, He Q, Grundy J, Abdelrazek M, Jin H (2020) Online collaborative data caching in edge computing. IEEE Trans Parallel Distrib Syst 32(2):281–294. https://doi.org/10.1109/TPDS.2020.3016344


  8. Huang B, Liu X, Xiang Y, Yu D, Deng S, Wang S (2022) Reinforcement learning for cost-effective IoT service caching at the edge. J Parallel Distrib Comput 168. https://doi.org/10.1016/j.jpdc.2022.06.008

  9. Xia X, Chen F, Grundy J, Abdelrazek M, Jin H, He Q (2022) Constrained app data caching over edge server graphs in edge computing environment. IEEE Trans Serv Comput 15(5):2635–2647. https://doi.org/10.1109/TSC.2021.3062017


  10. Du B (2021) Mobile edge computation induced caching strategy for huge online education with college teachers and students. Internet Technol Lett 4(1):2–7. https://doi.org/10.1002/itl2.208


  11. Rim M, Kang CG (2020) Content Prefetching of Mobile Caching Devices in Cooperative D2D Communication Systems. IEEE Access 8:141331–141341. https://doi.org/10.1109/ACCESS.2020.3012442


  12. Qi K, Han S, Yang C (2019) Learning a Hybrid Proactive and Reactive Caching Policy in Wireless Edge under Dynamic Popularity. IEEE Access 7:120788–120801. https://doi.org/10.1109/ACCESS.2019.2936866


  13. Wang W, Lan RN, Gu JX, et al (2017) Edge caching at base stations with device-to-device offloading. IEEE Access 5:6399–6410. https://doi.org/10.1109/ACCESS.2017.2679198

  14. Ahani G, Yuan D (2020) Optimal scheduling of content caching subject to deadline. IEEE Open J Commun Soc 1:293–307. https://doi.org/10.1109/ojcoms.2020.2978585


  15. Kim CK, Kim T, Cho A, Lee SK (2020) Delay-aware distributed caching scheme in edge network. In: Proceedings of the 16th International Conference on Emerging Networking Experiments and Technologies (CoNEXT), pp 544–545. https://doi.org/10.1145/3386367.3431664

  16. Gu H, Wang H (2020) A Distributed Caching Scheme Using Non-Cooperative Game for Mobile Edge Networks. IEEE Access 8:142747–142757. https://doi.org/10.1109/ACCESS.2020.3009683


  17. Kim M, Cho H, Cui Y, Lee J (2020) Service caching and computation resource allocation for large-scale edge computing-enabled networks. In: Proceedings of the 2020 IEEE Global Communications Conference (GLOBECOM). https://doi.org/10.1109/GLOBECOM42002.2020.9322297

  18. Ren Y (2021) Game theory based cooperative caching strategy in information-centric networking. Internet Technol Lett 4(1):2–5. https://doi.org/10.1002/itl2.160


  19. Song J, et al (2017) Learning based content caching and sharing for wireless networks. IEEE Trans Commun. https://doi.org/10.1109/tcomm.2017.2713384

  20. Lu S et al (2022) A Dynamic Service Placement Based on Deep Reinforcement Learning in Mobile Edge Computing. Network. 2:106–122. https://doi.org/10.3390/network2010008


  21. Chen Y et al (2023) A Distributed Game Theoretical Approach for Credibility-Guaranteed Multimedia Data Offloading in MEC. Inf Sci. 644:119306.  https://doi.org/10.1016/j.ins.2023.119306


  22. Liang J, Ma B, Feng Z, Huang J (2023) Reliability-aware Task Processing and Offloading for Data-intensive Applications in Edge Computing. IEEE Trans Netw Serv Manage. https://doi.org/10.1109/TNSM.2023.3258191


  23. Chen Y, Hu J, Zhao J, Min G. QoS-aware computation offloading in LEO satellite edge computing for IoT: a game-theoretical approach. Chin J Electron. https://doi.org/10.23919/cje.2022.00.412

  24. Chen Y, Zhao J, Hu J, et al (2023) Distributed task offloading and resource purchasing in NOMA-enabled mobile edge computing: hierarchical game theoretical approaches. ACM Trans Embed Comput Syst. https://doi.org/10.1145/3597023

  25. Zhang J, Chen D, Yang Q et al (2023) Proximity Ranking-Based Multimodal Differential Evolution. Swarm and Evolutionary Computation. 78:101277. https://doi.org/10.1016/j.swevo.2023.101277


  26. Xu Z, Zhou L, Chau SC-K, Liang W, Xia Q, Zhou P (2020) Collaborate or separate? Distributed service caching in mobile edge clouds. In: Proceedings of IEEE INFOCOM 2020, pp 2066–2075. https://doi.org/10.1109/INFOCOM41043.2020.9155365

  27. Zhang X et al (2022) Joint Edge Server Placement and Service Placement in Mobile-Edge Computing. IEEE Internet of Things Journal. 9:11261–11274. https://doi.org/10.1109/jiot.2021.3125957


  28. Chen Y, Gu W, Xu J, Zhang Y, Min G (2023) Dynamic task offloading for digital twin-empowered mobile edge computing via deep reinforcement learning. China Commun. https://doi.org/10.23919/JCC.ea.2022-0372.202302

  29. Liu F, Huang J, Wang X (2023) Joint Task Offloading and Resource Allocation for Device-Edge-Cloud Collaboration with Subtask Dependencies. IEEE Transactions on Cloud Computing. https://doi.org/10.1109/TCC.2023.3251561


  30. Huang J, Wan J, Lv B, Ye Q, Chen Y (2023) Joint Computation Offloading and Resource Allocation for Edge-Cloud Collaboration in Internet of Vehicles via Deep Reinforcement Learning. IEEE Syst J 17(2):2500–2511. https://doi.org/10.1109/JSYST.2023.3249217


  31. Chen Y, Zhao J, Wu Y, Huang J, Shen XS (2022) QoE-aware decentralized task offloading and resource allocation for end-edge-cloud systems: a game-theoretical approach. IEEE Trans Mob Comput. https://doi.org/10.1109/TMC.2022.3223119

  32. Jiang W, Feng G, Qin S, Yum TSP, Cao G (2019) Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks. IEEE Trans Wirel Commun 18(3):1610–1622. https://doi.org/10.1109/TWC.2019.2894403


  33. Kapetanakis S, Kudenko D, Strens MJA (2003) Reinforcement learning approaches to coordination in cooperative multi-agent systems. Lect Notes Artif Intell 2636:18–32. https://doi.org/10.1007/3-540-44826-8_2


  34. Korbut AA, Sigal IK (2010) Exact and greedy solutions of the knapsack problem: The ratio of values of objective functions. J Comput Syst Sci Int 49(5):757–764. https://doi.org/10.1134/S1064230710050102



Funding

This work was supported by the Zhejiang Provincial Natural Science Foundation of China (LY23F020015) and the National Natural Science Foundation of China (61903109).

Author information


Contributions

Binbin Huang, Ziqi Ran and Yuanyuan Xiang wrote the main manuscript text and prepared all figures (original draft). Dongjin Yu and Xiaoying Shi validated the manuscript. Zhongjin Li and Zhengqian Xu supervised the timely completion of this work.

Corresponding author

Correspondence to Dongjin Yu.

Ethics declarations

Ethics approval and consent to participate

This declaration is not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Huang, B., Ran, Z., Yu, D. et al. Stateless Q-learning algorithm for service caching in resource constrained edge environment. J Cloud Comp 12, 132 (2023). https://doi.org/10.1186/s13677-023-00506-7
