
Optimization model for vehicular network data queries in edge environments

Abstract

As the Internet of Vehicles advances, the demand for timely data acquisition by vehicle users continues to escalate, yet it is confronted with the challenge of excessive data retrieval latency. The emergence of edge computing provides technical support for the development of vehicular networks by caching data in advance to reduce data acquisition latency. How to cache and query data effectively therefore becomes a key issue in addressing the timeliness of data acquisition in vehicular networks. In this paper, we investigate an efficient query optimization model to minimize data acquisition latency. First, based on the distribution of data query frequencies across different servers, we propose an edge collaborative caching strategy using a tabu search algorithm. This strategy prioritizes high-traffic data, finding the two optimal storage nodes for each high-traffic data item in descending order of popularity and thereby ensuring a backup for each data segment within the collaborative domain. This not only reduces data transmission latency between nodes during task execution but also prevents single-point failures. Second, we deploy cuckoo filters on edge nodes to enable rapid localization of the nodes caching the requested data when users issue queries, thus reducing data processing latency. Finally, simulation results demonstrate that the proposed query optimization model outperforms other schemes in terms of average data query latency.

Introduction

With the rapid advancement of information technology, an increasing number of intelligent vehicle applications, such as autonomous driving and augmented reality, are continuously emerging [1]. These emerging applications not only require a significant amount of data for task execution but also impose higher demands on the timeliness of data acquisition [2]. In the traditional cloud storage model, vehicles often experience significant delays in accessing cloud data due to the considerable distance between cloud servers and vehicles [3]. To address this issue, efficient data sharing in vehicular networks has become a focal point of research. Many emerging technologies are being employed in vehicular network research, including blockchain and edge computing. Blockchain, as an emerging technology, spans various fields such as distributed systems and the Internet of Things (IoT) [4, 5] and offers a promising solution to the data sharing problem [6]. However, blockchain typically requires some time to reach consensus, introducing a certain degree of latency into data acquisition. Consequently, many researchers have introduced edge computing technology into vehicular network applications. Edge computing deploys computing and storage resources on edge servers at the network edge near vehicles, thereby providing rapid computing and storage services and effectively reducing service latency [7]. However, due to the limited storage capacity of edge servers and the lack of unified management for dispersedly stored data, the difficulty of rapid data retrieval has become a constraint on the development of edge computing. Therefore, the design of efficient edge collaborative caching and querying mechanisms is of significant importance in driving the advancement of edge computing.

Currently, the application of edge computing technology provides technical support for rapid data acquisition in vehicular networks. Vehicle edge computing relocates data processing and storage to edge devices near vehicle users, effectively reducing data delivery latency [8, 9]. Abani et al. [10] proposed a popularity-based caching scheme to reduce delivery latency by caching highly popular content; however, owing to the lack of collaboration between servers, its cache hit rate is not high. Therefore, several researchers have proposed various edge collaborative caching strategies. For instance, Hou et al. [11] introduced a collaborative caching algorithm based on transfer learning (LECC), which estimates content popularity using transfer learning and designs cache optimization models to reduce transmission costs and improve the quality of the user service experience. Jiang et al. [12] investigated the collaboration mechanism among edge servers, modeling the collaborative caching problem as an integer linear programming problem and proposing a subgradient-based algorithm to solve it. Zhang et al. [13] presented a cooperative edge caching method based on deep reinforcement learning, allowing distributed edge servers to learn to cooperate, promoting collaboration between edge servers, and improving the system hit rate. Bilal et al. [14] proposed a cooperative federated caching and operation policy, reducing perceived latency for users and content delivery network costs. However, in practical applications, the distance between servers also affects transmission latency, and most of the aforementioned works overlook its impact on content delivery latency. Addressing this issue, Chen et al. [15] proposed a cooperative composite caching strategy in a cluster-centric small cell network, grouping small base stations (SBSs) into disjoint clusters and caching data within these groups. Chen et al. [16] introduced a joint resource allocation and caching placement scheme based on location-aware multi-user mobile edge computing, jointly optimizing binary caching placement, edge computing resources, and bandwidth (BW) allocation to minimize the expected total energy consumption. Shu et al. [17] proposed a cooperative caching strategy based on group behavior and popularity prediction, considering the distribution of users and the spatial relationship between servers, effectively improving the cache hit rate and reducing average access latency. Unlike existing research, this paper considers the non-uniform distribution of data query frequencies across different servers, aiming to reduce transmission latency while preventing single points of failure in the cache.

In addition to reducing transmission time through prefetching to enhance the timeliness of data acquisition, it is imperative to minimize query processing latency. When querying data, it is necessary to first determine the edge server where the information is stored and then transmit the cached hit data from the nearest edge server to the user. Therefore, efficient data querying is crucial for reducing data query processing latency. Wang et al. [18] leveraged the advantages of Named Data Networking (NDN) to reduce latency in vehicular data collection and improve the efficiency of vehicular content retrieval. Duarte et al. [19] proposed a Vehicular Named Data Networking (VNDN) framework, which, in addition to original forwarders, selects an additional set of vehicles to improve data retrieval success rate. Zhang et al. [20] designed a blockchain-incentivized Device-to-Device (D2D) and Mobile Edge Computing (MEC) caching system, effectively offloading traffic to local caches by maintaining high willingness of cache nodes to share, thus reducing content retrieval latency. Most of the aforementioned studies focus on improving retrieval success rate to reduce query latency. However, in edge collaborative caching, due to the dispersed storage of data across different nodes within the collaborative domain, it becomes challenging to rapidly determine the storage nodes for data and content queries. To address this issue, Zhang et al. [21] proposed a task-based edge node selection algorithm, selecting appropriate target edge nodes while employing Bloom filters to filter out malicious nodes, thereby minimizing latency for optimized target edge nodes. Monga et al. [22] introduced a federated indexing model, utilizing reliable fog devices as super-peer overlays to monitor edge resources, employing Bloom filters to provide joint metadata indexing, data localization within 2 hops, and maintaining approximate global statistical information about edge reliability and storage capacity. Quan et al. [23] designed a Bloom filter-based lookup mechanism that can quickly determine whether the required information exists on a specific edge server. While these solutions mostly utilize Bloom filters to locate or index data cache positions, they overlook the inability to remove data once inserted into Bloom filters. Consequently, when there are significant changes in data popularity, unnecessary cached data cannot be removed.

In summary, existing models mostly focus on collaborative caching mechanisms based on data popularity or user preferences to reduce data transmission latency. However, they overlook the impact of the distribution of data query frequencies within the collaborative domain on transmission latency, as well as the issue of single-point failures at edge nodes. Additionally, the performance of data query execution on cached data at edge nodes also affects data retrieval latency. Existing query schemes in edge environments mainly consider query mechanisms to reduce data query latency but neglect the problem of rapidly locating and deleting dispersedly stored data. To address these issues, this paper proposes an optimization model for data queries in vehicular networks under edge environments. Leveraging the characteristics of edge computing, this solution conducts an in-depth exploration of edge collaborative caching. In the proposed model, a tabu search algorithm is utilized to select the two optimal storage nodes for each highly popular data item, combined with cuckoo filters to ensure the performance of data queries. This effectively reduces data query latency while preventing single-point failures within the collaborative domain. The main contributions of this paper can be summarized as follows:

  1. (1)

Proposed an edge collaborative caching mechanism based on the tabu search algorithm. This mechanism combines the optimization principles of the tabu search algorithm and considers the popularity ranking of vehicular network data within the collaborative domain. It prioritizes caching highly popular data on nearby edge nodes to shorten the data transmission distance. For each highly popular data item, two optimal caching nodes are selected for storage, improving the cache hit rate, preventing single-point failures, and reducing data transmission latency.

  2. (2)

Introduced a query optimization mechanism based on cuckoo filters. Addressing the lack of unified data management and the difficulty of determining query locations in vehicular edge environments, cuckoo filters are deployed on edge nodes and shared within the collaborative domain. This allows vehicle users to quickly locate the requested information and filters out queries for uncached data. Moreover, cuckoo filters support deletion, so when data popularity changes significantly and pre-caching is no longer necessary, the corresponding fingerprints can be removed directly from the filter.

  3. (3)

Conducted simulation experiments with different comparison algorithms to evaluate the proposed approach. According to the experimental results, the proposed vehicular network data query optimization model exhibits lower average task latency, demonstrating that it can prevent single-point failures while reducing data query latency.

The remainder of this paper is organized as follows. “Related knowledge” section will delve into the relevant background knowledge. “Optimization model for vehicular network data query in edge environments” section will provide a comprehensive description of the design of the vehicular network data query optimization model in edge environments. In “Experiment and analysis” section, experimental analysis based on the proposed query optimization model will be conducted, with different experiments demonstrating the correctness and feasibility of the model. Finally, “Conclusion and outlook” section will conclude the paper and outline future work and prospects.

Related knowledge

This section will provide a detailed introduction to the tabu search algorithm and cuckoo filter, which are relevant foundational concepts in the paper. The aim is to facilitate a better understanding of the proposed model.

Tabu search algorithm

Tabu Search is a metaheuristic search algorithm for combinatorial optimization, first proposed by Fred Glover in 1986 [24]. It is based on neighborhood search, with recent moves placed in a tabu list so that the search does not revisit them. By maintaining this list, Tabu Search not only prevents cycling but also drives diversified exploration of the solution space, allowing the search to escape local optima and proceed in a more global manner [26]. The algorithm is highly efficient, often exploring only a small fraction of the feasible solutions that contain the optimum [25]. The following are the main concepts of the Tabu Search algorithm (a minimal code sketch follows the list):

  1. (1)

Candidate solution space: the set of all feasible solutions to the given combinatorial optimization problem. Tabu Search moves through this space from one candidate solution to another in search of the optimal or a near-optimal solution.

  2. (2)

    Objective function: For a given combinatorial optimization problem, there exists an objective function to evaluate the quality of candidate solutions. The objective of Tabu Search is to find the optimal or near-optimal solution within this solution space.

  3. (3)

    Tabu list: Tabu Search introduces a tabu list to store previously explored solutions, preventing the algorithm from revisiting the same solutions. This helps to avoid falling into local optima and encourages the algorithm to explore the solution space more comprehensively.

  4. (4)

    Neighborhood search: Tabu Search generates new candidate solutions by searching within the neighborhood of the current solution. The neighborhood defines the transition from the current solution to other potential solutions. During the search process, the algorithm attempts to move to better solutions within the neighborhood while being constrained by the tabu list.

  5. (5)

    Tabu strategy: When a new solution is discovered, the Tabu Search algorithm decides whether to add this solution to the tabu list based on tabu strategy. This helps prevent the algorithm from cycling or prematurely converging to suboptimal solutions.

  6. (6)

    Stopping criteria: Tabu Search algorithms typically include stopping criteria to determine when to terminate the search process. This could involve reaching a certain number of iterations, meeting specific solution quality standards, or reaching a predetermined time limit, among others.
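To ground these concepts, here is a minimal, generic tabu search skeleton in Python. It is a sketch, not the caching algorithm proposed later in this paper: the objective and neighborhood functions are placeholders supplied by the concrete problem, and the tabu list simply stores recently visited solutions.

```python
def tabu_search(initial, objective, neighbors, max_iters=100, tabu_size=20):
    """Generic tabu search: minimize `objective` over the solution space."""
    current = best = initial
    best_cost = objective(best)
    tabu = []                                     # recently visited solutions

    for _ in range(max_iters):                    # stopping criterion: iteration budget
        # Neighborhood search restricted to non-tabu candidates.
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=objective)  # best admissible neighbor

        tabu.append(current)                      # tabu strategy: forbid revisiting
        if len(tabu) > tabu_size:
            tabu.pop(0)                           # expire the oldest entry

        if objective(current) < best_cost:        # track the incumbent best
            best, best_cost = current, objective(current)
    return best, best_cost

# Toy usage: minimize x^2 over the integers, neighbors are x-1 and x+1.
print(tabu_search(10, lambda x: x * x, lambda x: [x - 1, x + 1]))
```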

Cuckoo filter

The Cuckoo Filter (CF) is a fingerprint filter based on cuckoo hashing, which excels in space utilization, operational performance, and implementation complexity compared to most Bloom Filter (BF) enhancement schemes [27]. It adopts partial-key cuckoo hashing, where each fingerprint is computed with only one hash function to determine its primary mapping location [28]. Leveraging the properties of the bitwise exclusive OR operation, either mapping location can be obtained by XORing the other mapping location with the hash of the element's fingerprint. The fingerprint and the two candidate positions are computed as follows:

$$\begin{aligned} f_p=fingerprint(x) \end{aligned}$$
(1)
$$\begin{aligned} p_1=hash(x) \end{aligned}$$
(2)
$$\begin{aligned} p_2=p_1\oplus hash(f_p) \end{aligned}$$
(3)

The simplest form of the cuckoo hash structure is a one-dimensional array structure, where new elements are mapped to two positions in the array using two hash functions [29]. If one of the two positions is empty, the element can be directly placed into it. If both positions are occupied, a random position needs to be selected to free up space, and the element is inserted into that position. The cuckoo hash table consists of bucket arrays, with each item having two candidate buckets determined by hash functions. The following is a description of the insertion, lookup, and deletion operations performed by the cuckoo filter:

(1) Insertion: If one of the two candidate buckets is unoccupied, the new item x is inserted into that empty bucket. An example of inserting a new item x into a hash table containing eight buckets is illustrated in Fig. 1, where x can be placed in either bucket 2 or bucket 6. Since bucket 6 is empty, x is inserted into bucket 6, completing the insertion.

Fig. 1 Illustration of cuckoo hashing [29]

Fig. 2 Illustration of the cuckoo hashing insertion process [29]

When both buckets are full and there is no available space, a candidate bucket is selected, and an element within that bucket is evicted to accommodate the new element. As depicted in Fig. 2, after cuckoo hashing, if both bucket 2 and bucket 6 are occupied by other items, a candidate bucket (e.g., bucket 6) is chosen; the existing item (in this example, 'a') is evicted from this bucket and reinserted into its alternate location.

(2) Lookup: The lookup process of a cuckoo filter is straightforward. For a given element x, the algorithm first uses the equations above to compute the fingerprint of x and identify its two candidate buckets. It then reads these two buckets: if the fingerprint stored in either bucket matches the computed fingerprint, the cuckoo filter returns “true”; otherwise, it returns “false”. This procedure guarantees no false negatives, provided that no bucket overflow occurs.

(3) Deletion: Standard Bloom filters do not support deletion; deleting a single item requires rebuilding the entire filter, while counting Bloom filters consume more storage space. Like counting Bloom filters, cuckoo filters allow deletion of inserted items by removing the corresponding fingerprint from the hash table, and they offer a more concise implementation than comparable filters. The deletion process is intuitive: the two candidate buckets of the given item are checked, and if a fingerprint matches in either bucket, one copy of the matching fingerprint is removed from that bucket.
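As a concrete illustration of Eqs. (1)-(3), the following sketch computes a fingerprint and the two candidate buckets, assuming a power-of-two table size and SHA-256 as a stand-in hash function (both are illustrative choices, not prescribed by the paper):

```python
import hashlib

NUM_BUCKETS = 2 ** 8     # power of two, so the XOR step stays within the table
FP_BITS = 8              # fingerprint length in bits

def _hash(data):
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def fingerprint(item):
    # Eq. (1): a short fingerprint of the item; 0 is reserved to mark empty slots.
    return _hash(item.encode()) % (2 ** FP_BITS) or 1

def alt_index(p, f):
    # Eq. (3): XOR with the fingerprint's hash maps either bucket to the other.
    return (p ^ _hash(bytes([f]))) % NUM_BUCKETS

def indexes(item):
    p1 = _hash(item.encode()) % NUM_BUCKETS      # Eq. (2): primary bucket
    return p1, alt_index(p1, fingerprint(item))  # Eq. (3): alternate bucket

# Symmetry check: the alternate of the alternate is the primary bucket.
p1, p2 = indexes("vehicle-data-42")
assert p1 == alt_index(p2, fingerprint("vehicle-data-42"))
```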

Optimization model for vehicular network data query in edge environments

This section presents a specific optimization model for vehicular network data queries in edge environments. Firstly, a brief description of the system model is provided. Secondly, the proposed edge collaborative caching scheme is elaborated upon, wherein the edge collaborative caching mechanism is established as the foundation for data query optimization by proactively caching data within the collaborative domain. Lastly, cuckoo filters are deployed on edge servers to filter out invalid data queries, thereby enhancing the performance of data query execution.

System model

This paper proposes an optimization model for vehicular network data queries in edge environments, considering a vehicular network data query system with multiple edge servers, as depicted in Fig. 3. The model comprises three entities: cloud servers, edge servers, and vehicular users. Vehicular users transmit various types of collected information to cloud servers, where all data is stored, while edge servers store a portion of the high-popularity data. Within the collaborative domain, users query \(m\) different data contents \(D=\{d_1,d_2,\ldots ,d_m\}\), where \(S_i\) represents the size of the i-th data item. The set of edge servers is denoted as \(ES=\{ES_1,ES_2,\ldots ,ES_n\}\), where each server \(ES_j\) has a storage capacity \(Q_{j}\). Multiple vehicular users near the edge servers initiate requests for different data.

Fig. 3 Overall framework of query optimization

The proposed framework consists of two phases, as illustrated in the right half of Fig. 3. The first phase is the data caching phase. Initially, based on data popularity, high-demand data that need to be cached in advance are identified. The tabu search algorithm is then employed to compute the optimal storage matrix for each piece of data (denoted as \(x_{ij}\)). Using \(x_{ij}\), the selected data are pre-cached on the corresponding edge servers, ensuring that the transmission distance is shorter and the latency is reduced when users query this data. In the figure, blue circles represent cloud storage servers, green circles represent edge servers, and yellow circles represent vehicle users. The arrows between these entities indicate possible data caching distributions, with all data backed up in the cloud.

The second phase is the data query phase. Prior to querying the data, Cuckoo filters are deployed on each edge server. Using the storage matrix \(x_{ij}\) obtained from the data caching strategy, we can determine which edge nodes the data to be cached are stored on. Therefore, the fingerprints of the respective data are computed and stored in the corresponding Cuckoo filters. When a user queries the data, the system can quickly locate the cache hit position, thereby avoiding a global search, reducing network traffic, and minimizing query processing latency.

Edge collaborative caching strategy based on tabu search algorithm

Collaborative caching framework

This paper proposes an edge collaborative caching architecture based on the tabu search algorithm, as illustrated in Fig. 4. The architecture primarily consists of three components: the vehicular user layer, the vehicular edge layer, and the cloud layer. Both the edge layer and the cloud layer store vehicular network data collaboratively. Servers within the edge layer cooperatively cache high-popularity data to ensure efficient data retrieval for vehicular users.

Fig. 4 Edge collaborative caching framework

  1. (1)

Vehicle user layer: As data requesters, vehicles can access the required data through roadside base stations, obtaining it from the nearest edge server (ES).

  2. (2)

    Vehicle edge layer: Multiple edge servers form an edge network to provide services to vehicle users. In order to execute tasks efficiently, edge servers can cache some commonly used data, and data sharing among servers is possible.

  3. (3)

    Cloud layer: Compared to edge servers, cloud servers have larger storage resources, which can be considered infinite. When edge servers do not cache the required data blocks, they can request data from the cloud through ES or cellular base stations. Since the cloud storage resources are large enough, we assume that any data required for task execution can be obtained from the cloud.

We address the challenges of high data transmission latency and single points of failure in edge servers during data transmission. To mitigate these issues, we propose a caching strategy tailored for edge environments. Leveraging the tabu search algorithm, we allocate storage resources by identifying, for each data item, a pair of edge nodes within the collaborative domain that minimizes the total data transmission latency. This ensures redundancy for each piece of data within the collaborative domain, while ensuring that the data stored on different edge servers is not entirely identical. In the event of a server failure, the data stored on the failed server can be retrieved from backups on other nodes within the collaborative domain, effectively mitigating single points of failure. Additionally, by maintaining two replicas of each high-demand data item, vehicle users can access data from the nearest node, thereby reducing inter-node transmission distances and lowering transmission latency.

The implementation process of the collaborative caching strategy

This section will provide a detailed explanation of the implementation of the edge cooperative caching strategy proposed in this paper. Given the limited storage capacity of edge servers and the uneven distribution of data query frequencies across edge nodes within the cooperative domain, the key to reducing data transmission latency lies in devising an effective data caching strategy. This strategy encompasses identifying high-demand data, determining pre-caching data, and selecting the optimal caching nodes.

First, the data within the collaborative domain are ranked according to their popularity, and high-demand data are selected for pre-caching. Next, based on the distribution of query frequencies for these high-demand data across different servers, the tabu search algorithm is employed to select cache nodes within the collaborative caching system. The algorithm identifies the two optimal storage nodes, and the data are simultaneously stored on both of these nodes. The detailed process is as follows:

  1. (1)

    Data popularity calculation

The data popularity is calculated by collecting, through an intelligent controller, the query frequencies of each data item within the collaborative domain. The popularity of a data item is determined by the frequency of access to that content over a specific period. The popularity \(fre_i\) of data \({d_i}\) within the collaborative domain is given by Eq. (4):

$$\begin{aligned} fre_{i}=\frac{\sum _{j=1}^nQU_{j,i}}{\sum _{j=1}^n\sum _{i=1}^mQU_{j,i}} \end{aligned}$$
(4)

Where \(QU_{j,i}\) represents the number of times users request content \({d_i}\) through edge server \({ES_j}\), and the denominator is the total number of accesses to all content within the collaborative domain. The data are ranked from highest to lowest popularity. When \(fre_{i}\) exceeds the threshold \(\alpha\), data \({d_i}\) should be pre-cached; this identifies the high-demand data that need to be cached in advance.
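As a small worked example (with hypothetical counts and threshold), Eq. (4) amounts to normalizing the column sums of the query-count matrix QU:

```python
import numpy as np

# QU[j][i]: number of requests for data d_i observed at edge server ES_j.
QU = np.array([[120,  5, 40],
               [ 80, 10, 60],
               [100,  5, 80]])
alpha = 0.15                        # pre-caching threshold (assumed value)

fre = QU.sum(axis=0) / QU.sum()     # Eq. (4): popularity of each d_i in the domain
order = np.argsort(-fre)            # rank data from most to least popular
to_cache = [i for i in order if fre[i] > alpha]
print(fre, to_cache)                # fre = [0.6, 0.04, 0.36] -> pre-cache d_0 and d_2
```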

  2. (2)

    Transmission delay calculation

After determining the data that need to be pre-cached, each data item is assigned storage nodes in descending order of popularity. The tabu search algorithm is utilized to find the two cache nodes within the collaborative domain that minimize the total transmission delay, considering the query distribution of each high-demand data item. The objective function for minimizing data transmission delay and its constraints are formulated in Eqs. (5)-(8):

$$\begin{aligned} minT_t=\sum \limits _{i=1}^m\sum \limits _{j=1}^nT_{ij}\cdot QU_{j,i}\cdot x_{ij} \end{aligned}$$
(5)
$$\begin{aligned} {\sum \limits _{i=1}^mx_{ij}\le Q_j,\quad 1\le j\le n} \end{aligned}$$
(6)
$$\begin{aligned} {\left\{ \begin{array}{ll}{x_{ij}=1,\quad {d_i~is~stored~on~Node_j}}\\ {x_{ij}=0},\quad {d_i~is~not~stored~on~Node_j} \end{array}\right. } \end{aligned}$$
(7)

Exactly two storage nodes are selected for each data item from among the n nodes:

$$\begin{aligned} {\sum \limits _{j=1}^nx_{ij}=2\quad (i=1,2,3\ldots m)} \end{aligned}$$
(8)

Where \({T_{t}}\) denotes the total transmission delay of tasks, \({T_{ij}}\) represents the unit transmission delay of data \(d_i\) from its edge cache node to the querying node \(ES_j\), and \({Q_j}\) represents the storage capacity of edge node \(ES_j\). The objective function (5) minimizes the total transmission latency of query tasks within the collaborative domain. Constraint (6) ensures that the amount of stored data does not exceed the available storage capacity of each collaborative node, constraint (7) defines the binary placement variable, and constraint (8) specifies that the number of storage nodes selected for each data item is 2.
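The objective and constraints translate directly into code. The sketch below scores a candidate placement matrix x (m x n, with x[i][j] = 1 if \(d_i\) is stored on node j) against Eqs. (5), (6), and (8); the delay matrix T and count matrix QU are assumed inputs:

```python
import numpy as np

def total_delay(x, T, QU):
    """Eq. (5): sum over i, j of T_ij * QU_ji * x_ij (all m x n after transposing QU)."""
    return float(np.sum(T * QU.T * x))

def feasible(x, Q):
    """Eq. (6): node capacities respected; Eq. (8): exactly two replicas per item."""
    return bool(np.all(x.sum(axis=0) <= Q) and np.all(x.sum(axis=1) == 2))
```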

  3. (3)

    Collaborative caching algorithm

Based on the objective function and constraints, the proposed tabu search algorithm iterates as follows until the predetermined maximum number of iterations is reached, yielding the optimal caching configuration \(x_{ij}\). First, the objective function, constraints, and maximum number of iterations are defined. When data \(d_{i}\) needs to be stored, an initial solution is generated by randomly placing \(d_{i}\) on two different nodes. During the neighborhood search, a new storage path \(x_{ij}^{*}\) is obtained by randomly updating one of the storage nodes. The transmission delay \(T_{t}(x_{ij}^{*})\) of the new storage configuration is computed, and \(x_{ij}^{*}\) is added to the tabu list to avoid revisiting previously explored solutions, thereby enhancing search efficiency. The data transmission delays before and after the update are compared, and the iteration continues until the maximum number of iterations is reached, yielding the optimal storage path \(x_{ij}\). The pseudocode for this process is outlined in Algorithm 1.

Algorithm 1 Data caching strategy based on tabu search algorithm
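Since Algorithm 1 is given as a figure, the following is a hedged Python sketch of the loop described above, reusing numpy and the total_delay/feasible helpers from the previous snippet. The move operator (relocating one replica of one randomly chosen item) and the iteration budget are assumptions; the paper's pseudocode may differ in detail.

```python
import random

def cache_placement(m, n, T, QU, Q, max_iters=500, tabu_size=50):
    # Initial solution: place each data item on two distinct random nodes.
    x = np.zeros((m, n), dtype=int)
    for i in range(m):
        for j in random.sample(range(n), 2):
            x[i, j] = 1

    best, best_cost = x.copy(), total_delay(x, T, QU)
    tabu = []
    for _ in range(max_iters):
        # Neighborhood move: shift one replica of one item to a new node.
        cand = x.copy()
        i = random.randrange(m)
        old = random.choice(list(np.flatnonzero(cand[i] == 1)))
        new = random.choice(list(np.flatnonzero(cand[i] == 0)))
        cand[i, old], cand[i, new] = 0, 1

        key = cand.tobytes()
        if key in tabu or not feasible(cand, Q):
            continue                     # skip tabu or infeasible neighbors
        tabu.append(key)                 # forbid revisiting this configuration
        if len(tabu) > tabu_size:
            tabu.pop(0)

        cost = total_delay(cand, T, QU)
        if cost < best_cost:             # keep the best placement found so far
            best, best_cost = cand.copy(), cost
        x = cand                         # continue the walk from the neighbor
    return best, best_cost
```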

Query optimization mechanism based on cuckoo filters

Query optimization framework

The query optimization mechanism further enhances the collaborative caching strategy. Building upon the caching strategy, the deployment of cuckoo filters enables rapid location of data storage nodes. Additionally, cuckoo filters can remove inserted data, allowing updates when cached vehicle data changes and thus ensuring the validity of queries. Together, these measures improve efficiency and reduce data query processing latency in vehicular networks. The query optimization algorithm based on cuckoo filters consists of two stages: data insertion and data querying. The specific implementation process is as follows:

  1. (1)

    Data insertion stage

First, empty cuckoo filters are deployed on the edge nodes within the collaborative domain. Based on the \(x_{ij}\) matrix derived in “Edge collaborative caching strategy based on tabu search algorithm” section, the fingerprints of the data are inserted into the corresponding cuckoo filters on the edge servers. The cuckoo filter uses two hash functions and two hash tables, where a fingerprint can be stored either in table T1 or in table T2, but not in both simultaneously. The insertion process computes the fingerprint \(\delta_x\) of data item \(d_{x}\), determines its position \(p_{1}\) in the first hash table T1 as \(p_1=h_1(d_x)\), and determines its position \(p_{2}\) in the second hash table T2 as \(p_2=p_1\oplus h_2(\delta_x)\). If either \(p_{1}\) or \(p_{2}\) is empty, \(\delta_x\) is inserted into one of the available slots at random. If both \(p_{1}\) and \(p_{2}\) are occupied, an existing fingerprint \(\delta_{m}\) is evicted from one of the positions and reinserted into its alternate slot, while \(\delta_x\) is placed in the vacated position. This process repeats until every evicted fingerprint finds an available slot. The specific procedure for this insertion algorithm is detailed in Algorithm 2.

Algorithm 2 Insert(\(d_x\))
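A runnable sketch of this insertion procedure, reusing fingerprint, indexes, alt_index, and NUM_BUCKETS from the sketch in the cuckoo filter section; for brevity each bucket here holds a single fingerprint slot, whereas practical filters usually use four-slot buckets.

```python
import random

MAX_KICKS = 500                    # bound on the eviction chain length
buckets = [0] * NUM_BUCKETS        # one fingerprint slot per bucket, 0 = empty

def insert(item):
    f = fingerprint(item)
    p1, p2 = indexes(item)
    for p in (p1, p2):             # use an empty candidate bucket if available
        if buckets[p] == 0:
            buckets[p] = f
            return True
    p = random.choice((p1, p2))    # both occupied: evict and relocate
    for _ in range(MAX_KICKS):
        buckets[p], f = f, buckets[p]   # kick out the resident fingerprint
        p = alt_index(p, f)             # send it to its alternate bucket
        if buckets[p] == 0:
            buckets[p] = f
            return True
    return False                   # eviction chain too long: filter is full
```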

  2. (2)

    Data querying stage

When a user queries data \(d_{x}\), the cuckoo filter on the edge server is activated. It first computes the fingerprint \(\delta_x\) of \(d_{x}\) and identifies the corresponding buckets \(p_{1}\) and \(p_{2}\). The filter then checks whether \(\delta_x\) is present in either \(p_{1}\) or \(p_{2}\). If \(\delta_x\) is found in either bucket, the filter outputs “True,” indicating that the edge node stores the data \(d_{x}\). If \(\delta_x\) is absent from both \(p_{1}\) and \(p_{2}\), the filter outputs “False,” indicating that the node does not store \(d_{x}\). This approach avoids a full traversal of the server's data. The detailed process of the query algorithm is presented in Algorithm 3.

Algorithm 3 Lookup(\(d_x\))
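Lookup and deletion complete the sketch (again reusing the helpers above): lookup checks both candidate buckets for the fingerprint, mirroring Algorithm 3, and deletion removes one matching copy, mirroring the deletion behaviour described in the related-knowledge section.

```python
def lookup(item):
    f = fingerprint(item)
    p1, p2 = indexes(item)
    return buckets[p1] == f or buckets[p2] == f   # no false negatives

def delete(item):
    f = fingerprint(item)
    for p in indexes(item):
        if buckets[p] == f:        # remove one copy of the matching fingerprint
            buckets[p] = 0
            return True
    return False                   # item was never inserted

insert("d_7"); assert lookup("d_7") is True
delete("d_7"); assert lookup("d_7") is False
```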

The cuckoo filter facilitates the filtering of invalid queries for data that is not cached within the collaborative domain, allowing such data to be retrieved directly from the cloud without the need for further data traversal within the collaborative domain to verify its existence. The execution path for data querying under collaborative caching is illustrated in Fig. 5, which delineates three scenarios for cache hits as follows:

  1. 1)

    Local cache hit: Data is transmitted from the local edge server \({Node}_0\) to the relevant user.

  2. 2)

    Cache hits on other nodes: Data is transmitted from the server \({Node}_j\) to the local edge server \({Node}_0\), and then to the user.

  3. 3)

    Cache miss: Data is not cached within the collaboration domain and is fetched from the cloud storage center via the core network.

Fig. 5 Execution path for data querying under collaborative caching

The unit transmission latency calculation for each of these scenarios is as follows:

$$\begin{aligned} T_{ij}=\left\{ \begin{array}{ll} 0, &{}\quad local\ cache\ hit\\ h_{j*}\cdot s_{k}\cdot \sigma _{1}, &{}\quad cache\ hit\ on\ other\ nodes\\ h_{j*}\cdot s_{k}\cdot \sigma _{2}, &{}\quad cache\ miss \end{array}\right. \end{aligned}$$
(9)

Where \({h_{j*}}\) represents the minimum number of hops from a cache-hit node within the collaboration domain to the local server, \(s_{k}\) denotes the size of the requested data block, \({\sigma }_{1}\) denotes the unit data latency coefficient between nodes within the collaboration domain, and \({\sigma }_{2}\) denotes the unit data latency coefficient between the cloud center and the edge server.
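Eq. (9) is a three-way case split; a small helper makes the cost model explicit (the coefficient values below are illustrative only, with \(\sigma_2 \gg \sigma_1\) reflecting the higher cost of reaching the cloud):

```python
def unit_transmission_latency(scenario, h_min=3, s_k=1.0, sigma1=0.05, sigma2=0.5):
    """Eq. (9): unit transmission latency for one data request."""
    if scenario == "local":              # local cache hit: no transfer needed
        return 0.0
    if scenario == "peer":               # hit on another node in the domain
        return h_min * s_k * sigma1
    return h_min * s_k * sigma2          # cache miss: fetch from the cloud
```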

The Cuckoo Filter supports the operation of deleting fingerprints. As the popularity of data changes, if the stored data’s popularity decreases and pre-caching is no longer necessary, the corresponding fingerprint for that data can be removed from the Cuckoo Filter. Simultaneously, the data stored in the server is also deleted. This action frees up storage space for newly inserted high-popularity data, ensuring the effectiveness of data caching.

Query latency computing

Edge servers and remote clouds can offer various data processing services, such as image processing, data integration, and speech recognition. Each edge server \(({ES_j})\) possesses the capability to process query tasks, while the cloud server \(({Server_c})\) can handle all query tasks. The processing latency for querying data from the cloud is denoted as \(T_{e}^{c}\). Let \({C_j}\) represent the computational resources of \({ES_j}\), and let \({task=\{k_1,~k_2,~\cdots ,~k_m\}}\) denote the set of query tasks. Then, the processing latency of query task \({k_i}\) within the collaborative domain is given by:

$$\begin{aligned} T_e^{k_i}=\frac{C_{k_i}}{C_j} \end{aligned}$$
(10)

Where \(\mathrm {C_{k_{i}}}\) represents the amount of computation required by task \({k_i}\) when querying data \({d_i}\), and \({C_j}\) is the computing resources that server \({ES_j}\) allocates to task \(({k_i})\) per second. When there is a cache hit, the query latency of a task is the sum of its processing latency and transmission latency, as shown in Eq. (11).

$$\begin{aligned} T_q^{k_i}=T_e^{k_i}+T_t^{k_i} \end{aligned}$$
(11)

The query latency of task \({k_i}\) when there is a cache miss can be expressed as:

$$\begin{aligned} T_q^c=T_e^c+T_t^{k_i} \end{aligned}$$
(12)

The average query latency is as follows:

$$\begin{aligned} T_{avg}=\frac{\sum _{i=1}^m\sum _{j=1}^nT_q^{k_i}+\sum _{i=1}^mT_q^c}{\sum _{j=1}^n\sum _{i=1}^mQU_{j,i}} \end{aligned}$$
(13)

Where the numerator sums the query latencies of all tasks within the collaborative domain, and \(\sum _{{\mathrm {j=1}}}^{{\textrm{n}}}\sum _{{\mathrm {i=1}}}^{{\textrm{m}}}QU_{j,i}\) denotes the total number of queries issued from all nodes within the collaborative domain for the different tasks.
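Eqs. (10)-(13) compose as follows; the workload numbers are illustrative, and the transmission terms would come from Eq. (9):

```python
def processing_latency(c_task, c_server):
    return c_task / c_server                       # Eq. (10)

# Eq. (11): cache-hit tasks pay edge processing plus edge transmission.
hit_latencies = [processing_latency(2.0, 10.0) + 0.15 for _ in range(90)]
# Eq. (12): cache-miss tasks pay cloud processing plus cloud transmission.
miss_latencies = [0.8 + 1.5 for _ in range(10)]

# Eq. (13): average over the total number of queries in the domain.
t_avg = (sum(hit_latencies) + sum(miss_latencies)) / 100
print(t_avg)    # 0.545 for these illustrative numbers
```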

In the event of a cache hit within the collaborative domain or local cache, the server that experiences the hit first will transfer the data content to the local edge server connected to the user, which then promptly delivers the content to the user. However, if the data is already cached in the local server, it is transmitted directly from the local server to the nearby user. Therefore, in this study, we do not consider the transmission latency between the local server and the user. Thus, the core of the problem lies in minimizing the transmission latency among edge servers within the collaborative domain and from the cloud to the edge servers, as well as reducing the query processing latency for data query tasks. By addressing these issues, we can effectively enhance system performance and ensure that users can access the required data more quickly.

Experiment and analysis

Experimental settings

The model was implemented in Python 3.6. To evaluate the proposed model, we analyzed the impact of various influencing factors on average query latency. The variables considered include the maximum number of hops between nodes (\({h_{max}}\)), edge server storage capacity (C), the number of nodes (N), and the number of user query requests. By examining how these variables affect average query latency, the overall performance of the model was assessed.

This paper evaluates the performance of a query optimization model through simulation experiments, comparing it with two other classic caching strategies: the Random Caching (RC) strategy and the Partitioned Group Caching Strategy (CCS-AGP) [30]. CCS-AGP partitions and groups edge nodes, considering the similarity of request data and the distance between edge servers. However, CCS-AGP does not consider the distribution of data query frequencies within the collaborative domain. Moreover, partitioning and grouping caching may lead to redundant caching of high-demand data in the collaborative domain, resulting in storage capacity wastage. Additionally, storing some data only once may pose a single point of failure issue. The Random Caching strategy involves randomly storing data within the collaborative domain servers without considering data popularity or query distribution patterns.

Table 1 Simulation parameters

Each collaborative domain contains a varying number of servers, and as the number of servers increases, the maximum hop count within the collaborative domain also increases. Our experimental analysis focuses on the data query latency under different numbers of servers. Over a given period, the total number of data queries is set to 100,000. For simplicity and without loss of generality, we assume that each data block has different content but the same size of one unit. Servers may have different storage capacities, ranging from 2 to 20 units. The popularity of data and the locations of the vehicles issuing query tasks follow a Zipf distribution. The experimental parameters are summarized in Table 1.
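For reproducibility, here is a sketch of how such a Zipf-distributed query workload can be generated; the skew exponent is an assumed parameter (Table 1 holds the value actually used):

```python
import numpy as np

m, total_queries, skew = 100, 100000, 0.8      # skew exponent assumed
rng = np.random.default_rng(42)

ranks = np.arange(1, m + 1)
probs = 1.0 / ranks ** skew
probs /= probs.sum()                           # normalized Zipf popularity

queries = rng.choice(m, size=total_queries, p=probs)   # requested data ids
```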

Results and discussion

This section presents the experimental results and discussion. The objective of this study is to reduce data query latency for vehicular network users in edge environments by introducing an optimized data query model. This model primarily addresses the inherent challenges of existing vehicular network data queries by considering the query frequency distribution within a collaborative domain. It aims to solve issues such as long data transmission distances and high query latency, thereby enhancing the timeliness of data retrieval. The first phase implements an edge collaborative caching strategy based on the tabu search algorithm to store high-demand data within the collaborative domain, which shortens data transmission distances and reduces transmission latency. The second phase deploys a cuckoo filter to improve data query performance and reduce data query processing latency. The integration of these two phases significantly enhances the efficiency of data queries.

Fig. 6 Impact of maximum hop count between nodes on query latency

First, we analyze the influence of \({h_{max}}\) on the average task latency (lookup latency) when the number of nodes is N=30 and the capacity is C=10. Here, \({h_{max}}\) represents the maximum hop count when transmitting data between the two farthest nodes within the collaborative domain. A larger \({h_{max}}\) implies a wider coverage area of the collaborative domain, longer transmission distances between nodes, and consequently higher data retrieval latency. As observed in Fig. 6, the proposed model performs best among the compared strategies. Compared to RC and CCS-AGP, the average data query latency of the proposed model is reduced by 13.65\(\%\) and 3.46\(\%\), respectively.

Next, we conduct simulation experiments on the query latency under different node capacities. The simulation results are shown in Fig. 7, illustrating the average task latency of the three algorithms under different capacities when N=30 and \({h_{max}}\)=14. From the figure, it is evident that the average data query latency of each strategy decreases as the storage capacity increases. Compared to the RC and CCS-AGP strategies, the proposed scheme reduces the average data query latency by 24.2\(\%\) and 3.13\(\%\), respectively. A larger storage capacity enables the retention of more high-demand data, thereby reducing the need for frequent access to cloud servers, whose latency is significantly higher than that of edge nodes. An effective edge caching strategy better mitigates the data query transmission delay. As shown in the figure, the model proposed in this paper consistently demonstrates superior performance.

Fig. 7 The impact of node capacity on transmission latency

In addition to the aforementioned experimental evaluation, this paper also investigates the scalability of the proposed model. Figures 8 and 9 respectively illustrate the data query latency under varying numbers of edge nodes and different quantities of requested content across the three schemes. Figure 8 shows the average task latency of the three algorithms under different numbers of nodes, with \({h_{max}}\)=14 and storage capacity C=10. For edge collaboration, as the number of nodes within the collaborative domain increases, more data can be cached on edge servers closer to users, thereby reducing data transmission latency. Accordingly, the average task latency decreases as the number of nodes increases. When the number of nodes is 5, the duplicate storage used by our approach to prevent single-point failures results in a larger amount of redundantly stored data than CCS-AGP, so its average task latency is slightly higher than that of CCS-AGP. However, as the number of nodes increases, the amount of redundantly stored data in CCS-AGP grows, and the proposed strategy exhibits lower latency. Compared to RC and CCS-AGP, the average data query latency of the proposed model is reduced by 12.23\(\%\) and 1.89\(\%\), respectively.

Fig. 8 The influence of node count on transmission latency

Fig. 9 The influence of the number of user requests on transmission latency

As shown in Fig. 9, the data query latency is analyzed under the conditions of N=30, C=10, and \({h_{max}}\)=14, with the number of user requests ranging from 500 to 3500. It is evident that as the number of user requests increases, the average transmission delay of all three schemes also increases. However, the proposed scheme achieves the lowest query latency. Compared to RC and CCS-AGP, the average data query latency of the proposed scheme is reduced by 22.8\(\%\) and 4.16\(\%\), respectively. Figures 8 and 9 also suggest that the proposed model exhibits a certain degree of scalability and adaptability under varying network conditions and query loads.

Additionally, the performance comparison of the query optimization model proposed in this paper is presented in Table 2. The advantages and disadvantages of the query optimization model are elucidated from four aspects: preventing single point of failure, filtering invalid queries, rapid query localization, and data deletion feasibility. While CCS-AGP leverages the partitioning and grouping collaborative caching concept, which effectively reduces data transmission latency when the similarity of queried data between nodes is high, it overlooks other performance aspects. Our algorithm considers the impact of transmission and querying on data retrieval latency, and simultaneously addresses how to prevent single point of failure in servers while reducing average task latency. Therefore, considering comprehensive performance aspects, the query optimization model proposed in this paper demonstrates superior performance.

Table 2 Performance comparison of different models

Conclusion and outlook

This paper proposes a data query optimization model for edge environments. The model employs a tabu search algorithm to pre-store high-demand data on the optimal nodes within the collaborative domain. This approach enables users to quickly retrieve data from these pre-stored nodes, shortening the distance to the storage location and thus minimizing data transmission latency. Building upon edge collaborative caching, the model further integrates the deployment of a cuckoo filter to rapidly determine the data storage location during queries, thereby reducing query processing latency and improving data retrieval timeliness. Simulation experiments demonstrate that the proposed model is both effective and feasible for optimizing data queries in edge environments.

In future work, the query optimization model proposed in this study will be applied to a broader range of IoT-based smart platforms. This research will delve into key factors affecting data query performance, explore more precise methods for predicting data popularity, and investigate efficient strategies for replacing cached data in response to changes in data popularity. The goal is to enhance cache hit rates and further improve query quality in edge networks. Additionally, improvements will be made to the cuckoo filter to enhance its space utilization and thereby increase its insertion and query performance.

Availability of data and materials

No datasets were generated or analysed during the current study.

References

  1. Wang J, Jiang C, Zhang K, Quek TQ, Ren Y, Hanzo L (2017) Vehicular sensing networks in a smart city: principles, technologies and applications. IEEE Wirel Commun 25(1):122–132

  2. Cheng X, Zhang R, Yang L (2018) Wireless toward the era of intelligent vehicles. IEEE Internet Things J 6(1):188–202

  3. Xu S, Liu X, Guo S, Qiu X, Meng L (2021) MECC: a mobile edge collaborative caching framework empowered by deep reinforcement learning. IEEE Netw 35(4):176–183

  4. Li T, Chen Y, Wang Y, Wang Y, Zhao M, Zhu H, Tian Y, Yu X, Yang Y (2020) Rational protocols and attacks in blockchain system. Secur Commun Netw 2020:1–11

  5. Chen Y, Yang X, Li T, Ren Y, Long Y (2022) A blockchain-empowered authentication scheme for worm detection in wireless sensor network. Digit Commun Netw 10(2):237–508

  6. Tan C, Chen Y, Ren X, Peng C (2023) A mobile energy trading scheme based on lightning network. Concurr Comput Pract Experience 35(20):e6623

  7. Wang G, Li C, Huang Y, Wang X, Luo Y (2022) Smart contract-based caching and data transaction optimization in mobile edge computing. Knowl-Based Syst 252:109344

  8. Remy SL, Gajananan K, Karve A (2022) Redefining edge computing. In: 2022 IEEE Cloud Summit, IEEE, pp 113–117

  9. Hou L, Lei L, Zheng K, Wang X (2018) A q-learning-based proactive caching strategy for non-safety related services in vehicular networks. IEEE Internet Things J 6(3):4512–4520

  10. Abani N, Farhadi G, Ito A, Gerla M (2016) Popularity-based partial caching for information centric networks. In: 2016 Mediterranean Ad Hoc Networking Workshop (Med-Hoc-Net), IEEE, pp 1–8

  11. Hou T, Feng G, Qin S, Jiang W (2018) Proactive content caching by exploiting transfer learning for mobile edge computing. Int J Commun Syst 31(11):e3706

  12. Jiang W, Feng G, Qin S (2016) Optimal cooperative content caching and delivery policy for heterogeneous cellular networks. IEEE Trans Mob Comput 16(5):1382–1393

  13. Zhang Y, Feng B, Quan W, Tian A, Sood K, Lin Y, Zhang H (2020) Cooperative edge caching: a multi-agent deep learning based approach. IEEE Access 8:133212–133224

  14. Bilal K, Baccour E, Erbad A, Mohamed A, Guizani M (2019) Collaborative joint caching and transcoding in mobile edge networks. J Netw Comput Appl 136:86–99

  15. Chen Z, Lee J, Quek TQ, Kountouris M (2017) Cooperative caching and transmission design in cluster-centric small cell networks. IEEE Trans Wirel Commun 16(5):3401–3415

  16. Chen J, Xing H, Lin X, Nallanathan A, Bi S (2022) Joint resource allocation and cache placement for location-aware multi-user mobile-edge computing. IEEE Internet Things J 9(24):25698–25714

  17. Shu P, Du Q (2020) Group behavior-based collaborative caching for mobile edge computing. In: 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), vol 1. IEEE, pp 2441–2447

  18. Wang X (2018) Vehicular cloud construction and content acquisition. IEEE Intell Transp Syst Mag 10(3):135–145

  19. Duarte JM, Braun T, Villas LA (2019) MobiVNDN: A distributed framework to support mobility in vehicular named-data networking. Ad Hoc Netw 82:77–90

  20. Zhang R, Yu FR, Liu J, Huang T, Liu Y (2020) Deep reinforcement learning (drl)-based device-to-device (d2d) caching with blockchain and mobile edge computing. IEEE Trans Wirel Commun 19(10):6469–6485

  21. Zhang P, Zhang A, Xu G (2020) Optimized task distribution based on task requirements and time delay in edge computing environments. Eng Appl Artif Intell 94:103774

  22. Monga SK, Ramachandra SK, Simmhan Y (2019) Elfstore: a resilient data storage service for federated edge and fog resources. In: 2019 IEEE International Conference on Web Services (ICWS), IEEE, pp 336–345

  23. Quan T, Zhang H, Yu Y, Tang Y, Liu F, Hao H (2023) Seismic data query algorithm based on edge computing. Electronics 12(12):2728

  24. Glover F, Hanafi S (2001) Finite convergence of tabu search. In: MIC’2001 - 4th Metaheuristics International Conference, ResearchGate, Porto, p 333–336

  25. Abyazi-Sani R, Ghanbari R (2016) An efficient tabu search for solving the uncapacitated single allocation hub location problem. Comput Ind Eng 93:99–109

  26. Laayati O, Elmaghraoui A, El Hadraoui H, Ledmaoui Y, Bouzi M, Chebak A (2023) Tabu search optimization for energy management in microgrids: A solution to grid-connected and standalone operation modes. In: 2023 5th Global Power, Energy and Communication Conference (GPECOM), IEEE, pp 401–406

  27. Fan B, Andersen DG, Kaminsky M, Mitzenmacher MD (2014) Cuckoo filter: practically better than bloom. In: Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, Association for Computing Machinery, New York, p 75–88

  28. Zhao Y, Dai W, Wang S, Xi L, Wang S, Zhang F (2023) A review of cuckoo filters for privacy protection and their applications. Electronics 12(13):2809

  29. Pagh R, Rodler FF (2004) Cuckoo hashing. J Algorithms 51(2):122–144

  30. Zeng F, Zhang K, Wu L, Wu J (2022) Efficient caching in vehicular edge computing based on edge-cloud collaboration. IEEE Trans Veh Technol 72(2):2468–2481


Funding

This research is supported by the Foundation of National Natural Science Foundation of China (62202118), the Top Technology Talent Project from Guizhou Education Department (Qian jiao ji [2022]073), the Scientific and Technological Research Projects from Guizhou Education Department (Qian jiao ji [2023]003), the Guizhou Provincial Department of Science and Technology Hundred Levels of Innovative Talents Project (GCC[2023]018), the Guizhou Province Major Project “Research and Application of Key Technologies for Trusted Big Models for Public Big Data” (Qiankehe Major Project No. [2024]003), and the Guizhou Provincial Department of Science and Technology Support Program “Research and Application of Key Technologies in High-Speed Chain + Light Chain Aggregation Platform Based on Blockchain Technology” (Qiankehe Support Project No. [2021]385).

Author information

Contributions

Yan Zheng wrote the main manuscript text, Yuxiang Yang, Chang Shu, Lang Chen prepared tables and figures, Yuling Chen and Chaoyue Tan provided helpful suggestions and revised the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yuling Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zheng, Y., Chen, Y., Tan, C. et al. Optimization model for vehicular network data queries in edge environments. J Cloud Comp 13, 145 (2024). https://doi.org/10.1186/s13677-024-00705-w
