ES-PPDA: an efficient and secure privacy-protected data aggregation scheme in the IoT with an edge-based XaaS architecture

In an Internet of Things (IoT) system based on an anything as a service (XaaS) architecture, data are uploaded from heterogeneous nodes in nonstandardized formats and aggregated on the server side. Within this data-intensive architecture, privacy preservation is one of the most important issues. In response to this concern, numerous privacy-protection data aggregation (PPDA) solutions are available for various IoT applications. However, because intelligent IoT devices have limited resources, traditional PPDA cannot meet practical privacy and performance needs. To tackle this challenge, we provide a more efficient and secure PPDA solution that guarantees data security and integrity through Paillier homomorphic encryption and online/offline signing. A detailed security analysis shows that our scheme is unforgeable under a chosen message attack and that data integrity is guaranteed under the q-strong Diffie-Hellman (q-SDH) assumption. We adopt an M/G/1 priority queue model to optimize system performance: it improves queuing efficiency and accelerates channel access, thus reducing waiting time and increasing reliability. The experimental results show that our data aggregation scheme is reliable with low latency.


Introduction
The Internet of Things (IoT) brings together "things" in the physical world. The traditional IoT model is cloud-based, with end devices solely responsible for data collection. Cloud-based subscription models are evolving to offer greater flexibility and efficiency to meet strong business demand. Anything as a service or everything as a service, also known as X as a service (XaaS), is a platform that encompasses SaaS, PaaS, and IaaS. XaaS provides built-in services such as software, networks, platforms, security services, and applications. XaaS also offers high cloud storage capacity and dynamic resource allocation, which reduces the burden on enterprises, and it helps sustain performance and limit latency at peak loads. New XaaS models allow users to lease cloud resources on demand without purchasing them outright and to expand them flexibly on request at a later stage. Cloud-based XaaS can guarantee a uniform format for data from different sources after downloading, which is conducive to setting industry standards. XaaS holds great potential for smart services such as smart grids [1], smart healthcare [2], smart cities [3], and vehicle detection. However, due to bandwidth limitations and limited hardware resources, a traditional cloud-based paradigm struggles to accommodate these services, particularly real-time ones. Hence, the use of edge-based XaaS has been increasing [4]. As shown in Fig. 1, data are collected via sensors and routed to edge servers for local aggregation, sharing, and extraction. The data are then forwarded to the cloud for final processing. Edge servers can be considered preprocessing units that deliver efficient local services in combination with cloud servers. Thus, the consumption of physical resources can be significantly reduced. Physical objects connected to an edge-based XaaS IoT operate on devices with limited resources and require efficient communication protocols to improve energy efficiency.
Traditional internet state transfer protocols, such as REST, utilize event-based frameworks to minimize the number of messages sent. In an edge-based XaaS IoT, heterogeneous applications use these standards and protocols for aggregating edge-side data. Organizations such as OpenIoT, the AllSeen Alliance, and the IPSO Alliance are working to standardize communication protocols to ensure interoperability across vendor silos. The IoT in smart cities focuses on using lightweight protocols, such as CoAP and XMPP, to connect sensor interfaces to the physical medium. Organizations such as the IETF and the XMPP community are working on extending CoAP and XMPP. However, these efforts seek to improve protocols rather than provide integrated solutions.
Edge-based XaaS devices are vulnerable because of their distributed nature, which opens more attack paths for both internal and external attackers. Some devices are not fully trustworthy and may leak private user data. Put another way, an edge-based XaaS deployment threatens the confidentiality, accuracy, and robustness of data aggregation protocols. External attackers can eavesdrop on communication channels between the entities involved, alter messages on the network, forge signatures, or even launch replay attacks.
In response to this concern, many privacy-protection data aggregation (PPDA) schemes have been proposed. Most use homomorphic cryptosystems to implement specific functions, such as summation, to ensure the privacy of data in transit. Other measures have been considered to improve the security of PPDA, such as dimensionality reduction, data integrity verification, and random noise techniques.
However, existing PPDA schemes present practical difficulties:
• Frequent data transfer is normal for an edge-based XaaS system [4]. When performing live data processing tasks, high communication latency is not acceptable.
• Authentication and validation of data sources are essential to prevent attackers from forging, altering, and replaying messages and signatures. However, authentication and validation require the support of edge devices, which makes them difficult to implement on resource-constrained IoT devices [5].
To address the above challenges, we propose efficient and secure privacy-protection data aggregation (ES-PPDA), which considers both security and efficiency. In ES-PPDA, the heavy computational costs associated with data integrity operations are drastically reduced through online/offline signing mechanisms. The main contributions are summarized as follows:
• Paillier homomorphic encryption and online/offline signing are used to ensure data security and integrity.
• A detailed security analysis is presented.
• An M/G/1 priority queue model is utilized to optimize system performance. M/G/1 improves queuing efficiency and speeds up channel access, thus reducing wait time and increasing reliability. Experimental results show that ES-PPDA is reliable with low latency.
The remaining content is organized as follows: Related work section reviews related work; Preliminaries section introduces background information; Proposed scheme section proposes the ES-PPDA scheme; Security analysis section carries out security analysis; Performance optimization section optimizes the scheme; Experiments section conducts performance testing; and Conclusion section concludes.

Related work
Recently, privacy-protection data aggregation (PPDA) has received growing attention in areas such as smart grids [6] and vehicle detection systems [7,8]. Some earlier work considered uploading data using homomorphic cryptosystems [9,10]. Subsequently, to further strengthen privacy protection, blinding factors were added in the encryption steps, making the schemes resistant to internal attacks [11,12]. To guard against malicious aggregators, hash techniques [13] and random noise technology were introduced into PPDA schemes to ensure data integrity while encrypted messages are forwarded [14]. However, these systems do not consider the cost of the cryptosystem design [15].
Recently, researchers have focused on reducing the computational cost of cryptographic operations in conventional PPDA systems [16]. By anticipating the demand for electricity in a smart grid system, a lightweight and secure scheme was proposed. It can effectively meet security and privacy requirements and further reduce communication overhead. An improved version for IoT fog computing systems allows multidimensional data to be compressed into composite data and allows injected false data to be filtered early at the fog nodes. Researchers have also proposed a scheme for classifying and aggregating private data in vehicle sensor systems that resists data link attacks [17]. A PPDA scheme that assumes several local certification authorities and supports anonymity, based on a double-trapdoor chameleon hash online/offline signing and verification method, was proposed in [18]. The scheme proposed in [19] protects data privacy by hiding data transfers between the cloud and edge servers. The scheme proposed in [20] uses RSA and RC4 to generate a key, which achieves higher security for image data. Recently, there have also been data aggregation optimizations, for example, using a hybrid metaheuristic algorithm, i.e., a whale optimization algorithm (WOA) combined with a simulated annealing (SA) algorithm, to select the optimal CH in an IoT network cluster [21]; optimizations based on LSTM models [22]; and reducing network latency by increasing time slots while reducing power consumption through weighted load balancing [23] (Table 1).

Preliminaries
This section presents several definitions and notations used in the ES-PPDA scheme, including bilinear pairings, the Paillier homomorphic cryptosystem [24], online/offline signing, and security definitions.

Bilinear pairings
Let $G$ and $G_T$ be two cyclic groups of prime order $p$, and let $g$ be a generator of $G$. A bilinear pairing is a map $e : G \times G \to G_T$ that satisfies bilinearity, non-degeneracy, and computability [25].
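For reference, the three conditions a bilinear pairing is usually required to satisfy can be written as follows. This is the textbook formulation, which we assume matches the one intended in [25]; the symbols $u$, $v$, $a$, and $b$ are introduced here only for illustration.

```latex
\begin{align*}
&\text{Bilinearity:}     && e(u^{a}, v^{b}) = e(u, v)^{ab}
      \quad \text{for all } u, v \in G \text{ and } a, b \in \mathbb{Z}_p, \\
&\text{Non-degeneracy:}  && e(g, g) \neq 1_{G_T}, \\
&\text{Computability:}   && e(u, v) \text{ can be computed efficiently for all } u, v \in G.
\end{align*}
```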

Definition 1 (q-strong Diffie-Hellman (q-SDH) problem).
Let $x$ be a random element of $\mathbb{Z}_p^*$. Given $g, g^{x}, g^{x^{2}}, \ldots, g^{x^{q}}$, the task is to output a pair $(m, \Sigma_x)$, where $m \in \mathbb{Z}_p^*$. The q-SDH problem is defined as a $(q, t, \varepsilon)$ problem:
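For reference, the usual statement of the q-SDH assumption, which we assume is the one intended here, takes $\Sigma_x = g^{1/(x+m)}$. The algorithm name $\mathcal{A}$ is introduced only for this illustration.

```latex
% (q, t, \varepsilon)-SDH: no algorithm \mathcal{A} running in time at most t
% should satisfy the inequality below.
\[
  \Pr\!\left[\, \mathcal{A}\!\left(g, g^{x}, g^{x^{2}}, \ldots, g^{x^{q}}\right)
      = \left(m,\; g^{\frac{1}{x+m}}\right) \right] \;\geq\; \varepsilon ,
  \qquad m \in \mathbb{Z}_p^{*},
\]
% where the probability is taken over the random choice of x and the
% random coins of \mathcal{A}.
```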
Paillier homomorphic cryptosystem
(a) Key generation
Step 5 The public key is $(n, g)$, and the private key is $(\lambda, \mu)$.
(b) Encryption
Step 1 Select a random number $r$ with $0 < r < n$ and $\gcd(r, n) = 1$.
Step 2 Encrypt: $c = g^{m} \cdot r^{n} \bmod n^{2}$.
(c) Decryption
Compute $m = L(c^{\lambda} \bmod n^{2}) \cdot \mu \bmod n$.
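The encryption and decryption rules above can be exercised with a toy implementation. The following is a minimal sketch, not the authors' implementation. It assumes the textbook Paillier key generation ($n = pq$, $\lambda = \mathrm{lcm}(p-1, q-1)$, $\mu = L(g^{\lambda} \bmod n^{2})^{-1} \bmod n$ with $L(u) = (u-1)/n$) and the common choice $g = n + 1$, neither of which is stated explicitly in the text. The function names are hypothetical, and the primes used in the usage note are far too small for real use.

```python
import math
import secrets


def L(u: int, n: int) -> int:
    """The Paillier L function, L(u) = (u - 1) / n (exact integer division)."""
    return (u - 1) // n


def keygen(p: int, q: int):
    """Toy key generation from two given primes (assumed textbook procedure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                            # assumed common choice of g
    mu = pow(L(pow(g, lam, n * n), n), -1, n)            # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pk, m: int) -> int:
    """c = g^m * r^n mod n^2 with random r, 0 < r < n, gcd(r, n) = 1."""
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, sk, c: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n


def aggregate(pk, ciphertexts) -> int:
    """Multiply ciphertexts modulo n^2; the result decrypts to the sum of plaintexts."""
    n, _ = pk
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc
```

With `pk, sk = keygen(1009, 1013)`, encrypting the readings 12, 30, and 7 separately and passing the three ciphertexts to `aggregate` yields a ciphertext that decrypts to their sum, as long as the sum stays below $n$. This is the additive homomorphism that the aggregation step relies on.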

Online/offline signing
The double-trapdoor chameleon hash (DTCH) function is usually used to implement online/offline signing. Let $g_1$ be a generator of a group $G_{p_1}$ of prime order $p_1$, and select two trapdoor keys $y, z \in \mathbb{Z}_{p_1}^*$. Then, compute the hash $H_{ch}(r, s, u)$. The DTCH function has the following properties [26]:
• Computability: For $pk \in G$ and a triple $(r, s, u)$ with entries in $\mathbb{Z}_p$, $H_{ch}(r, s, u)$ can be computed in polynomial time.
• Collision resistance: Without the trapdoor keys, it is infeasible to find two triples $(r_1, s_1, u_1)$ and $(r_2, s_2, u_2)$ such that $r_1 \neq r_2$ and $H_{ch}(r_1, s_1, u_1) = H_{ch}(r_2, s_2, u_2)$.
• Trapdoor collision: Given $H_{ch}$, the key pair $(pk, sk)$, a triple $(r_1, s_1, u_1)$, and an additional message $r_2 \in \mathbb{Z}_p$, a triple $(r_2, s_2, u_2)$ can be found such that $H_{ch}(r_1, s_1, u_1) = H_{ch}(r_2, s_2, u_2)$. First, a random $u_2$ (or $s_2$) is selected; the value of $s_2$ (or $u_2$) can then be calculated in polynomial time.
Based on the above properties of the DTCH function, online/offline signing can be built using five algorithms:
• Setup: On input of the security parameter $1^{\lambda}$, it returns a public key $Ver_{pk}$ and a private key $Sig_{sk}$.
The security of online/offline signing is defined through the following game between a challenger C and an adversary A:
• Initiation: Challenger C generates a public/private key pair $(pk, sk)$ from $1^{k}$. Then, $pk$ is given to A.
• Sign.off query: The adversary issues a query, and the challenger C replies with $off_i$ while storing the state information $St_i$ itself. Assume that the adversary can make up to $q_1$ queries at this stage.
• Sign.on query: The adversary issues a query, and the challenger C uses $St_i$ to calculate the online signature and then returns $on_i$ to the adversary. Assume that the adversary can make at most $q_2$ queries at this stage.
• Forgery: Adversary A generates $(m^*, \Sigma^*)$ and forwards it to C. The challenger C checks it with $Ver_{on}(pk, m^*, \Sigma^*)$. If the signature is valid, output 1 (success); otherwise, output 0 (failure).
The advantage of adversary A in forging a signature is the probability that the forgery verifies, i.e., that $Ver_{on}(pk, m^*, \Sigma^*) = 1$ for a key pair $(pk, sk)$ generated as above.
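To make the trapdoor-collision property of the DTCH function concrete, the following minimal sketch uses one common double-trapdoor construction, $H_{ch}(r, s, u) = g^{r} Y^{s} Z^{u} \bmod P$ with $Y = g^{y}$ and $Z = g^{z}$. This concrete form is an assumption made for illustration and may differ from the hash used in the scheme. The group parameters `P`, `Q`, `G` and all function names are hypothetical, and the toy group is far too small for real use.

```python
import secrets

# Toy group (assumed for illustration): P = 2*Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q in Z_P^*.
P = 2039
Q = 1019
G = 4


def keygen():
    """Pick two trapdoor keys y, z and publish Y = G^y, Z = G^z."""
    y = secrets.randbelow(Q - 1) + 1
    z = secrets.randbelow(Q - 1) + 1
    return (pow(G, y, P), pow(G, z, P)), (y, z)


def chameleon_hash(pk, r: int, s: int, u: int) -> int:
    """Assumed form H_ch(r, s, u) = G^r * Y^s * Z^u mod P."""
    Y, Z = pk
    return (pow(G, r, P) * pow(Y, s, P) * pow(Z, u, P)) % P


def find_collision(sk, r1: int, s1: int, u1: int, r2: int):
    """Trapdoor collision: pick a random u2, then solve for s2 in the exponent.

    We need r1 + y*s1 + z*u1 = r2 + y*s2 + z*u2 (mod Q), hence
    s2 = s1 + (r1 - r2 + z*(u1 - u2)) * y^{-1} (mod Q).
    """
    y, z = sk
    u2 = secrets.randbelow(Q)
    s2 = (s1 + (r1 - r2 + z * (u1 - u2)) * pow(y, -1, Q)) % Q
    return s2, u2
```

In an online/offline signature, the expensive work is done offline on a hash of random values $(r_1, s_1, u_1)$; once the real message $r_2$ arrives, `find_collision` needs only a few modular multiplications, which is why the online phase is cheap for a resource-constrained device.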

Proposed scheme
The notations used in this section are listed in Table 2.

System model
The ES-PPDA system model is shown in Fig. 2 and consists of four components: cloud servers, edge servers (ESs), smart IoT devices (SDs), and trusted authorities (TAs) (or other control centers (CCs)) [28]. A TA bootstraps the entire system and distributes critical information and system parameters. When the configuration is finished, the TA goes offline.
A CC collects data packets from the edge. It then sends responses to the edge server (see steps 10, 11, and 12 in Fig. 2). The CC also offers registration services for the XaaS IoT.
An ES acts as an aggregator that processes encrypted data from the SDs and relays communication between the CC and the SDs (see steps 8, 9, and 13 in Fig. 2). The ES also performs integrity verification (see steps 4 and 7 in Fig. 2).
An SD collects private data generated by sensors and transmits it to the CC in encrypted form via an ES (see steps 2, 3, 5, and 6 in Fig. 2).
Note: since an SD is typically a resource-constrained device, it cannot efficiently carry out computationally complex privacy-protected data aggregation processes, particularly the cryptographic operations involved in data integrity mechanisms. This motivates a lightweight and efficient PPDA optimization that supports edge-based XaaS architectures.

Workflow
The proposed ES-PPDA scheme proceeds as follows:

Security claims
We assume that both the TA and the CC are completely trustworthy. The ES is only partly trustworthy; that is, it will not manipulate sensitive user data but may reveal personal information during the aggregation process. Furthermore, an external adversary A threatens data integrity and carries out attacks. It can eavesdrop on the transmitted data or compromise the servers of the ES and CC to steal the processed data. The adversary can also actively forge the signature of a data report and thereby damage the integrity of the data.

Security analysis
In what follows, we examine system security in terms of authentication, confidentiality, and privacy protection.

Authentication
In ES-PPDA, we integrate Schnorr's extended signature method into the registration step, which is secure under the discrete logarithm assumption. The explanation is as follows. An attacker cannot tamper with registration without knowing the true identifier $ID_i$ of $SD_i$, because $ID_i$ is hidden using a hash ($H_i$) and is kept secret. Furthermore, even if an attacker steals the true identifier $ID_i$ of $SD_i$, it still cannot obtain $r_i$ because $r_i$ is further hidden by a randomly selected blinding factor $k_i$, thus ensuring that $X_i$ is secure. Therefore, the SD-CC authentication in our scheme is secure.
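The role of the random blinding factor can be illustrated with a plain Schnorr signature. The sketch below shows the basic Schnorr scheme only, not the extended variant used in the registration step, whose exact messages are not specified here. The toy group `P`, `Q`, `G`, the SHA-256-based hash-to-exponent function `h`, and all other names are assumptions made for illustration.

```python
import hashlib
import secrets

# Toy group (assumed): P = 2*Q + 1 with P, Q prime; G = 4 has order Q mod P.
P, Q, G = 2039, 1019, 4


def h(*parts) -> int:
    """Hash arbitrary values to an exponent in Z_Q (illustrative choice)."""
    data = "|".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    """Secret key x, public key X = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x


def sign(x: int, message: str):
    """Schnorr signature: the fresh random k hides x inside s = k + e*x."""
    k = secrets.randbelow(Q - 1) + 1      # per-signature blinding factor
    R = pow(G, k, P)                      # commitment
    e = h(R, message)                     # challenge
    s = (k + e * x) % Q                   # response
    return R, s


def verify(X: int, message: str, sig) -> bool:
    """Accept iff G^s == R * X^e mod P."""
    R, s = sig
    e = h(R, message)
    return pow(G, s, P) == (R * pow(X, e, P)) % P
```

Because a fresh `k` is drawn for every signature, the response `s` reveals nothing about `x` unless the discrete logarithm of `R` can be computed, which mirrors the argument that a hidden value stays secret even when the identifier is known.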

Confidentiality
We use the Paillier cryptosystem to encrypt all sensory data and aggregate the ciphertexts based on its additive homomorphism. Confidentiality is assured based on the following three points.
First, the private data $m_i$ of $SD_i$ are encrypted as $c_i = g^{m_i} \cdot v_i^{n} \bmod n^{2}$. The Paillier cryptosystem is semantically secure against chosen-plaintext attacks (CPA) under the decisional composite residuosity assumption and therefore does not reveal sensitive information.
Second, when aggregating reports, the ES cannot recover any individual plaintext, because the received ciphertexts are aggregated as $c = \prod_{i} c_i \bmod n^{2} = g^{\sum_{i} m_i} \cdot \left(\prod_{i} v_i\right)^{n} \bmod n^{2}$. Therefore, the confidentiality and privacy of user data can be guaranteed even when the ES is not trusted.
Finally, assume that an outside attacker could eavesdrop on the entire communication channel from $SD_i$ to the CC and simultaneously obtain a single ciphertext $c_i$, the aggregated ciphertext $c$, and the aggregated plaintext $m$. The attacker still cannot recover a single plaintext $m_i$, because the individual plaintexts are merged into one sum during report aggregation. In summary, the confidentiality and privacy of each $SD_i$'s private data are well protected.

Integrity and unforgeability
In the proposed scheme, we develop a signing method that reduces the computational cost while ensuring data integrity. Here, we show that our scheme is unforgeable under a chosen message attack. By Definition 2, no probabilistic polynomial-time adversary can forge a valid pair $(m^*, \Sigma^*)$ without querying the online signing oracle for it.

Performance optimization
Suppose we have N heterogeneous sensor nodes within an l × l region (i.e., a rectangular industrial subunit). The data captured by the sensors fall into two categories: normal data (ND) and event data (ED). Low-priority $P_l$ nodes generate ND packets, and high-priority $P_h$ nodes generate ED packets when the sensed value exceeds its threshold. Suppose that each node supports only one type of data, i.e., ND or ED. M of the N nodes send high-priority packets, namely, $P_h$ packets, while the rest send only low-priority packets, namely, $P_l$ packets. The network topology is considered static for a certain period. The gateway and cloud center are assumed to be connected via broadband wireless links, so latency and packet loss on this link are negligible. Sensor nodes are connected to a cluster head (CH) that acts as an aggregator. Nodes, including the CH and the gateway, have child-parent relationships. All sensor nodes within a single CH compete for channel access to their parent node. The data generated by the end nodes are aggregated at the CH and then forwarded to the gateway. Gateways and CHs are located in specific areas and generally have more power available than sensor nodes. A CH can retrieve application-specific information, including priority and location. The waiting time for each priority depends on the scheduling policy that the CH has adopted.
The M/G/1 queue model accommodates the randomness of devices when measuring network performance, including throughput, waiting time, packet loss rate, and resource consumption [29]. The M/G/1 queue system with priority may be divided into non-preemptive and preemptive queue models. Under non-preemptive packet scheduling, once a lower priority packet has started service, the ongoing task continues even if a higher priority packet arrives in the queue; the arriving packet waits in the queue until the task in service is complete. Under preemptive priority scheduling, however, higher priority packets are handled first, and a lower priority packet whose task is already running may be preempted, with its context saved. We propose to use an M/G/1 priority queue model at the CH. Priority-based data partitioning is performed by the application layer, taking into account the parameters of the MAC layer according to industrial requirements and network conditions. IEEE 802.15.4 uses the carrier-sense multiple access with collision avoidance (CSMA/CA) mechanism to access wireless channels. However, it does not suit delay-sensitive industrial applications because it lacks priority and delay awareness [30]. In industrial IoT systems, flow control, process monitoring, and fault detection subsystems must have media access mechanisms that are sensitive to delays and priorities. Figure 3 shows a sequence diagram of various nodes competing for channel access according to their priority. Packets in the lower priority queue are not processed until the higher priority queue is empty. A $P_h$ node uses a short, fixed backoff period, more frequent channel sensing, and more backoff attempts, whereas a $P_l$ node uses longer, random backoff periods, less frequent sensing, and fewer backoff attempts. Moreover, the clear channel assessment (CCA) detection time of a $P_l$ node is longer than the combined CCA and backoff period of a $P_h$ node.
CSMA/CA behavior is influenced by various MAC settings, such as the minimum backoff exponent (macMinBE), the maximum backoff exponent (macMaxBE), the initial value of the contention window (CW), and the maximum number of backoffs (macMaxCSMABackoffs). The values of these MAC settings significantly affect the performance of an IoT network. Instead of configuring the same CSMA/CA parameter values for both traffic types (i.e., low priority and high priority), each category may be assigned its own attributes; parameters with the subscript l, such as $CW_l$, are defined as the values for low-priority nodes. In addition, by specifying different CSMA/CA parameters, it is possible to implement prioritized scheduling that reduces channel access times for high-priority packets, as shown in Fig. 4. For data aggregation with priority, the M/G/1 priority queue model maintains the data priority category. Packets of priority class $i$, $i \in \{1, 2, \ldots, P\}$, arrive according to a Poisson process with rate $\lambda_i$. A lower value of $i$ indicates a higher priority packet type. Within the system model, the preemptive priority rule is implemented: an arriving priority-$i$ packet immediately preempts lower priority data and obtains service.
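The effect of assigning each traffic class its own CSMA/CA attributes can be sketched as follows. This is an illustrative model only. The `LOW` values mirror what we understand to be the usual IEEE 802.15.4 defaults, the `HIGH` values are hypothetical, and the high-priority class is modeled with a small random backoff window rather than a strictly fixed period. All names are assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CsmaParams:
    mac_min_be: int      # minimum backoff exponent (macMinBE)
    mac_max_be: int      # maximum backoff exponent (macMaxBE)
    cw: int              # initial contention window (CW)
    max_backoffs: int    # maximum number of backoffs (macMaxCSMABackoffs)


# Hypothetical high-priority settings: small exponents, more backoff attempts.
HIGH = CsmaParams(mac_min_be=1, mac_max_be=3, cw=1, max_backoffs=5)
# Low-priority settings, assumed to match the usual 802.15.4 defaults.
LOW = CsmaParams(mac_min_be=3, mac_max_be=5, cw=2, max_backoffs=4)


def backoff_slots(params: CsmaParams, attempt: int, rng: random.Random) -> int:
    """Draw the backoff delay (in unit backoff periods) for a given attempt.

    The backoff exponent starts at macMinBE, grows by one per failed
    attempt, and is capped at macMaxBE; the delay is uniform in
    [0, 2^BE - 1].
    """
    be = min(params.mac_min_be + attempt, params.mac_max_be)
    return rng.randint(0, 2 ** be - 1)


def mean_first_backoff(params: CsmaParams) -> float:
    """Expected delay of the first backoff, in unit backoff periods."""
    return (2 ** params.mac_min_be - 1) / 2
```

Under these assumed values, the first backoff of a high-priority node averages 0.5 unit backoff periods against 3.5 for a low-priority node, which is the kind of asymmetry that shortens channel access time for high-priority packets.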
The wait time $W_i$ of a priority-$i$ packet is its queuing time at the CH before service. The average residual service time of the packet currently in service and the CH service time are denoted by $R_i$ and $S_i$, respectively. The total system delay is the sum of the packet's wait time and service time. The expected wait time of the $i$-th priority packet follows from Little's law, using the expected service time of the $i$-th priority packet, the expected system delay in the $i$-th priority queue, and the second moment of the service time.

Experiments
ES-PPDA's performance is assessed in terms of the expected latency and reliability of the system, which is implemented in MATLAB. The simulation parameters are listed in Table 3. Figure 5 shows the packet latency for different priorities and numbers of nodes. The latency of both high-priority and low-priority packets increases with the number of nodes, as aggregating more packets leads to longer service durations. The latency of low-priority packets is longer than that of high-priority packets because low-priority packets can be interrupted by all high-priority packets. Additionally, Fig. 6 compares the performance of the proposed priority scheme against the nonpriority scheme. The nonpriority scheme shows similar curves, but its latency exceeds that of the priority scheme. In addition, because of the preferential channel access and the preemptive priority rule, a high-priority packet is free from interference from lower priority packets, which reduces its expected system time.
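The expected delays of a priority M/G/1 queue can be computed with the standard textbook formulas. The sketch below is a minimal illustration and is not claimed to reproduce the paper's own equations. It assumes the classical preemptive-resume result $T_i = \left(E[S_i] + R_i / (1 - \sigma_i)\right) / (1 - \sigma_{i-1})$ with $R_i = \tfrac{1}{2} \sum_{j \le i} \lambda_j E[S_j^2]$ and $\sigma_i = \sum_{j \le i} \lambda_j E[S_j]$, and the corresponding non-preemptive formula. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrafficClass:
    arrival_rate: float     # lambda_i, Poisson arrival rate
    mean_service: float     # E[S_i]
    second_moment: float    # E[S_i^2]


def system_delays(classes: List[TrafficClass], preemptive: bool = True) -> List[float]:
    """Expected system delay (wait + service) per class, highest priority first."""
    # Residual service time seen under non-preemptive scheduling (all classes).
    residual_all = sum(c.arrival_rate * c.second_moment for c in classes) / 2.0
    delays = []
    sigma_prev = 0.0        # load of strictly higher priority classes
    residual_upto = 0.0     # residual service time of classes 1..i
    for c in classes:
        sigma = sigma_prev + c.arrival_rate * c.mean_service
        if sigma >= 1.0:
            raise ValueError("queue is unstable: cumulative load >= 1")
        residual_upto += c.arrival_rate * c.second_moment / 2.0
        if preemptive:
            # Preemptive-resume: lower priority traffic is invisible to class i.
            t = (c.mean_service + residual_upto / (1.0 - sigma)) / (1.0 - sigma_prev)
        else:
            # Non-preemptive: any packet in service must finish first.
            t = c.mean_service + residual_all / ((1.0 - sigma_prev) * (1.0 - sigma))
        delays.append(t)
        sigma_prev = sigma
    return delays
```

For example, with exponential service of mean 1 (second moment 2) and arrival rates 0.2 for the high-priority class and 0.3 for the low-priority class, `system_delays` returns expected delays of 1.25 and 2.5 under preemption, so the low-priority class absorbs the cost of serving event data first.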

System reliability
Our scheme is modeled as an M/G/1 priority queue with a buffer of size K. Each queue receives data frames according to a Poisson arrival process with rate λ. The probability of packets being in the queue is given by Eq. (9). A sensor node may fail to deliver a packet to the CH (a) if the buffer is full, (b) if the node cannot find a free channel, or (c) if the packet is discarded after exceeding the retry limit. Considering these aspects, the reliability of the system is calculated from three quantities: $p_k$, the probability that the buffer is full with K frames, provided by Eq. (9); $p_{cf}$, the packet loss probability resulting from channel access failure; and $p_{cr}$, the packet loss probability resulting from exceeding the retry limit. Figure 7 illustrates the relationship between reliability and the number of nodes in the system: the reliability of the network decreases as the number of nodes increases. As the number of nodes grows, queue congestion, collisions, and packet retransmissions become more frequent. Then, as the queue becomes busier and delays grow longer, the probability of frame loss also increases due to collisions, retry limits, and link constraints. It should be noted that high-priority nodes achieve greater reliability than low-priority nodes because of the priority channel access mechanism and the queuing strategy.

An IoT network typically involves many sensors for detection. In a high-density IoT network, resource-constrained end devices may be limited by packet delay and data collisions. The end devices typically carry various data flows and face various reliability requirements. This paper proposes a cloud-based delay reduction scheme using preferential channel access and data aggregation at CHs. Furthermore, the combined effects of packet scheduling and aggregation are considered using a preemptive M/G/1 queue model.
Experimental results show that the priority scheme significantly decreases the wait time and increases the reliability relative to the nonpriority scheme. A network emulator was then used to analyze system performance in real IoT applications such as e-health and industrial automation. A future IoT network is expected to support a wide range of heterogeneous equipment and sensors in areas such as e-health and industrial control. In high-density deployment scenarios such as an industrial internet system, reliable communication links with low latency are difficult to achieve because of the system delays involved. Using the information provided by the application, IoT nodes are divided into two types, high-priority nodes and low-priority nodes, and are allocated different MAC layer properties to provide priority channel access for cloud-centric data processing. Then, before the aggregated data are sent to the cloud, the M/G/1 preemptive queue model is applied using separate low-priority and high-priority queues. The results show that, in comparison with the nonpriority scheme, the method proposed in this paper can significantly improve the latency and reliability of an IoT system.

Conclusion
In this paper, we propose a secure and efficient PPDA solution for IoT systems based on an XaaS architecture. Our scheme greatly reduces resource consumption and processing time. In addition, by taking advantage of edge computing, ES-PPDA can effectively offload complex cryptographic operations to the ES while minimizing the real-time cost. We select an M/G/1 queuing model to optimize the system performance. This optimization can be applied to an IoT with an XaaS architecture, for example, a smart grid. The security analysis shows that the scheme is secure under the security model we defined, and the performance evaluation experiments show that the scheme is lightweight and highly efficient. However, our approach remains somewhat vulnerable to malicious entities such as compromised ESs. In future work, we plan to make our security model more robust.