Advances, Systems and Applications

Privacy-preserving sports data fusion and prediction with smart devices in distributed environment

Abstract

With the rapid advancement of sports analytics and fan engagement technologies, the volume and diversity of physique data generated by smart devices across distributed sports platforms have grown significantly. Extracting insights from such data and using them to enhance fan experiences offers considerable benefits, but it also raises two primary challenges. First, efficiently utilizing the vast datasets in sports analytics is difficult because of the complex nature of the sports industry. Second, the data collected from diverse sources and stored on distributed platforms contain sensitive information, such as fan preferences and athlete performance metrics, which poses risks of privacy breaches. To address these challenges, this paper presents a privacy-preserving method for sports data fusion and prediction in distributed environments, named PSDFP\(_{\text {ALSH}}\). The method uses an amplified Locality-Sensitive Hashing technique tailored for the sports domain to protect sensitive information while maintaining high data utility. Through extensive experimentation, our approach demonstrates superior performance over existing methods in terms of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and computational efficiency.

Introduction

With the integration of physical activities, cyber technologies, and advanced Internet of Things (IoT) applications, the domain of sports analytics is experiencing significant growth, offering unprecedented opportunities for enhancing athletic performance, fan engagement, and sports management. As a key component of this evolution, Cyber Physical Systems (CPS) are increasingly employed to collect, process, and analyze the vast amounts of data generated by sensors and devices across various domains [1,2,3,4], such as sports venues and athletes.

Among the myriad applications of CPS in sports, enhancing fan experiences at live events, optimizing athletes’ performance through detailed analytics, and personalizing sports marketing strategies are paramount. The ability to fuse and intelligently analyze data from diverse sources can lead to substantial benefits, enabling stakeholders to make informed decisions and provide superior services [5,6,7,8,9].

However, the aggregation and utilization of sensitive data, including athletes’ health information, fans’ personal preferences, and real-time location data, pose significant privacy concerns. Directly analyzing such data from multiple sources without adequate privacy measures could lead to unintended privacy breaches and potential misuse of personal information [10,11,12,13,14,15]. For example, monitoring a fan’s movements within a stadium and their purchasing habits could unintentionally expose private information if not managed carefully. Consequently, there exists an inherent tension between leveraging data to improve sports analytics and ensuring the privacy of individuals involved [16,17,18,19].

To address these challenges, this paper proposes a novel privacy-preserving approach for sports data fusion and prediction, named PSDFP\(_{\text {ALSH}}\), utilizing an amplified Locality-Sensitive Hashing (LSH) technique [20, 21] tailored for the sports domain. This approach aims to balance privacy concerns with the need for high-accuracy analytics and efficient data processing. Our contributions are threefold:

  • We introduce a privacy-preserving data fusion framework for sports analytics, leveraging LSH techniques to transform sensitive high-dimensional data into privacy-preserving low-dimensional indices without sacrificing data utility.

  • We detail the process of aggregating these indices from various data sources, such as wearable sensors and venue IoT devices, to make accurate predictions and provide insights into athletes’ performance and fans’ behaviors.

  • Through extensive experiments with real-world sports data sets, we demonstrate the effectiveness of our approach in providing high-accuracy predictions while ensuring the privacy of individuals, outperforming existing methods in terms of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and computational efficiency.

The rest of the paper is organized as follows: “Related work” section reviews related work in the field of sports analytics and privacy protection. “Motivation” section describes the motivation behind our approach and formalizes the problem. “A privacy-aware data fusion and prediction approach: PSDFPALSH” section details the methodology of our PSDFP\(_{\text {ALSH}}\) approach. “Experiments” section presents experimental results and comparisons with other methods. Finally, “Conclusion” section concludes the paper and discusses future research directions.

Related work

The intersection of privacy protection, sports data fusion, and prediction has garnered significant attention in recent years. This section reviews the literature on methodologies and technologies developed to address privacy concerns while enhancing the accuracy and utility of sports data analysis.

Privacy Protection in Data Fusion. A novel approach integrating federated learning into multi-sensor data fusion was proposed, showcasing superior convergence performance and effective privacy preservation [22,23,24]. Another study introduced a low-energy data fusion privacy protection algorithm for three-dimensional wireless sensor networks, significantly improving privacy protection and data fusion accuracy [25, 26]. Additionally, a privacy protection strategy based on time slot allocation and relay in Wireless Body Area Networks (WBAN) demonstrated improved transmission success rates and reduced packet loss [27]. The application of big data and MEMS sensors in martial arts training prediction models exemplifies the fusion of diverse data sources to enhance performance and injury risk predictions [28].

In the realm of smart city services, a privacy-aware data fusion and prediction framework utilizing edge computing was developed, showing improved performance in terms of accuracy and computational efficiency [29,30,31]. Moreover, the use of federated learning in pedestrian trajectory prediction models has been explored, offering better data privacy security and prediction performance [32,33,34,35]. The development of medical sports data privacy protection methods based on legal risk control highlights the importance of standardizing data handling practices in the medical field [36]. An edge computing-based method for big data privacy protection in martial arts training movement trajectory prediction addresses both accuracy and security concerns [37]. Furthermore, an improved privacy protection algorithm for multimodal data fusion combines spatial and transform domain steganographic techniques, ensuring safe and reliable information fusion [38].

The incorporation of artificial intelligence into IoT data fusion and sensing detection has demonstrated potential in facilitating effective data integration and enhancing privacy protection [39,40,41]. A multimedia fusion privacy protection algorithm based on IoT data security under network regulations employs blockchain for a secure, reliable, and collusion-resistant scheme [42,43,44]. Additionally, an innovative method of distributed multi-sensor data fusion under privacy protection leverages federated learning for enhanced privacy [45,46,47]. The significance of legal frameworks and ethical guidelines in the context of sports data privacy protection is increasingly recognized. Research on medical sports data privacy protection underscores the need for a correct understanding of data’s role and the standardization of data handling by medical personnel [36].

Motivation

Imagine a scenario in Fig. 1 where an athlete, let’s call him John, participates in a series of events and training sessions. His performance data, along with physiological metrics, are continuously collected via wearable devices and IoT sensors deployed around the sports facilities. Additionally, John interacts with various other systems for nutrition, healthcare, and fan engagement, generating a diverse set of data spanning multiple domains.

Fig. 1

A motivating example and challenges

To optimize John’s training regimen, enhance his performance, and personalize fan interactions, it is crucial to merge and analyze this multifaceted data. However, this raises significant privacy concerns. John’s health data, location information, and personal preferences could be exposed, potentially leading to privacy breaches.

In light of these challenges, this paper proposes a novel, privacy-aware approach for sports data fusion and prediction. Our method leverages an amplified LSH technique to protect the sensitive information inherent in sports data, allowing for the secure and efficient analysis of data from multiple sources. By doing so, we aim to improve athlete performance and fan engagement while upholding the highest standards of privacy protection. Note that sports data often come from multiple platforms or sources, which naturally results in different data formats. In this paper, we adopt the locality-sensitive hashing technique to convert data in different formats (e.g., real-valued data, integer data, etc.) into uniform, format-free hash indices (e.g., the six-bit index value 100110). In this way, we overcome the challenges that heterogeneous data formats pose for multi-source sports data integration.

A privacy-aware data fusion and prediction approach: PSDFPALSH

In this section, we introduce our approach, PSDFP\(_{\text {ALSH}}\), tailored for the sports analytics domain. The method builds on an enhanced Locality-Sensitive Hashing (LSH) algorithm to address the privacy concerns inherent in sports data fusion and prediction. Many hashing techniques can transform sensitive high-dimensional data into privacy-preserving low-dimensional indices, such as Minhash, Simhash, and Hashcode. We adopt the LSH technique to secure sensitive user data during multi-source data integration because LSH ensures, with high probability, that (1) data points that are close remain close after the hashing projection and (2) data points that are distant remain distant after the hashing projection. With these two properties, we can leverage LSH to transform sensitive high-dimensional data into privacy-preserving low-dimensional indices without sacrificing data utility. Our approach consists of three phases: transforming sports performance and fan interaction data into privacy-compliant indices, identifying approximate nearest neighbors based on these indices, and making targeted predictions for athletes and fans.

Constructing privacy-compliant indices using enhanced LSH functions

Initially, we apply a series of enhanced hash functions to convert the sensitive sports data into privacy-compliant indices. Let \(p_{i,k}\) denote the performance metric of athlete i in event k. Given a set of a athletes and b sporting events, we represent the athlete-event data matrix as P, given in Eq. (1). The hashing process in Eq. (2) anonymizes the raw data into a binary index.

$$\begin{aligned} P = \left[ \begin{array}{c} p_1 \\ p_2 \\ \vdots \\ p_a \end{array}\right] = \left[ \begin{array}{cccc} p_{11} &{} p_{12} &{} \ldots &{} p_{1b} \\ p_{21} &{} p_{22} &{} \ldots &{} p_{2b} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ p_{a1} &{} p_{a2} &{} \ldots &{} p_{ab} \end{array}\right] \end{aligned}$$
(1)
$$\begin{aligned} g(\vec {p}) = \left\{ \begin{array}{ll} 1, &{} \text {if } \vec {p} \cdot \vec {r} > 0 \\ 0, &{} \text {if } \vec {p} \cdot \vec {r} \le 0 \\ \end{array}\right. \end{aligned}$$
(2)

In the formula above, \(\vec {p} =(p_1,p_2,\ldots ,p_b)\) is the data vector for a specific athlete, comprising performance metrics across the b events, and \(\vec {r} =(r_1,r_2,\ldots ,r_b)\) is a randomly generated vector whose components each take a value in the range [-1, 1]. As Eq. (2) shows, if the dot product of \(\vec {p}\) and \(\vec {r}\) is positive, then \(g(\vec {p})\) yields 1; otherwise, \(g(\vec {p})\) yields 0. The original athlete data are thus converted into a single binary digit, which preserves privacy.

Because a single hash bit distinguishes athletes only coarsely, we employ M distinct hash functions \(g_{1}(), g_{2}(), \ldots , g_{M}()\) to compile a binary sequence for each athlete. This sequence is the privacy-compliant index \(I_{(p)} = (g_{1(p)}, g_{2(p)}, \ldots , g_{M(p)})\). The procedure transforms sensitive sports data into a less intrusive index, safeguarding athletes’ and fans’ privacy while retaining data utility for predictive analysis within cloud platforms. The following algorithm outlines the process for constructing privacy-compliant indices for athletes and fans.

Algorithm 1 Constructing privacy-compliant indices for athletes/fans
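
The index construction of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `make_hash_vectors` and `build_indices`, the random seed, and the sample matrix are assumptions made for the example, and missing entries in P are not handled.

```python
import numpy as np

def make_hash_vectors(num_hashes, num_events, rng):
    """Draw one random vector r per hash function, components uniform in [-1, 1]."""
    return rng.uniform(-1.0, 1.0, size=(num_hashes, num_events))

def build_indices(P, R):
    """Apply Eq. (2) with every row of R to every row of P.

    P: (a, b) athlete-event matrix; R: (M, b) random vectors.
    Returns an (a, M) array of 0/1 bits, i.e. one M-bit index per athlete.
    """
    return (P @ R.T > 0).astype(np.uint8)

# Hypothetical data: 3 athletes, 3 events, M = 6 hash functions.
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.2, 0.4],
              [0.8, 0.3, 0.5],
              [0.1, 0.9, 0.0]])
R = make_hash_vectors(num_hashes=6, num_events=3, rng=rng)
indices = build_indices(P, R)   # only these bits, not P, would leave the data owner
print(indices)
```

Each row of `indices` plays the role of \(I_{(p)}\); all parties that later compare indices must share the same random vectors R.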

Identifying athletes’ nearest neighbors

This section delineates the process to discern athletes’ nearest neighbors within the enhanced sports analytics framework, utilizing the PSDFP\(_{\text {ALSH}}\) methodology. This process involves several critical steps, leveraging the constructed privacy-compliant indices to ascertain similarity amongst athletes, thereby facilitating targeted predictions and strategic insights.

Following the construction of privacy-compliant indices, consider athlete \(a_1\) whose index is denoted by \(I_{(a_1)} = (g_{1(a_1)}, g_{2(a_1)}, \ldots , g_{M(a_1)})\) and similarly, \(I_{(a_2)}\) for athlete \(a_2\). To deduce similarity, an “AND” operation is executed across all corresponding bits of their indices. The requisite condition for similarity is formalized in Eq. (3), ensuring that for athletes to be considered similar, their respective indices must align across all M hash functions.

$$\begin{aligned} \forall i, \text {s.t. } g_i(a_1) = g_i(a_2), \text {where } i = 1, 2, ..., M \end{aligned}$$
(3)

Because LSH is probabilistic, the strict criterion of Eq. (3), which yields the similarity matrix SM\(_1\) (Step 1), can miss truly similar athletes and thereby reduce overall predictive accuracy. To address this, N sets of hash functions are employed, each producing its own SM\(_1\) matrix through the “AND” operation. Two athletes are deemed similar if any of these N SM\(_1\) matrices marks them as similar, as formalized in Eq. (4); this combination mitigates false negatives.

$$\begin{aligned} \exists j, \text {s.t. }(\forall i, \text {s.t. } g_{ji}(a_1) = g_{ji}(a_2), i = 1, 2, ..., M), j = 1, 2, ..., N \end{aligned}$$
(4)

Next, the hash-function grouping of Step 2 is repeated Q times, yielding Q \(SM_2\) matrices that are combined with the “OR” operation. A relaxed similarity criterion is applied: athletes are considered similar if at least one of the Q \(SM_2\) matrices indicates similarity, which balances the sensitivity and specificity of the prediction model.

To refine the similarity matrix and reduce false positives, the “AND” operation is applied across W \(SM_3\) matrices generated in Step 3. This rigorous criterion ensures that only consistent similarities across all matrices are recognized, culminating in the final similarity matrix, \(SM_4\), which serves as the foundation for precise data prediction and recommendation.

Algorithm 2 Identifying athletes’ nearest neighbors
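
One possible reading of the four steps described above is sketched below. It assumes that each \(SM_1\) uses freshly drawn hash functions, that \(SM_2\) is the “OR” of N \(SM_1\) matrices, \(SM_3\) the “OR” of Q \(SM_2\) matrices, and \(SM_4\) the “AND” of W \(SM_3\) matrices. The names `sm1` and `final_similarity` and the sample data are hypothetical, and for brevity the sketch hashes the raw matrix directly, whereas in a deployment only the indices would be exchanged.

```python
import numpy as np

def sm1(P, M, rng):
    """One hash table: two athletes match only if all M bits agree (AND, Eq. 3)."""
    R = rng.uniform(-1.0, 1.0, size=(M, P.shape[1]))
    bits = P @ R.T > 0                      # (a, M) boolean index per athlete
    return (bits[:, None, :] == bits[None, :, :]).all(axis=2)

def final_similarity(P, M, N, Q, W, seed=0):
    """Combine hash tables as AND over W of (OR over Q of (OR over N of SM_1))."""
    rng = np.random.default_rng(seed)
    a = P.shape[0]
    sm4 = np.ones((a, a), dtype=bool)
    for _ in range(W):
        sm3 = np.zeros((a, a), dtype=bool)
        for _ in range(Q):
            sm2 = np.zeros((a, a), dtype=bool)
            for _ in range(N):
                sm2 |= sm1(P, M, rng)       # Eq. (4): similar in any of N tables
            sm3 |= sm2                      # relaxed criterion over Q repetitions
        sm4 &= sm3                          # strict criterion over W repetitions
    return sm4

# Hypothetical data: athletes 0 and 1 are identical, athlete 2 is the negation of 0.
P = np.array([[0.9, 0.2, 0.4],
              [0.9, 0.2, 0.4],
              [-0.9, -0.2, -0.4]])
SM4 = final_similarity(P, M=4, N=3, Q=2, W=2)
print(SM4)
```

Identical vectors always collide and exactly opposite vectors never do, so the example has a predictable outcome; for intermediate cases the result depends on the random draws and on M, N, Q, and W.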

Making data prediction and athlete/event recommendation

Having established the final similarity matrix \(SM_4\), which encapsulates the similarity relationships among athletes, through the “Constructing privacy-compliant indices using enhanced LSH functions” and “Identifying athletes’ nearest neighbors” sections, we proceed to predictive analyses and recommendations for a specified athlete, denoted as \(a_{\text {target}}\). The initial step involves identifying \(a_{\text {target}}\)’s nearest neighbors within \(SM_4\) and collecting them into a set named NN.

Subsequently, leveraging the preferences or performance metrics of similar athletes, predictions for \(a_{\text {target}}\) are formulated. The predictive function is encapsulated in Eq. (5), where |NN| signifies the count of similar athletes within NN and \(p_{a,k}\) denotes the performance metric of athlete a in event k. Predictions are made for each event, followed by a ranking process, culminating in the selection of an optimal event or performance improvement strategy for \(a_{\text {target}}\). Algorithm 3 outlines the process for making these data-driven recommendations.

It should be noted that LSH is a probabilistic approximate neighbor search technique. Consequently, we cannot guarantee that it always succeeds when used for multi-source data integration and analysis. We reduce the failure rate in two ways: (1) we use multiple hash functions instead of a single hash function when creating indices, which reduces the false-positive probability; (2) we use multiple hash tables instead of a single hash table when recognizing close users, which reduces the false-negative probability. In this way, we keep the failure rate of our proposal as low as possible.
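
The effect of these two measures can be made concrete with the standard amplification argument for LSH. The expressions below are a sketch under assumptions that the paper does not state: the hash functions are mutually independent, and \(\vec {r}\) is drawn from a spherically symmetric distribution (for example, Gaussian components), in which case the single-bit collision probability s depends only on the angle \(\theta \) between the two data vectors. With components drawn uniformly from [-1, 1], as in Eq. (2), the first line holds only approximately.

$$\begin{aligned} s&= \Pr \left[ g(\vec {p}) = g(\vec {q})\right] = 1 - \frac{\theta (\vec {p}, \vec {q})}{\pi } \\ \Pr \left[ \text {match in one table of } M \text { bits}\right]&= s^{M} \\ \Pr \left[ \text {match in at least one of } N \text { tables}\right]&= 1 - \left( 1 - s^{M}\right) ^{N} \end{aligned}$$

Raising s to the power M shrinks the match probability of dissimilar pairs (fewer false positives), while taking the complement over N tables restores the match probability of similar pairs (fewer false negatives).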

$$\begin{aligned} p_{\text {target},k} = \frac{1}{|NN|} \times \sum \limits _{a \in \text {NN}}p_{a,k} \end{aligned}$$
(5)

Algorithm 3 Athlete/event recommendation
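
The prediction of Eq. (5) and the subsequent ranking can be sketched as follows. The sketch makes several assumptions that the text leaves open: the target is excluded from its own neighbor set NN, P has no missing entries, and a larger metric value is preferable when ranking events. The function names and the sample data are hypothetical.

```python
import numpy as np

def predict_for_target(P, sm4, target):
    """Eq. (5): average the neighbors' metrics for every event k."""
    neighbors = np.flatnonzero(sm4[target])
    neighbors = neighbors[neighbors != target]   # assumption: NN excludes the target
    if neighbors.size == 0:
        return None, neighbors                   # LSH found no neighbor; no prediction
    return P[neighbors].mean(axis=0), neighbors

def recommend_event(P, sm4, target):
    """Rank events by predicted value and return the index of the best one."""
    preds, _ = predict_for_target(P, sm4, target)
    return None if preds is None else int(np.argmax(preds))

# Hypothetical data: athlete 0 has neighbors 1 and 2; athlete 3 has none.
P = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0],
              [5.0, 0.0, 1.0],
              [9.0, 9.0, 9.0]])
sm4 = np.array([[True,  True,  True,  False],
                [True,  True,  False, False],
                [True,  False, True,  False],
                [False, False, False, True]])
preds, nn = predict_for_target(P, sm4, target=0)
best = recommend_event(P, sm4, target=0)
print(preds, nn, best)
```

Returning `None` for an empty neighbor set is one way to surface the failure case discussed above; a deployed system would need a fallback, which the paper does not specify.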

Experiments

Experimental configurations

To validate the effectiveness of our proposed PSDFP\(_{\text {ALSH}}\) method within the sports analytics domain, a comprehensive experimental setup was designed. We employed the WSDream dataset, a real-world dataset comprising performance records involving a series of people-service invocation events. The performance evaluation included comparisons with two benchmark methods: the widely acknowledged CF method, known for its predictive accuracy but lacking privacy measures, and the SerRec\(_{\text {distri-LSH}}\), a privacy-aware approach utilizing LSH for data prediction. Our method’s parameters were meticulously selected based on preliminary trials: the number of hash functions (M) and hash groups (N) were tested with values set to 4, 6, 8, and 10, while Q and W, indicating the depth of similarity matrices, were similarly varied. This parameterization was designed to investigate the trade-off between privacy protection and predictive accuracy at various levels of data sparsity.

The experimental platform was configured with a 2.50 GHz processor and 16.0 GB memory, running on a Windows 11 OS, and utilized Python 3.6 for implementation. The analysis was conducted across varying data sparsity levels (0.1, 0.3, 0.5, 0.7, and 0.9) to assess the robustness and scalability of the proposed method against the changing density of available data.

Experiment results

Test 1: Accuracy Comparison.

In this analysis, we evaluate the accuracy of the three methods using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as our key metrics. MAE quantifies the average magnitude of the errors between predicted values and actual outcomes, whereas RMSE is the square root of the average squared error and therefore penalizes large errors more heavily. Lower values of MAE and RMSE indicate higher predictive accuracy. To examine performance across various scenarios, we used data sparsity levels of 0.1, 0.3, 0.5, 0.7, and 0.9, which allows for an analysis under differing degrees of data availability.
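
For reference, the two metrics can be computed as in the short sketch below; the function names and sample values are illustrative only.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical values: three exact predictions and one that is off by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

In this example a single large error gives an RMSE twice the MAE, which illustrates why the two metrics are reported together.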

As illustrated in Fig. 2, at a sparsity level of 0.1 the MAE of CF is the lowest of the three methods, underscoring its precision. The MAE of our PSDFP\(_{\text {ALSH}}\) method is marginally higher than that of CF but significantly lower than that of SerRec\(_{\text {distri-LSH}}\), which makes a compelling case for its application in privacy-sensitive environments. As data sparsity increases, the accuracy of all methods diminishes, as indicated by rising MAE values. Nonetheless, our method consistently outperforms SerRec\(_{\text {distri-LSH}}\) at every sparsity level up to 0.9. Similar RMSE results are shown in Fig. 3.

Despite the superior performance of CF in terms of MAE and RMSE, its direct utilization of user data without anonymization renders it unsuitable for scenarios demanding strict privacy adherence, such as in distributed edge-computing frameworks. Consequently, while PSDFP\(_{\text {ALSH}}\)’s accuracy is slightly lower in comparison, its robust privacy-preserving capabilities and competitive predictive performance, especially against SerRec\(_{\text {distri-LSH}}\), validate its effectiveness and feasibility for sports analytics applications. The challenges posed by high data sparsity (0.9 ratio) exacerbate the difficulty in achieving accurate predictions across all methods, attributed to the scant availability of pertinent user data for making informed predictions and recommendations.

Fig. 2

Comparative MAE analysis of the three approaches under varying data sparsity conditions

Fig. 3

Comparative RMSE analysis of the three approaches under varying data sparsity conditions

Test 2: Evaluating PSDFPALSH’s Accuracy with Variable Parameters

In line with the experimental setup detailed in the “Experimental configurations” section, we varied the parameters M, N, Q, and W to investigate their impact on the accuracy of our PSDFP\(_{\text {ALSH}}\) method. Tables 1 and 2 present this analysis. Each column represents the product of M and N, indicating different combinations of hash functions and hash groups, while each row represents the product of Q and W. Table 1 reports the MAE and Table 2 the RMSE associated with the three methods. The product \(Q*W\) captures different scenarios for aggregating the similarity matrices.

Tables 1 and 2 show that a larger N or a smaller M yields notably better accuracy for PSDFP\(_{\text {ALSH}}\), as indicated by lower MAE and RMSE values than in other configurations. Furthermore, holding M, N, and Q constant while increasing W decreased MAE and RMSE, indicating enhanced accuracy. Conversely, keeping M, N, and W fixed while increasing Q resulted in a slight increase in these error metrics.

Table 1 MAE
Table 2 RMSE

Test 3: Computational Efficiency of the Approaches

In this evaluation, we explored the computational efficiency of the PSDFP\(_{\text {ALSH}}\), SerRec\(_{\text {distri-LSH}}\), and CF methodologies. This analysis aimed to ascertain the time cost implications of each method under different data availability conditions. As illustrated in Fig. 4, CF consistently incurs higher computational time compared to the other two methods, which could be attributed to its direct use of original user data without any form of anonymization or compression, inherently leading to longer processing time.

Conversely, the time costs for PSDFP\(_{\text {ALSH}}\) and SerRec\(_{\text {distri-LSH}}\) are notably comparable and significantly lower than those for CF. The similar efficiency of the two LSH-based methods mainly stems from their use of hash functions to convert high-dimensional data into a lower-dimensional, privacy-preserving format. This transformation is usually done offline, which significantly reduces the time required for online operations.

The CF method’s reliance on Pearson similarity calculations directly on users’ original data not only raises concerns regarding privacy breaches but also results in increased computational demands, underscoring the superior efficiency and privacy compliance of the PSDFP\(_{\text {ALSH}}\) approach in distributed edge-computing environments dedicated to smart sports analytics.
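
The difference in per-pair cost can be illustrated with a small sketch. It contrasts a Pearson similarity computed over two raw b-dimensional vectors with an equality check on two M-bit indices. The function names are hypothetical, and the exact similarity routine used by the CF baseline (for example, whether it is restricted to co-rated items) is not specified in the text.

```python
import numpy as np

def pearson_similarity(u, v):
    """Pearson correlation of two raw data vectors (floating-point work over b entries)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du, dv = u - u.mean(), v - v.mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return 0.0 if denom == 0 else float((du * dv).sum() / denom)

def index_match(index_1, index_2):
    """LSH-style check: compare two precomputed M-bit indices for equality."""
    return bool(np.array_equal(index_1, index_2))

# Hypothetical vectors and indices.
print(pearson_similarity([1, 2, 3], [2, 4, 6]))   # perfectly correlated vectors
print(index_match([1, 0, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0]))
```

The Pearson routine must read the original data at query time, whereas the index comparison touches only short bit strings that can be built offline, which is consistent with the timing gap described above.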

Fig. 4

Comparative efficiency analysis of the three approaches under varying data sparsity conditions

Conclusion

The integration and prediction of multi-source data within edge computing environments are pivotal for enabling intelligent sports analytics applications. However, the inherent tension between ensuring data privacy and maintaining data availability poses significant challenges, particularly in scenarios where preserving the integrity of athlete and event data is paramount. In this study, we introduced an approach leveraging the Locality-Sensitive Hashing technique to anonymize original data into a less sensitive index format. This transformation allows for the calculation of similarities using indices rather than raw data, thereby safeguarding athlete privacy without compromising on the utility of the data for predictive analytics.

Future research directions include the integration of additional privacy-preserving mechanisms alongside our method to further fortify data security. Moreover, we plan to incorporate more contextual variables, such as temporal and spatial factors, to enrich the prediction models. Recognizing that sports data analytics is profoundly influenced by a multitude of factors, integrating these additional dimensions will enable more nuanced and accurate predictions, thus advancing the field of smart sports analytics.

Availability of data and materials

No datasets were generated or analysed during the current study.

References

  1. Dai H, Yu J, Li M, Wang W, Liu AX, Ma J, Qi L, Chen G (2023) Bloom filter with noisy coding framework for multi-set membership testing. IEEE Trans Knowl Data Eng 35(7):6710–6724

  2. Gu R, Zhang K, Xu Z, Che Y, Fan B, Hou H, Dai H, Yi L, Ding Y, Chen G, et al (2022) Fluid: dataset abstraction and elastic acceleration for cloud-native deep learning training jobs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), IEEE, pp 2182–2195

  3. Kong L, Li G, Rafique W, Shen S, He Q, Khosravi MR, Wang R, Qi L (2022) Time-aware missing healthcare data prediction based on arima model. IEEE/ACM Trans Comput Biol Bioinforma. https://doi.org/10.1109/TCBB.2022.3205064

  4. Dai H, Wang X, Lin X, Gu R, Shi S, Liu Y, Dou W, Chen G (2021) Placing wireless chargers with limited mobility. IEEE Trans Mob Comput 22(6):3589–3603

  5. Gu R, Chen Y, Liu S, Dai H, Chen G, Zhang K, Che Y, Huang Y (2021) Liquid: intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters. IEEE Trans Parallel Distrib Syst 33(11):2808–2820

  6. Dai H, Xu Y, Chen G, Dou W, Tian C, Wu X, He T (2020) Rose: Robustly safe charging for wireless power transfer. IEEE Trans Mob Comput 21(6):2180–2197

  7. Wang F, Zhu H, Srivastava G, Li S, Khosravi MR, Qi L (2021) Robust collaborative filtering recommendation with user-item-trust records. IEEE Trans Comput Soc Syst 9(4):986–996

  8. Wang S, Chen X, Jannach D, Yao L (2023) Causal decision transformer for recommender systems via offline reinforcement learning. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, pp 1599–1608

  9. Qi L, Lin W, Zhang X, Dou W, Xu X, Chen J (2022) A correlation graph based approach for personalized and compatible web apis recommendation in mobile app development. IEEE Trans Knowl Data Eng 35(6):5444–5457

  10. Adel A (2022) Future of industry 5.0 in society: human-centric solutions, challenges and prospective research areas. J Cloud Comput 11(1):40

  11. Qi L, Xu X, Wu X, Ni Q, Yuan Y, Zhang X (2023) Digital-twin-enabled 6g mobile network video streaming using mobile crowdsourcing. IEEE J Sel Areas Commun 41(10):3161–3174

  12. Fang J, Feng T, Guo X, Ma R, Lu Y (2024) Blockchain-cloud privacy-enhanced distributed industrial data trading based on verifiable credentials. J Cloud Comput 13(1):30

  13. Qi L, Liu Y, Zhang Y, Xu X, Bilal M, Song H (2022) Privacy-aware point-of-interest category recommendation in internet of things. IEEE Internet Things J 9(21):21398–21408

  14. Li Z, Xu X, Hang T, Xiang H, Cui Y, Qi L, Zhou X (2022) A knowledge-driven anomaly detection framework for social production system. IEEE Trans Comput Soc Syst. https://doi.org/10.1109/TCSS.2022.3217790

  15. Cao Y, Chen X, Yao L, Wang X, Zhang WE (2020) Adversarial attack and detection on reinforcement learning based recommendation system. In: The 43rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York

  16. Xu X, Li H, Li Z, Zhou X (2022) Safe: Synergic data filtering for federated learning in cloud-edge computing. IEEE Trans Ind Inform 19(2):1655–1665

  17. Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and new perspectives. ACM Comput Surv (CSUR) 52(1):1–38

  18. Xu X, Gu J, Yan H, Liu W, Qi L, Zhou X (2022) Reputation-aware supplier assessment for blockchain-enabled supply chain in industry 4.0. IEEE Trans Ind Inform 19(4):5485–5494

  19. Gayathri S, Surendran D (2024) Unified ensemble federated learning with cloud computing for online anomaly detection in energy-efficient wireless sensor networks. J Cloud Comput 13(1):49

  20. Wang F, Wang L, Li G, Wang Y, Lv C, Qi L (2022) Edge-cloud-enabled matrix factorization for diversified apis recommendation in mashup creation. World Wide Web 25(5):1809–1829

  21. Kong L, Wang L, Gong W, Yan C, Duan Y, Qi L (2022) Lsh-aware multitype health data prediction with privacy preservation in edge environment. World Wide Web 25(5):1793–1808

  22. Huang Y, Li YJ, Cai Z (2023) Security and privacy in metaverse: a comprehensive survey. Big Data Min Anal 6(2):234–247

  23. Feng J, Wei H (2023) A method of distributed multi-sensor data fusion under privacy protection. In: 2023 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS), IEEE, pp 188–191

  24. Zhao J, Rong C, Dang X, Sun H (2023) Qar data imputation using generative adversarial network with self-attention mechanism. Big Data Min Anal 7(1):12–28

  25. Feng L, Liu B, et al (2022) Low-energy data fusion privacy protection algorithm for three-dimensional wireless sensor network. Mob Inf Syst 2022:Article ID 3580607. https://doi.org/10.1155/2022/3580607

  26. Xu X, Tang S, Qi L, Zhou X, Dai F, Dou W (2023) Cnn partitioning and offloading for vehicular edge networks in web3. IEEE Commun Mag 61(8):36–42

  27. Zhang W (2019) A data fusion privacy protection strategy with low energy consumption based on time slot allocation and relay in wban. Peer Peer Netw Appl 12:1575–1584

  28. Li S, Liu C, Yuan G (2021) Martial arts training prediction model based on big data and mems sensors. Sci Program 2021:1–8

  29. Qi L, Chi X, Zhou X, Liu Q, Dai F, Xu X, Zhang X (2022) Privacy-aware data fusion and prediction for smart city services in edge computing environment. In: 2022 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), IEEE, pp 9–16

  30. Yan R, Zheng Y, Yu N, Liang C (2023) Multi-smart meter data encryption scheme based on distributed differential privacy. Big Data Min Anal 7(1):131–141

  31. Zhang W, Xie Z, Sai AMVV, Zia Q, He Z, Yin G (2023) A local differential privacy trajectory protection method based on temporal and spatial restrictions for staying detection. Tsinghua Sci Technol 29(2):617–633

  32. Zhou X, Zheng X, Cui X, Shi J, Liang W, Yan Z, Yang LT, Shimizu S, Kevin I, Wang K (2023) Digital twin enhanced federated reinforcement learning with lightweight knowledge distillation in mobile networks. IEEE J Sel Areas Commun 41(10):3191–3211

  33. Ni R, Lu Y, Yang B, Yang C, Liu X (2024) A federated pedestrian trajectory prediction model with data privacy protection. Complex Intell Syst 10:1787–1799

  34. Zhou X, Yang Q, Liu Q, Liang W, Wang K, Liu Z, Ma J, Jin Q (2024) Spatial-temporal federated transfer learning with multi-sensor data fusion for cooperative positioning. Inf Fusion 105:102182

  35. Zhou X, Liang W, Kawai A, Fueda K, She J, Kevin I, Wang K (2024b) Adaptive segmentation enhanced asynchronous federated learning for sustainable intelligent transportation systems. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2024.3362058

  36. Jia L, Fan W, et al (2021) Medical sports data privacy protection method based on legal risk control. J Healthc Eng 2021. https://doi.org/10.1155/2021/6630429

  37. Li X, Cui Z, Zhang F, Li L (2023) Application of big data privacy protection based on edge computing in the prediction of martial arts training movement trajectory. Int J Inf Technol Web Eng 18(1):1–14

  38. Chen Z, Shuai J, Tian F, Li W, Zang S, Zhang X, et al (2022) An improved privacy protection algorithm for multimodal data fusion. Sci Program 2022:Article ID 4189148. https://doi.org/10.1155/2022/4189148

  39. Zhou X, Yang Q, Zheng X, Liang W, Kevin I, Wang K, Ma J, Pan Y, Jin Q (2024) Personalized federation learning with model-contrastive learning for multi-modal user modeling in human-centric metaverse. IEEE J Sel Areas Commun 42(4):817–831

  40. Zhang X (2024) The privacy protection of IoT data fusion and sensing detection under artificial intelligence technology. Wirel Pers Commun. https://doi.org/10.1007/s11277-023-10822-5

  41. Hazman C, Guezzaz A, Benkirane S, Azrour M (2024) Enhanced IDS with deep learning for IoT-based smart cities security. Tsinghua Sci Technol 29(4):929–947

  42. Zhu G, Li X, Zheng C, Wang L (2022) Multimedia fusion privacy protection algorithm based on IoT data security under network regulations. Comput Intell Neurosci 2022:Article ID 3574812. https://doi.org/10.1155/2022/3574812

  43. Alzu’bi A, Alomar A, Alkhaza’leh S, Abuarqoub A, Hammoudeh M (2024) A review of privacy and security of edge computing in smart healthcare systems: issues, challenges, and research directions. Tsinghua Sci Technol 29(4):1152–1180

  44. El Moudene Y, Idrais J, El Abassi R, Sabour A (2023) Gender-based analysis of user reactions to Facebook posts. Big Data Min Anal 7(1):75–86

  45. Shi L, Li K, Zhu H, et al (2023) Data fusion and processing technology of wireless sensor network for privacy protection. J Appl Math 2023:Article ID 1046050. https://doi.org/10.1155/2023/1046050

  46. Jalali NA, Chen H (2023) Federated learning security and privacy-preserving algorithm and experiments research under internet of things critical infrastructure. Tsinghua Sci Technol 29(2):400–414

  47. Zhou X, Huang W, Liang W, Yan Z, Ma J, Pan Y, Kevin I, Wang K (2024) Federated distillation and blockchain empowered secure knowledge sharing for internet of medical things. Inf Sci 662:120217

Acknowledgements

We would like to thank the provider of the dataset used in this research work.

Funding

Not applicable.

Author information

Contributions

P. L: conceived the idea, developed the research motivation, established the model, and contributed to the English writing. X. L: conceived the experimental ideas, designed the methodologies, wrote the initial draft, and executed the experimental plan. B. Z: reviewed and revised the initial draft and validated the experimental results. G. D: conducted the literature review for related work, provided comparative analysis, shaped the research ideas, and enriched the English writing.

Corresponding author

Correspondence to Xiang Li.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors agree on the publication of this paper if accepted.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, P., Li, X., Zang, B. et al. Privacy-preserving sports data fusion and prediction with smart devices in distributed environment. J Cloud Comp 13, 106 (2024). https://doi.org/10.1186/s13677-024-00671-3

  • DOI: https://doi.org/10.1186/s13677-024-00671-3

Keywords