 Research
 Open access
 Published:
Timeaware outlier detection in health physique monitoring in edgeaided sport education decisionmakings
Journal of Cloud Computing volume 13, Article number: 73 (2024)
Abstract
The increasing popularity of various intelligent sensor and mobile communication technologies has enabled quick health physique sensing, monitoring, collection and analyses of students, which significantly promoted the development of sport education. Through collecting the students’ physiological signals and transmitted them to edge servers, we can precisely analyze and judge whether a student is in poor health (e.g., an outlier). However, with time elapsing, the accumulated physiological signals of students become massive, which places a heavy burden on the quick storage and intime processing of physiological data of students. In this situation, it is becoming a necessity to develop a timeaware outlier detection technique for health physique evaluation of students in a timeefficient way. Considering this challenge, we propose a novel timeaware outlier detection method named TOD based on LocalitySensitive Hashing. TOD condenses extensive physiological student data into a concise set of health indices. Leveraging these indices, we can efficiently identify potential student outliers from a large pool of candidates with precision and speed. Finally, we have designed a group of simulated experiments based on WSDREAM dataset. Experiment results prove the feasibility and superiority of the TOD method compared with other existing methods.
Introduction
The dawn of the 21st century has seen a paradigm shift in the domain of sport education, largely fueled by the advent and integration of intelligent sensor technologies and advanced mobile communication systems [1,2,3]. These technological advancements have opened new vistas in the realm of health physique monitoring, particularly in the educational context, where the physical wellbeing of students is of paramount importance. In recent years, the proliferation of smart, sensorbased technologies has revolutionized the way we perceive and interact with our immediate environment [4, 5]. In the field of sport education, these technologies have been instrumental in facilitating rapid and accurate sensing, monitoring, and analysis of students’ health parameters. This enhanced capability has not only contributed to a more nuanced understanding of student health dynamics but has also paved the way for more personalized and effective physical education strategies.
However, this influx of data presents a significant challenge. As time progresses, the accumulated physiological data from students burgeons to an enormous volume. This exponential growth in data volume places a substantial strain on storage capacities and the timely processing of this information [6,7,8]. In such a scenario, the need for a sophisticated, timeaware outlier detection mechanism becomes evident. This requirement is not just for the sake of data management efficiency but also for ensuring that the health monitoring process remains robust and effective, particularly in identifying students whose health parameters deviate from the norm, termed as ‘outliers’.
Addressing this need, our research introduces a novel technique named Timeaware Outlier Detection (TOD). TOD represents a significant leap in the realm of health physique monitoring in sport education. By employing localitysensitive hashing, TOD transforms the voluminous physiological data of students into a concise set of health indices. These indices, despite their compact nature, encapsulatively represent the health status of each student. This transformation is crucial in swiftly pinpointing potential outliers among a large pool of students, thereby ensuring that the detection process is not just accurate but also timeefficient.
The practical efficacy and superiority of TOD over existing methodologies are substantiated through a series of simulated experiments. These experiments, structured around the WSDREAM dataset, offer a comprehensive evaluation of TOD’s performance. The results from these experiments unequivocally demonstrate the feasibility of TOD and its superiority in comparison to other existing outlier detection methods. The significance of TOD lies in its ability to seamlessly integrate with edge computing environments. Edge computing, characterized by its proximity to data sources, enables quicker data processing and decisionmaking at the edge of the network, closer to where data is generated [9, 10]. This is particularly beneficial in the context of health physique monitoring in sport education, where realtime data analysis and immediate response are crucial. Through the utilization of edge computing, TOD maximizes the processing speed and efficiency of the abundant physiological data produced by students, resulting in prompt and precise health evaluations.
Furthermore, TOD’s approach to outlier detection is not merely a technical advancement; it has profound implications for the holistic development of students in the realm of sport education. By enabling timely and accurate identification of students who may be at health risks, educators and health professionals can intervene proactively, ensuring that each student receives the attention and care they require. This not only fosters a safer and more supportive learning environment but also contributes to the overall wellbeing and physical development of students.
In summary, the contributions of the paper are briefly introduced as follows.

(1)
We propose a novel timeaware efficient outlier detection method TOD based on LocalitySensitive Hashing. TOD can generate a small number of student health indices based on the massive health data of students distributed in multiple edge servers. And with the few student health indices, we can precisely detect the possible student outliers among massive candidates in a timeefficient manner.

(2)
In order to demonstrate the effectiveness of the TOD approach in outlier detection, we conducted a series of simulated experiments using the WSDREAM dataset, which reflects realworld scenarios. By analyzing and contrasting the experimental outcomes with those of alternative methods, we substantiate the efficacy and superiority of TOD in identifying potential outliers in student health assessments.
The reminder of this article is organized as follows. Related literatures are investigated and introduced in “Related work” section. A motivating example is presented in “Motivation” section to emphasize the research significance of this article. The proposed TOD method is clarified in detail in “Our proposal: TOD” section. Evaluation is provided in “Experiments” section. At last, we conclude the paper and discuss the future directions in “Conclusion and future work” section.
Related work
Edge computing in health monitoring
Work [11] presents a novel approach to task scheduling in edge computing environments, specifically tailored for health monitoring systems in smart cities. The authors utilize SoftwareDefined Networking (SDN) to create a more secure and efficient task scheduling mechanism. The proposed solution aims to optimize the allocation of computing resources, reduce latency, and ensure secure data transmission, which are critical for effective health monitoring operations in urban environments. In [12], the authors address the need for advanced and secure health monitoring systems by proposing an Optimized Deep Recurrent Neural Network (ODRNN) model integrated with edge computing. The model is designed to enhance the realtime collection and analysis of health data, ensuring faster and more personalized healthcare services. In [13], an innovative health monitoring system is introduced, which integrates IoT technologies with edge computing. At its heart lies a Convolutional Neural Network (CNN) model specifically tuned for cardiac classification. It operates within an IoT framework, where data from various health sensors are processed locally at edge servers, significantly reducing latency and energy consumption. In [14], the authors focus on developing a comprehensive patient health monitoring system utilizing the synergies of fog and edge computing. The system integrates a variety of sensors, including temperature, pulse, tilt, and flex sensors, all connected to a Raspberry Pi Pico W. This setup is particularly designed to assist in monitoring bedridden patients, enabling the tracking of vital signs and movements.
Intelligent sensor technologies in sport education
The paper [15] explores the application of big data analysis in personalized education management systems, using intelligent sensor networks. It provides a theoretical and empirical analysis of individualized teaching methods, emphasizing the role of big data in understanding students’ habits and optimizing educational programs. The authors use data mining techniques, like the FPGrowth method, to analyze and optimize training schemes, demonstrating the dynamic and predictive capabilities of big data in education. In [16], the authors introduce the Lowcost Efficient Wireless Intelligent Sensor (LEWIS), a platform aimed at enhancing engineering, research, and education. It discusses the hardware and software architecture of LEWIS1, including a simplified version (LEWIS1 beta) that enhances user interfaces and simplifies both hardware and software. Literature [17] discusses the integration of artificial intelligence and sensor technologies in educational environments, particularly for students with developmental or intellectual disabilities. It highlights the revolution in computational power, brain mapping, wearable sensors, and AI, which enable realtime assessment of students’ cognitive states and learning progress. The paper emphasizes the potential of these technologies to adapt teaching methods to individual needs, offering a multimodal and multidimensional approach to education. In [18], the authors discuss the use of LEWIS in monitoring natural hazards like flooding and wildfires, emphasizing its costeffectiveness and ease of deployment. It also covers the training and education components of LEWIS, showcasing its potential to engage communities, from students to industry professionals, in creating innovative monitoring solutions and combating climate change.
Timeaware outlier detection
In [19], the authors address the intersection of fairness and outlier detection (OD), a relatively unexplored area in fair machine learning. The paper develops FairOD, a fairnessaware outlier detector that aims to achieve treatment parity, flag equal proportions of samples from all groups (group fairness), and identify truly highrisk samples within each group. The study is significant in balancing fairness with detection performance, showing that FairOD can achieve fair outcomes while maintaining or even improving detection accuracy compared to fairnessagnostic detectors. Literature [20] introduces TADILOF, a novel algorithm for outlier detection in data streams, particularly addressing the challenges posed by the Internet of Things (IoT). TADILOF is a timeaware, densitybased incremental local outlier detection method that adapts to changes in data over time, such as the emergence of new data clusters. It also features an "approximate LOF" for estimating the local outlier factor based on historical data. In [21], the authors propose a new framework for contextual outlier detection in big data streams. The framework integrates contextual attributes with the content of data streams for outlier detection, addressing both content and contextual anomalies. It includes both supervised and unsupervised detection methods, allowing the system to adapt to normal changes in stream behavior over time. The work [22] introduces AUTO, a novel approach for outofdistribution (OOD) detection during test time using unlabeled online data. AUTO addresses challenges like catastrophic forgetting by incorporating an inoutaware filter, an ID memory bank, and a semanticallyconsistent objective. It adaptively mines pseudoID and pseudoOOD samples from test data to optimize networks in realtime during inference. The authors in [23] propose a method for temporal outlier detection and analysis in business process executions. The approach encompasses modeling temporal behaviors, taking into account various controlflow patterns within business processes, and generating an execution time matrix based on event logs. This matrix serves as the foundation for identifying temporal outliers and conducting correlationbased analyses. It holds particular significance for processaware information systems, offering valuable insights into the temporal intricacies of business processes.
Outlier detection techniques in health and sport education
In [24], the authors focus on outlier detection in educational data, specifically in inquirybased learning results of students. It uses the KMeans Clustering method with MinkowskiChebyshev distances to detect outliers. The comparison results with related methods show the advantage of their proposal. The authors of [25] present a novel method for geocoding historical addresses, which is crucial for spatial data analysis in social science research, including studies on education, health, and emigration. The method uses an online geocoding service and employs outlier detection to improve the accuracy of locations assigned to addresses. Literature [26] uses Electromyography (EMG) signals to evaluate muscle activations in different genders during prolonged sitting tasks and investigates the influence of various types of exercise on muscle activation. The study aims to propose the best exercise to prevent low back discomfort, using EMG signals to record muscle activity and comparing Root Mean Square (RMS) values for muscle activation during prolonged sitting and after exercises. The study in [27] investigates the association between sedentary behavior, physical activity, and depression among sports university students. A crosssectional survey was conducted in [27] to gather data on students’ sedentary behavior, physical activity, and depression levels. The study found a significant association between recreational screen time, sedentary time spent on schoolwork, and participation in vigorous physical activity with depression. The findings highlight the importance of balancing screen time and physical activity to mitigate depression risks in sports university students.
Based on the literature analyses conducted above, it is evident that current literature lacks effective methods for timely outlier detection in monitoring physical health within sports education. Recognizing this gap, we introduce a pioneering approach called TOD, which utilizes LocalitySensitive Hashing to achieve timeefficient outlier detection.
Motivation
Figure 1 presents an intuitive example to demonstrate the research motivation of this article. In the example, the health physique conditions of students are monitored and collected by smart devices such as smart watches, which form timeaware health signals. Later, health signal data are transmitted to nearby edge servers and then to a central cloud platform for uniform data fusion, analyses and processing. In general, the above data processing framework can accurately recognize the students’ health conditions (e.g., outlier detection) through data mining; however, the accumulated health data of students are becoming more and more with time elapsing, which make it hard to detect potential outliers of students in a timeefficient manner. In this situation, we need to develop more lightweight outlier detection method with low time complexity to cope with the big data situation. Inspired by this observation, we propose a novel timeaware outlier detection method with low time complexity for better health physique evaluation of students in edge computing.
Our proposal: TOD
In “LocalitySensitive Hashing” section, we’ll provide a succinct overview of LocalitySensitive Hashing, the primary technique employed in our paper. Subsequently, guided by LocalitySensitive Hashing, we present TOD, a novel outlier detection approach adept at managing timesensitive health monitoring and evaluation in edge environments with effectiveness and efficiency.
LocalitySensitive Hashing
LocalitySensitive Hashing is a technique designed for reducing the dimensions of large, highdimensional datasets while maintaining the similarity relationships between data points. The fundamental concept of LocalitySensitive Hashing involves hashing input items so that similar items are likely to be grouped into the same "buckets" due to the localitysensitive nature of the hash functions. This approach facilitates approximate nearest neighbor searches in sublinear time, offering a much faster alternative to linear search in extensive datasets.
The essence of LocalitySensitive Hashing lies in employing a set of hash functions, collectively known as a family H, which ensures that similar items x and y have a high probability of ending up in the same hash bucket. This family H is termed "localitysensitive" and adheres to two key principles for any pair of items x and y:

(1)
High Similarity Collision Probability: If x and y are close or similar, the probabilityPr[H(x) = H(y)] that they hash to the same value is high.

(2)
Low Dissimilarity Collision Probability: Conversely, if x and y are distant or dissimilar, the probability \(Pr[H(x) = H(y)]\)that they hash to the same value is low.
To further define this concept, consider:

d(x, y) as the measure of distance between two items x and y.

r as a specified threshold distance.

\(p_1\) as the probability that two points hash to the same value when their distance is less than r.

\(p_2\) as the probability that two points hash to the same value when their distance is greater than r.
A family H is called (r, p1, p2)sensitive if for any x,y:

If \(d(x, y) \le r\), then \(Pr [H(x) = H(y)] \ge p1\)

If \(d(x, y) > r\), then \(Pr [H(x) = H(y)] \le p2\)
Through adjusting the values of parameters r, \(p_1\) and \(p_2\), we can approximately formulate the similarityguarantee feature of LocalitySensitive Hashing. More concretely, a data point can be converted to a hash index through LocalitySensitive Hashing. With such hash indexes of data points, we can perform privacypreserving approximate neighbor search: two neighboring data points are of the same or close index with high probability; two nonneighboring data points are of the different or distinguished index with high probability.
In essence, LocalitySensitive Hashing provides an efficient way to group and search data based on similarity, which is particularly useful in handling largescale, highdimensional data in various applications including outlier detection. LocalitySensitive Hashing can help to create an offline index table which can support timeefficient outlier detection. More concretely, the time complexity of our proposed TOD method based on LocalitySensitive Hashing is approaching O(1), which is more suitable for the outlier detection applications with big data context.
TOD: timeaware outlier detection based on LocalitySensitive Hashing
The major idea of TOD method is: according to LocalitySensitive Hashing, we first classify the students into different clusters based on their monitored health conditions; afterwards, we judge whether a student is a health outlier by considering the number of students who belong to an identical cluster.
Next, we introduce the details of TOD method. TOD starts from analyzing the studenttime health data which are often organized and stored within a matrix after the data are collected by various wearable sensors. Here, we use STM to denote the above matrix and the matrix can be represented in Eq. (1). In STM, each row is corresponding to a student’s health physique data collected at n different time periods \(t_1, . . ., t_n\); each column is corresponding to the health physique data of m students \((s_1, . . ., s_m)\) collected at a certain time period; the concrete value of the element \(hp_{x,y}\) means the health physique data of \(s_x\) at time period \(t_y\).
In STM, each student probably has a considerable amount of health physique monitoring records with time elapsing. In other words, if the health monitoring period is very long, the matrix STM will become extremely large. In this situation, it will be timeconsuming to process and analyze the data in STM and as a consequence, detecting the outliers in STM will consume too much time and is not suitable for the big data context. To tackle this issue, we need to convert the large STM matrix into a much smaller one during which key valuable information contained in STM is still maintained and not lost. In our article, we use LocalitySensitive Hashing introduced in “LocalitySensitive Hashing” section to achieve above goal. LocalitySensitive Hashing has been proven an effective tool to achieve highquality data compression in big data context and is very useful and applicable to develop lightweight information retrieval and object clustering.
The basic concept of implementing data compression through LocalitySensitive Hashing involves transforming the original studenttime health matrix (STM) into a comparable student health index matrix. Subsequently, we will outline the methodology for achieving this conversion. We take student \(s_1\) as an example to demonstrate the concrete conversion process. In STM matrix, \(s_1\) is corresponding to an ndimensional vector denoted by \(v(s_1)\). Correspondingly, we need to create a new vector (denoted by \(v_{new}\)) which also contains n dimensions as \(v(s_1)\) does. In concrete, we follow the rule in Eqs. (2) and (3) to create \(v_{new}\). Here, function f(x, y) is responsible to generate a random number which is larger than 1 and smaller than 1.
With the created new vector \(v_{new}\), we can map the ndimensional vector \(v(s_1)\) to be \(q_1\) (0 or 1) by the conversion formulas in Eqs. (4) and (5). The physical meaning of the above mapping process is as follows: let us consider \(v(s_1)\) to be a data point in an ndimensional space, and the new vector \(v_{new}\) to be a plane which splits the ndimensional space into two parts; then Eqs. (4) and (5) are responsible to judge whether the data point corresponding to \(v(s_1)\) is above the plane represented by \(v_{new}\) or not: if the data point is above the plane, then \(q_1\) = 1; otherwise, \(q_1\) = 0.
Since the new vector \(v_{new}\) in Eqs. (2) and (3) is created based on a random function f(x, y), a high randomness is inevitable during the judgment process of pointplane relative position formulated in Eqs. (4) and (5). To minimize such a randomness, we do not execute the above process (i.e., Eqs. (2) and (5)) only once for vector \(v(s_1)\); instead, we repeat the above process \(k (k\ll n)\) times and then obtain k values of \(q_1\), denoted by \(q_{1,1}\), . . ., \(q_{1,k}\) for vector \(v(s_1)\). We use \(Q_1\) in Eq. (6) to represent the abovementioned kdimensional 01 vector. Thus, the ndimensional vector \(v(s_1)\) is changed to be a kdimensional vector \(Q_1\). Since \(k\ll n\), \(v(s_1)\) is a realvalue vector while \(Q_1\) is a Booleanvalue vector, \(Q_1\) is regarded as a promising index of original \(v(s_1)\) corresponding to the student s1. Likewise, for each student \(s_1\), . . ., \(s_m\), we can also obtain his or her index denoted by \(Q_1\), . . ., \(Q_m\), respectively.
With the index of each student, i.e., \(Q_1\), . . ., \(Q_m\), we can classify all the students \(s_1\), . . ., \(s_m\) into different clusters (here, we assume there are totally \(d (d \le m)\) clusters (i.e.,\(Cluster_1\), . . .,\(Cluster_d\)) are obtained). The classification rationale or basis is the characteristics of LocalitySensitive Hashing introduced in “LocalitySensitive Hashing” section, i.e., the nature formulated by the following conclusion:
A family H is called \((r, p_1, p_2)\)sensitive if for any x, y:

If \(d(x, y) \le r\), then \(Pr [H(x) = H(y)] \ge p1\)

If \(d(x, y) > r\), then \(Pr [H(x) = H(y)] \le p2\)
More concretely, if two students own the same index, then their distance is small and could be classified into an identical community. This way, we can classify all the students into corresponding communities. The concrete classification process is demonstrated by the example in Fig. 2. In the example, four students belong to \(Cluster_1\), five students belong to \(Cluster_2\), . . ., one student belongs to \(Cluster_d\). Therefore, in this situation, the last student \(s_m\) could be regarded as an outlier since his or her corresponding community contains the fewest students. This way, we use the lightweight student health indices, instead of the heavyweight studenttime health matrix STM in Eq. (1), to achieve the final goal of outlier detection in student health evaluation applications.
However, the above student clustering process also faces the challenge of misjudgment due to the randomness nature of LocalitySensitive Hashing. Inspired by this limitation, we repeat the student health index creation process K times instead of only once and then we have obtained an improved version of Fig. 2, i.e., Fig. 3 which provides the StudentIndexCluster correspondence relationships within K tables. The difference between Figs. 2 and 3 is not substantial as Fig. 3 is only updated from Fig. 2 by extending from one table to K tables. However, we argue that such an update is necessary and important for guaranteeing the accuracy of outlier detection in TOD method. This is because the LocalitySensitive Hashing technique adopted in TOD method is essentially a probabilitybased approximate neighbor search technique and can lead to falsepositive or falsenegative outlier detection results inevitably. In this situation, we have to minimize the falsepositive or falsenegative probability through various optimization operations. The above update from Figs. 2 to 3 is for such an optimization since more hash tables can guarantee to reduce the falsepositive or falsenegative probability of LocalitySensitive Hashing, which has already been proven by existing literature.
With these K tables, we can minimize the randomness of our proposed TOD method in finding potential outliers existed in the health physique monitoring data of students. In concrete, in zth table \(T_z\) \((1 \le z \le K)\), we store \(d_z\) communities that we classify all the m students into. Please note that in different repetitions, each student is probably classified into different clusters. Then for each cluster, we count the total students who belong to the cluster in any of the K tables. Concrete process is formalized in Eqs. (7) and (8). Here, D in Eq. (7) denotes the number of unrepeated clusters after K repetitions; Sizeof(\(Cluster_\theta\)) \(\lambda\) is used to count the total students who belong to \(\theta\)th cluster in \(\lambda\)th table.
Next, the students belong to the \(\theta\)th cluster \(Cluster_\theta\) are regarded as possible outliers since \(Cluster_\theta\) own the fewest students compared to other clusters. This way, through the above TOD method (formalized with Eqs. (1) and (9)), we can successfully achieve our goal of outlier detection in finding abnormal students whose health physique conditions are different with other students. The major advantage of TOD method is that it can efficiently process the timeaware health physique monitoring data of students since it uses lightweight index mechanism for student clustering and outlier detection.
Regarding the time granularity adopted in this paper, we did not discuss more details in TOD since TOD can be applied to most timeaware scenarios as long as an identical time granularity is adopted in the scenario. In other words, no matter data are produced with a time granularity of second or minute or hour or day or week or month or year, TOD can always work well in detecting possible outliers with time stamps.
Experiments
Experiment setup
In this section, our proposed TOD method is compared with another four methods which are introduced briefly as follows, respectively. Experiments are based on WSDREAM dataset (https://wsdream.github.io/) which is a timeaware service quality monitoring dataset. Evaluation metrics include MAE which indicates outlier detection accuracy and time cost which reflects outlier detection efficiency. Experiments are running under Win 11 operation system and python 3.10.0. Hardware settings include Intel(R) Core(TM) i51235U 2.50 GHz and 16.0 GB RAM. Each set of experiments are repeated 100 times to register their average performances.

(1)
SICF [28]: Similar Itemsbased Collaborative Filtering.

(2)
SUCF [29]: Similar Usersbased Collaborative Filtering.

(3)
TLCF [30]: Timeaware and Locationaware Collaborative Filtering.

(4)
WSWalker [31]: Random walk with Locationaware Collaborative Filtering.
Performance comparison
Detection accuracy comparison
The accuracy of different methods is measured by MAE which belongs to “smaller is better” rule. The parameter values of TOD method in this profile are as follows: the size of student set, i.e., m is varied from 100 to 500; the size of time period set, i.e., n is varied from 1000 to 5000; size of function set in Eq. (6), i.e., k = 10; size of table set in Fig. 3, i.e., K = 5. MAE comparison results are reported in Fig. 4 which contains two figures: Fig. 4(a) shows the MAE values of different methods with respect to m (here, n = 5000); Fig. 4(b) shows the MAE values of different methods with respect to n (here, m = 500). As Fig. 4 reflects, the MAE values of five methods all vary with the increment of parameters m and n which means that these methods are all datadriven and datarelated. Furthermore, compared with the four methods (i.e., SICF, SUCF, TLCF and WSWalker), our proposed TOD method performs the best in terms of accuracy. The advantage of TOD’s accuracy is attributed to the fact that LocalitySensitive Hashing used in TOD is an effective similar object searching technique and therefore, can guarantee a more accurate student clustering result as well as a more accurate outlier detection result.
Detection efficiency comparison
As we introduced in the paper contribution part, one major contribution of our TOD method is its low time cost in tackling big data issues. To validate the efficiency advantage of TOD, we have devised a set of experiments and compare TOD’s time cost with other baselines. The parameter setting is the same as that in the above profile, which is not repeated here. In concrete, the time cost comparison results of five methods are reported in Fig. 5 which also contains two figures: Fig. 5(a) shows the computational time of different methods with respect to m (here, n = 5000); Fig. 5(b) shows the computational time of different methods with respect to n (here, m = 500). As observed from Fig. 5, the time costs of five methods all grow with the rising of parameters m and n; this is because a larger m or n often means more student health data need to be processed, analyzed and mined in order to find potential outliers; correspondingly, more processing time is needed in every method. Moreover, our TOD method outperforms the other four baselines (i.e., SICF, SUCF, TLCF and WSWalker) significantly. This is due to the fact that the student indices based on LocalitySensitive Hashing can be created and stored beforehand; while outlier detection operation only needs to read the stored indices and then cluster students based on student indices, whose time cost is rather small [32,33,34]. Hence, we can conclude that TOD method is suitable for timeaware outlier detection scenarios with big data.
Ablation study
Next, we observe the performances of TOD method with respect to two key parameters: size of function set in Eq. (6), i.e., k and size of table set in Fig. 3, i.e., K. Here, k is varied from 2 to 10 with a stepsize of 2, K is varied from 5 to 25 with a stepsize of 5. Other parameters are set as follows: m = 500, n = 5000. Concrete experiment results are shown in Fig. 6 which also includes two figures: Fig. 6(a) depicts the MAE variation trend of TOD with respect to k and K, while Fig. 6(b) reports the time cost variation trend of TOD with respect to k and K.
In Fig. 6(a), we can see an approximate decline of TOD’s MAE with the increment of k; this observation may be caused by the relationship between the clustering condition and parameter k. More concretely, as Eq. (6) shows, each student index is corresponding to a kdimensional vector. And theoretically, a larger k value often leads to a longer index vector as well as the resulted stricter evaluation conditions in student clustering and outlier detection. In this situation, only the really similar students with close health conditions will be put into an identical cluster; therefore, the accuracy of clustering and outlier detection is enhanced accordingly. Another observation from Fig. 6 is that TOD’s MAE approximately increases with the growth of K; this observation may be caused by the relationship between the clustering condition and parameter K. More concretely, as Fig. 3 shows, each student index is corresponding to K tables containing student health indices. And theoretically, a larger K value often leads to a looser evaluation condition in student clustering and outlier detection. In this scenario, students with dissimilar health conditions may be grouped into the same cluster, leading to a reduction in the accuracy of both clustering and outlier detection.
Low latency is a common requirement in dataintensive applications with network communications [35,36,37,38,39]. As we analyzed before, TOD’s performances including computational cost is highly related to parameters of k and K. Their relationships are demonstrated in Fig. 6(b), where the time cost of TOD approximately increase with the growth of both k and K. This finding can be explained as follows: as Eqs. (6)(8) and Fig. 3 show, a larger k or K often means more clusters according to student health indices while each cluster needs to count the students who belong to the cluster. As a result, when k or K grows, the number of clusters as well as the statistics burden are increased accordingly. Therefore, the time cost of TOD rises with the growth of k or K. However, as Fig. 6(b) shows, the time cost of TOD is generally small. This low time complexity indicates that our proposal can be applied to general outlier detection scenarios even in the big data context.
Conclusion and future work
The research presented in this paper successfully addresses the challenge of efficiently processing and analyzing massive physiological data in the context of sport education. Our proposed timeaware outlier detection method, TOD, leverages the principles of LocalitySensitive Hashing to effectively condense extensive physiological data into a manageable set of student health indices. This approach not only eases the burden of data storage and processing but also enables the precise and rapid identification of potential health outliers among students. The simulated experiments conducted using the WSDREAM dataset have demonstrated the practicality and superiority of TOD over existing methods. These findings underscore the potential of TOD in enhancing health monitoring and evaluation in educational settings, paving the way for more responsive and targeted interventions for student health and wellbeing.
However, there are still several shortcomings or limitations in TOD. First of all, TOD can only process the timeaware user health data, without incorporating more complex health data with other valuable context information such as user location. In addition, LocalitySensitive Hashing adopted in TOD is essentially a probabilitybased approximate neighbor search technique and can lead to falsepositive or falsenegative outlier detection results inevitably. In the future work, we will further improve our proposed TOD method for outlier detection by including more context factors such as the dynamic movement trajectory information of people for better outlier detection performances. In addition, LocalitySensitive Hashing used in TOD method is still a probabilitybased fast data search technique; therefore, how to minimize the falsepositive or falsenegative probability existed in outlier detection is another research direction that calls for intensive study.
Availability of data and materials
No datasets were generated or analysed during the current study.
Abbreviations
 TOD:

Timeaware Outlier Detection
 ODRNN:

Optimized Deep Recurrent Neural Network
 CNN:

Convolutional Neural Network
 LEWIS:

Lowcost Efficient Wireless Intelligent Sensor
 SDN:

SoftwareDefined Networking
 OOD:

OutOfDistribution;
 STM:

StudentTime health Matrix
 SICF:

Similar Itemsbased Collaborative Filtering
 SUCF:

Similar Usersbased Collaborative Filtering
 TLCF:

Timeaware and Locationaware Collaborative Filtering
 WSWalker:

Random walk with Locationaware Collaborative Filtering
References
Diao G, Liu F, Zuo Z, Moghimi MK (2022) Privacyaware and efficient student clustering for sport training with hash in cloud environment. J Cloud Comput 11(1):1–11
Zhou X, Ye X, Kevin I, Wang K, Liang W, Nair NKC, Shimizu S, Yan Z, Jin Q (2023) Hierarchical federated learning with social context clusteringbased participant selection for internet of medical things applications. IEEE Trans Comput Soc Syst 10(4):1742–1751
Wu X, Zhou J, Zheng M, Chen S, Wang D, Anajemba J, Zhang G, Abdelhaq M, Alsaqour R, Uddin M (2022) Cloudbased deep learningassisted system for diagnosis of sports injuries. J Cloud Comput 11(1):1–18
Liu Y, Zhou X, Kou H, Zhao Y, Xu X, Zhang X, Qi L (2023) Privacypreserving pointofinterest recommendation based on simplified graph convolutional network for geological traveling. ACM Trans Intell Syst Technol https://doi.org/10.1145/3620677
Qi L, Liu Y, Zhang Y, Xu X, Bilal M, Song H (2022) Privacyaware pointofinterest category recommendation in internet of things. IEEE Internet Things J 9(21):21398–21408
Xu X, Li H, Li Z, Zhou X (2022) Safe: synergic data filtering for federated learning in cloudedge computing. IEEE Trans Ind Inform 19(2):1655–1665
Qi L, Xu X, Wu X, Ni Q, Yuan Y, Zhang X (2023) Digitaltwinenabled 6g mobile network video streaming using mobile crowdsourcing. IEEE J Sel Areas Commun 41(10):3161–3174
Wang F, Zhu H, Srivastava G, Li S, Khosravi MR, Qi L (2021) Robust collaborative filtering recommendation with useritemtrust records. IEEE Trans Comput Soc Syst 9(4):986–996
Xu X, Tang S, Qi L, Zhou X, Dai F, Dou W (2023) Cnn partitioning and offloading for vehicular edge networks in web3. IEEE Commun Mag 61(8):36–42
Zhou X, Yang Q, Zheng X, Liang W, Kevin I, Wang K, Ma J, Pan Y, Jin Q (2024) Personalized federation learning with modelcontrastive learning for multimodal user modeling in humancentric metaverse. IEEE J Sel Areas Commun 42(4):817–831
Zhang S, Tang Y, Wang D, Karia N, Wang C (2023) Secured sdn based task scheduling in edge computing for smart city health monitoring operation management system. J Grid Comput 21(4):1–14
Pavithra D, Nidhya R, Shanthi S, Priya P (2023) A secured and optimized deep recurrent neural network (drnn) scheme for remote health monitoring system with edge computing. Automatika 64(3):508–517
Stephen DK, Dr. VM, Manalang AR, et al (2023) IOTbased generic health monitoring with cardiac classification Using edge computing[J]. J Internet Serv Inf Secur 13(2):128–145
Gowrishankar V, Jayakumar T, Parameswaran S, Senthilkumar M, Lekashri S, Kumar R (2023) Patient health monitoring using fog and edge computing. In: 2023 International Conference on Sustainable Communication Networks and Application (ICSCNA), IEEE, pp 250–256
Deng J, He J, Duan X, et al (2022) Application research of advanced intelligent big data analysis based on intelligent sensor network in the design of personalized education management system and the construction of innovation system[J]. J Sens 2022:7113098
Sanei M, Atcitty S, Moreu F (2023) Lowcost efficient wireless intelligent sensor (lewis) for engineering, research, and education. arXiv preprint arXiv:2303.13688
Lamb R, Choi I, Owens T (2023) Artificial intelligence and sensor technologies the future of individualized and differentiated education[J]. Int J Psychol Neurosci 9(1):30–36
Alampalli S, Malek K, Mohammadkhorasani A, Moreu F (2023) Lowcost efficient wireless intelligent sensor (lewis) deployment for community driven decision making. Struct Health Monit
Shekhar S, Shah N, Akoglu L Fairod: Fairnessaware outlier detection[C]//Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 2021:210–220
Huang JW, Zhong MX, Jaysawal BP (2020) Tadilof: Time aware densitybased incremental local outlier detection in data streams. Sensors 20(20):5829
Ahmad H, Dowaji S (2018) A novel framework for contextaware outlier detection in big data streams. J Digit Inf Manag 16(5):213–222
Yang P, Liang J, Cao J, He R (2023) Auto: Adaptive outlier optimization for online testtime ood detection. arXiv preprint arXiv:2303.12267
Park CG, Ahn H (2019) Temporal outlier detection and correlation analysis of business process executions. IEICE Trans Inf Syst 102(7):1412–1416
Wahyuni E, Surono S, Eliyanto J (2021) Outlier detection using kmeans clustering with minkowskichebyshev distances for inquirybased learning results in students dataset. In: 2021 International Conference on Artificial Intelligence and Big Data Analytics, IEEE, pp 1–5
Kirielle N, Christen P, Ranbaduge T (2019) Outlier detection based accurate geocoding of historical addresses. In: Data Mining: 17th Australasian Conference, AusDM 2019, Adelaide, SA, Australia, December 2–5, 2019, Proceedings 17, Springer, pp 41–53
Yau SC, Bakar JA, Abdullah AA, Harun H, Rasli RM, Yang LZ, Mun ETY (2021) Detection of topic on health news in twitter data. Emerg Adv Integr Technol 2(2):23–29
Zhou H, Dai X, Lou L, Zhou C, Zhang W (2021) Association of sedentary behavior and physical activity with depression in sport university students. Int J Environ Res Public Health 18(18):9881
Wang H, Shen Z, Jiang S, Sun G, Zhang RJ (2021) Userbased collaborative filtering algorithm design and implementation. In: Journal of Physics: Conference Series, IOP Publishing, vol 1757, p 012168
Kharita MK, Kumar A, Singh P (2018) Itembased collaborative filtering in movie recommendation in real time. In: 2018 first international conference on secure cyber computing and communication (ICSCCC), IEEE, pp 340–342
Yu C, Huang L (2016) A web service qos prediction approach based on timeand locationaware collaborative filtering. SOCA 10:135–149
Tang M, Dai X, Cao B, Liu J (2015) Wswalker: A random walk method for qosaware web service recommendation. In: 2015 IEEE International Conference on Web Services, IEEE, pp 591–598
Kong L, Wang L, Gong W, et al (2022) LSHaware multitype health data prediction with privacy preservation in edge environment[J]. World Wide Web 25:1793–1808
Hu C, Fan W, Zeng E, Hang Z, Wang F, Qi L, Bhuiyan MZA (2021) Digital twinassisted realtime traffic data prediction method for 5genabled internet of vehicles. IEEE Trans Ind Inform 18(4):2811–2819
Qi L, Hu C, Zhang X, Khosravi MR, Sharma S, Pang S, Wang T (2020) Privacyaware data fusion and prediction with spatialtemporal context for smart city industrial environment. IEEE Trans Ind Inform 17(6):4159–4167
Dai H, Xu Y, Chen G, Dou W, Tian C, Wu X, He T (2020) Rose: Robustly safe charging for wireless power transfer. IEEE Trans Mob Comput 21(6):2180–2197
Gu R, Chen Y, Liu S, Dai H, Chen G, Zhang K, Che Y, Huang Y (2021) Liquid: Intelligent resource estimation and networkefficient scheduling for deep learning jobs on distributed gpu clusters. IEEE Trans Parallel Distrib Syst 33(11):2808–2820
Dai H, Wang X, Lin X, Gu R, Shi S, Liu Y, Dou W, Chen G (2023) Placing wireless chargers with limited mobility. IEEE Trans Mob Comput 22(6):3589–3603
Gu R, Zhang K, Xu Z, Che Y, Fan B, Hou H, Dai H, Yi L, Ding Y, Chen G, et al (2022) Fluid: dataset abstraction and elastic acceleration for cloudnative deep learning training jobs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), IEEE, pp 2182–2195
Dai H, Yu J, Li M, Wang W, Liu AX, Ma J, Qi L, Chen G (2023) Bloom filter with noisy coding framework for multiset membership testing. IEEE Trans Knowl Data Eng 35(7):6710–6724
Acknowledgements
The authors show special thanks to the providers of WSDREAM dataset for experiment validation.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
Y. L: English writing and conceived idea. L. K: Developed research motivation and established the model. Z. L: Conceived experimental ideas and designed methodologies. F. J: Wrote the initial draft and executed the experimental plan. N. B: Reviewed and revised the initial draft, validated experimental results. T. D: Conducted a literature review for related work, providing comparative analysis. M. A: Shaped research ideas and enriched English writing.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
The coauthors all agree on the paper publication.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, Y., Kang, L., Li, Z. et al. Timeaware outlier detection in health physique monitoring in edgeaided sport education decisionmakings. J Cloud Comp 13, 73 (2024). https://doi.org/10.1186/s13677024006366
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13677024006366