AI-empowered mobile edge computing: inducing balanced federated learning strategy over edge for balanced data and optimized computation cost

In Mobile Edge Computing, the framework of federated learning enables collaborative learning models across edge nodes without necessitating the direct exchange of data from those nodes. It addresses significant challenges encompassing access rights, privacy, and security.


Introduction
Owing to the inherent privacy concerns associated with edge data, individuals are reluctant to relinquish their data to centralized data repositories and cloud servers [1,2]. Analogously, industries encounter the dual challenges of increased computation and communication costs, along with the looming spectre of privacy breaches, when considering the storage of data in central server infrastructures.
Federated Learning (FL), a widely adopted Artificial Intelligence (AI) technique, offers an effective avenue to protect and secure the confidentiality of data residing within edge nodes [3]. By facilitating the collaborative construction of a cohesive learning model across diverse edge nodes, FL eliminates the need for direct exchange of data samples in the scenario of Mobile Edge Computing (MEC). This paradigm effectively addresses a range of critical concerns, including access authorization, privacy preservation, security assurance, and the management of disparate datasets. The wide-ranging utility of FL spans various domains, including prognostication and monitoring of mobile traffic [4], healthcare [5][6][7], the emerging field of the Internet of Things (IoT) [8], agriculture [9,10], transportation and autonomous vehicles [11], finance and the stock market [12], disaster management [13], pharmaceutical sciences, and advanced medical artificial intelligence [14].

Preliminaries
FL is a distributed machine learning (DML) technique that allows multiple nodes to train a machine learning model without exchanging data samples [15,16], as shown in Fig. 1. FL differs from DML in several ways. In DML, the data is first centralized on the server, and then the server splits it into subsets for learning tasks. In contrast, in FL, the data is not concentrated on the server; rather, the algorithm is distributed over the edge devices for processing [4,13]. This means that FL has more training subsets than DML, and the data may not be identically distributed [17]. This presents new challenges to existing privacy-preserving techniques and algorithms [18]. It is crucial to create computation- and communication-efficient techniques that can withstand dropped devices without sacrificing accuracy, in addition to offering stringent privacy guarantees.

Process of federated learning
FL is an iterative approach that incorporates several client-server interactions, known as FL rounds, to achieve higher performance than centralised machine learning [19]. Distributing the current or updated global model state to the contributing nodes (participants) initiates each round of this process. The nodes' local models are then trained to produce prospective model updates. Subsequently, an aggregated global update is created by processing and combining the changes from the local nodes [20]. This makes it possible to update the central model appropriately (see Fig. 1). In this system, local updates are processed and combined into global updates by a central server, called the FL server. Local nodes carry out the local training in accordance with the FL server's directives. The model is trained iteratively. The specifics of these steps are as follows: Step 1. Setup and Initialization: A central server or orchestrator manages the FL process. It holds the initial model architecture and distributes updates. Edge devices/clients are individual devices that store local data and participate in the training process (Fig. 1). The central server creates an initial model and sends it to all participating edge devices [21].
Step 2. Local Training: Each edge device trains the model received from the FL server, using its own local data. The training process might involve multiple iterations or epochs to improve the model's performance [22][23][24].
Step 3. Model Update: After local training, each edge device generates a model update, which essentially consists of weight changes that reflect what the device has learned from its local data. However, the actual data remains on the edge device and is not shared.

Step 4. Models Aggregation and Global Update:
The edge devices dispatch their model updates back to the central server. The central server aggregates these updates using techniques such as averaging or weighted averaging. This creates a global model update that incorporates the knowledge from all edge devices without revealing their individual data. The central server applies the aggregated model update to the global model, enhancing its performance by leveraging the collective knowledge of the edge devices. This is an iterative learning process. Steps 2 to 4 are repeated for a predefined number of iterations [25]. In each iteration, the edge devices refine their local models, and the central server integrates their updates into the global model. Over multiple iterations, the global model converges to a state where it becomes more accurate due to the aggregated insights from the diverse data sources on the edge devices.
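As an illustration, the round structure of Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal, self-contained simulation: the linear model, toy data, and unweighted averaging are our own illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Step 2: plain gradient descent on a linear model with MSE loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the squared error
        w -= lr * grad
    return w

def fl_round(global_w, client_data):
    """Steps 2-4: each client trains locally, then updates are averaged."""
    local_models = [local_update(global_w, X, y) for X, y in client_data]
    return np.mean(local_models, axis=0)    # simple (unweighted) averaging

# Toy federation: three clients, each holding private samples of y = 2x.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 1))
    clients.append((X, 2.0 * X[:, 0]))

w = np.zeros(1)
for _ in range(100):                        # repeat Steps 2-4 for 100 rounds
    w = fl_round(w, clients)
```

After enough rounds the global model recovers the common slope, even though no client ever shares its raw samples.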

Motivation and rationale
Most classification tasks involve imbalanced classes, which can result in biased training of machine learning (ML) algorithms. Learning with an imbalanced distribution is a challenging problem in ML. One common solution to this problem is ensemble learning, which combines multiple models to improve overall performance. Another solution is sampling, which involves subsampling the data to obtain a balanced proportion of classes. However, sampling can be computationally expensive, and it is not always feasible to obtain a large enough dataset to utilize this technique. FL can be a promising solution for learning with imbalanced data. FL can address the privacy concerns associated with data sharing, and it can also be more efficient than DML in terms of computation and communication. However, there are still challenges to be addressed, such as the need to develop robust FL algorithms that can handle imbalanced data. The edge nodes train a shared model based on the nearby data they have access to. The distribution of edge data depends on how the nodes are used. For example, cameras installed in parks tend to capture more photos of people compared to cameras located in the wild. These imbalances can be categorised into three types for better comprehension: Size Imbalance (irregular size of the data sample on each edge node); Local Imbalance (non-identically distributed (non-IID) and non-independently distributed data [26]); Global Imbalance (the class distribution of the data residing across all nodes is imbalanced [27]).

Problem statement
Federated Learning is an effective machine learning strategy with the advantage of data privacy protection. However, it struggles to deal with unbalanced or skewed datasets present over edge devices. This local and global imbalanced data distribution leads to bias in the iterative model training phase and results in a decrease in the accuracy of FL execution [26,27]. The objective of this research is to improve accuracy by addressing the challenge of local data imbalance in a federated fashion, and to solve the issue of imbalanced data without compromising privacy or increasing computation overhead.

Contributions of this article
This study primarily contributes to the solution of unbalanced data problems by applying a method that corrects imbalanced data via client-side class estimation and data augmentation.
1. The Balanced FL (Bal-Fed) [12] approach has been utilized for implementation in the FL setting. This technique is tailored to attain maximum accuracy with fewer training rounds to reduce the computation cost. Its goal is to achieve a balance between training accuracy and reduced computation cost.
2. We applied this approach to train a Linear Regression machine learning algorithm in an FL setting, using an unevenly distributed dataset.
3. To evaluate Bal-Fed's applicability to diverse problems, this approach is implemented on both textual and visual datasets separately. Two distinct datasets are utilized in this study. The first dataset is FashionMNIST, which consists of image data. The other dataset is stock market data, which includes both text and numerical data.
4. The dataset of the last 10 years of stock market prices is fetched for the stocks of Amazon and Booking.
5. By showing positive results on both kinds of datasets, this method has been demonstrated to improve the model's accuracy in the FL setting. It demonstrates accuracy rates higher than 92% and 95% for image and stock data, respectively, after dealing with the problem of data imbalance.
6. This approach yields optimal performance within 80 iterative rounds (while the pre-set number of iterations is 100) and then terminates the iterative process, thus putting a lesser computation load on the mobile edge devices.

Organization of the article
This article is divided into five main sections. "Related Work" section reviews studies that address issues and challenges like the ones highlighted here, along with their experimental results. "Materials and Methods" section elaborates on the methodology and materials used to carry out the experiments. "Setup and Implementation" section comprises the setup and implementation of the experiment. Results of the implemented methodology are described in "Results and Discussion" section, where "Discussion on the results of Fashion MNIST (images) data" presents the results for the image dataset, while "Discussion on the results of stock data" presents the results for text data. "Implications" section explains the implications of this research and highlights its applicability. Finally, the "Conclusion" section concludes the entire experiment, presents the corresponding results, and highlights future directions in this area.

Related work
In the realm of distributed data management, FL emerges as an evolving paradigm designed to address the complexities of privacy preservation. The development of healthcare frameworks has captured the attention of numerous research endeavours [28][29][30][31]. While the landscape of Machine Learning (ML) comprises a myriad of approaches and frameworks, there is a scarcity of comprehensive investigations that delve into the assessment of data balance within FL paradigms [32]. This section undertakes a thorough review of these empirical investigations, with a specific emphasis on studies relevant to our own research, which are concisely summarized in Table 1.
Across the network, nodes frequently accumulate and aggregate data in a manner that deviates from the Independent and Identically Distributed (IID) assumption [33,40,41]. In the context of next-word prediction, for example, cellular users might engage extensively with particular linguistic expressions. Moreover, the volume of data across nodes can vary significantly. Improving the FL algorithm's convergence trajectory requires an evaluation of the statistical heterogeneity intrinsic to the data [35,36,38,39]. Recent research has introduced methodological tools to quantify statistical heterogeneity by applying relevant metrics [42]. Notably, these metrics, although valuable, cannot be quantified until training begins. Addressing this, Verma et al. proposed strategies specifically designed to improve machine learning models, even when dealing with highly skewed data distributions [37]. This investigation covered a wide range of environmental contexts. Noteworthy among these investigations is an AI model that emerges from a combination of heterogeneous data sources, representing the FL methodology. In a similar vein, a significant contribution emerges in the work of authors who elaborated on an expanded version of DropConnect, known as DropConnect Generalization [43]. This innovation plays a role in regulating densely interconnected neural network layers. DropConnect selectively nullifies a fraction of the network's weights, in contrast to the Dropout technique, which extends this nullification to randomly selected activations within each layer.
A comparative analysis was conducted between DropConnect and Dropout across various datasets. Notably, the integration of multiple DropConnect-trained models achieved ground-breaking results across various benchmarks for image classification.
The procedural algorithms that enable individual clients to independently update their respective local data within the existing model were originally formulated by Konecny et al. in 2016 [36]. These algorithms enable clients to transmit their updated data to a central server. The server then aggregates the changes from multiple clients and computes a fresh global model. Primarily targeted at mobile phones, the efficacy of communication among the main constituents of this system is paramount. In this research, structured updates and sketched updates were introduced to mitigate the costs associated with uplink transmission. Notably, Chen et al. [44] elucidated an end-to-end tree boosting mechanism referred to as XGBoost, which is frequently adopted by data scientists to achieve state-of-the-art results across diverse machine learning projects. Their work introduces a novel approach for handling sparse data, known as the sparsity-aware methodology, as well as a weighted quantile sketch designed specifically for tree-based learning. The study further explores methods to improve the scalability of XGBoost by examining data compression techniques, cache access patterns, and sharding. Ultimately, XGBoost has been demonstrated to scale adeptly to billions of samples, while consuming considerably fewer resources than previous systems [4].
The significance of imbalanced datasets and their multifaceted applications within data mining were initially introduced by Han et al. [45]. Following this, they synthesized performance evaluation matrices and existing strategies aimed at mitigating the challenges posed by imbalanced data. The popular oversampling strategy SMOTE is used to address this issue, and their study introduces two additional variations, namely borderline-SMOTE 1 and borderline-SMOTE 2, which enrich the oversampling methodology. Nilsson et al. [46] conducted a benchmarking analysis of three FL algorithms, whose efficacy was appraised and compared against centralized data storage. Notable among these algorithms are Federated Averaging (FedAvg), Federated Stochastic Variance Reduced Gradient, and CO-OP. Both non-IID and IID data partitioning schemes were used with the MNIST dataset to evaluate their performance, and it was found that the FedAvg algorithm achieved the highest accuracy.
The integration of FL with deep reinforcement learning (DRL) to enhance caching at the edge is introduced in [47]. Their application, called the "In-Edge AI" framework, enhances caching, networking, and mobile edge computing (MEC). The framework effectively utilizes edge nodes and device collaboration, demonstrating robust performance with minimal learning overhead. The authors also highlight challenges and opportunities, emphasizing the promising potential of the "In-Edge AI" framework [14]. Similarly, Xu et al. [48] conducted a comprehensive survey on the expansion of FL in healthcare informatics. Their work addresses vulnerabilities, common statistical issues, remedies, and privacy concerns associated with FL. The outcomes of their research are envisioned as essential tools for the computational exploration of ML algorithms tasked with managing extensive distributed data while also considering privacy and health informatics.
Clustered Federated Learning (CFL) [49] was developed as a solution to the decrease in accuracy in FL settings caused by divergent local client data distributions. CFL supports Federated Multi-Task Learning (FMTL), leveraging the geometric properties of the FL loss surface to effectively cluster client populations based on their data distribution characteristics. This clustering process maintains the communication mechanism of FL intact, providing robust mathematical guarantees for the quality of clustering. It integrates deep neural networks (DNNs) and ensures scalability while preserving privacy.
Frameworks for secure FL are introduced in [50], offering a comprehensive and robust platform that includes Federated Transfer Learning (FTL), vertical FL, and horizontal FL. These frameworks are accompanied by concepts, infrastructure details, implications, and a comprehensive examination of advancements in this domain. The authors also advocate for the establishment of data networks between enterprises, based on federated processes, to facilitate data sharing while respecting end-user privacy [51].
In a recent contribution, Mohri et al. [52] presented an agnostic FL framework that optimizes a centralized model to be adaptable to various target distributions. Their framework is lauded for instilling a sense of fairness, and they propose a rapid stochastic optimization approach to address the related optimization challenges. Convergence bounds are provided, assuming a hypothesis set and a convex loss function. The effectiveness of their approach is demonstrated across diverse datasets, indicating its potential applicability to contexts beyond FL, such as domain adaptation, cloud computing, and drifting [41]. In the realm of mobile devices, Bonawitz et al. [53] devised a scalable production system using TensorFlow (TF). Their work outlines foundational concepts, addresses challenges, and offers potential solutions [54].
Considering recent advancements in the field of FL, it is evident that numerous frameworks and techniques have emerged to address challenges such as communication costs, statistical heterogeneity, convergence, and resource allocation. For FL with imbalanced data, scholars have investigated diverse strategies to alleviate the influence of class imbalance on model performance. These methods encompass oversampling [45], target distributions [52], and class weights [44]. The efficacy of these techniques can be contingent on the distinct characteristics of the dataset and the FL configuration. However, the issue of class and data imbalance remains inadequately addressed in certain works. This research builds on existing methods and introduces novel approaches to enhance the management of imbalanced data within federated learning. This article seeks to bridge the gap by focusing on and addressing the challenge posed by class imbalance through a novel approach.

Materials and methods
The issue of data balancing has been effectively solved in centralized ML after decades of research. FL, in contrast, is a relatively new and emerging area in which privacy must be maintained throughout. Balanced federated data can be achieved by generating augmented and synthetic data [55,56] on mobile edges, without compromising privacy. This follows from the post-processing guarantees of differential privacy (DP) [57]. Augenstein et al. [58] explored and demonstrated the federated fashion of generating synthetic data, so data synthesis can be used in the federated setting. Additionally, client-side class estimation must be employed in a self-balancing approach. As a solution, we utilized our FL approach called Bal-Fed (shown in Fig. 2) to rebalance training. This approach has proven successful in implementing balanced federated learning on stock market data for some stocks [12]. We now use this approach to prove its applicability to image data using the benchmark dataset FashionMNIST. The whole strategy developed to achieve the objective of data imbalance reduction with optimized computation cost entails the following steps.
Step 1. Selection of nodes at the mobile edge layer.
Step 2. Executing the Class Estimation of the edge clients.

Fig. 2 Proposed methodology working in the scenario of edge networks
Step 3. Performing data augmentation [59] and the class balancing algorithm on the global distribution to address data bias.
Step 4. The linear regression algorithm is used for model training using data from each mobile edge node (as depicted in Fig. 2).
Step 5. Sending the updated models to the server for aggregation, which is performed using FedAvg.
These two approaches are merged to implement and measure the proposed solution on the Flower Framework and TF using distributed datasets. For this research, we utilized the Fashion MNIST dataset and collected a stock market dataset to assess the model's fit for predicting stock prices. Specifically, we used Amazon (AMZN) and Booking Holdings (BKNG) for our research.
These datasets were converted into distributed datasets to make them suitable for the FL framework. The utilization of two distinct datasets serves the purpose of showcasing the versatility of Bal-Fed across datasets of different natures. The stock market dataset includes both numerical and textual data, while the Fashion MNIST dataset consists of image data. The reason for selecting the latter dataset is to assess the performance of the model on established datasets and compare its results with existing outcomes. The decision to include the stock price dataset is prompted by the limited research conducted on the application of FL to stock price prediction; the majority of research in FL settings has primarily focused on stock news and other areas, not on prediction. This choice contributes to addressing a potential gap in this domain [60][61][62]. The two datasets also serve distinct objectives: the former is used for prediction tasks, while the latter is employed for classification problems. This deliberate differentiation helps evaluate the model's adaptability within the FL framework when faced with diverse problem domains.

Global aggregation over cloud server
For the aggregation of the global model, we adopt the well-established FedAvg algorithm. Following Algorithm 1, a randomly selected subset of the federation's members (clients/devices) is designated to acquire the initial global model [63]. In the subsequent step, each client selected for the ongoing round of training computes updates to its local model using its own dataset. These updates are then communicated back to the server [64,65], as described in "Process of federated learning" section. In the pursuit of refining the collective model, the server performs an averaging process on all the updates contributed by the clients. This iterative procedure continues, with a new round of training each time, until the model parameters converge, as determined by appropriate criteria.
Algorithm 1 Federated Averaging (FedAvg). There are n edge devices, B is the local minibatch size, E represents the total local epochs per communication round, η is the learning rate, and f i is the loss function. On the client side, gradient descent takes place; on the server side, the averaged client updates are aggregated. The amount of client computation is controlled by three key parameters: the fraction of clients I that perform computation in each round, the minibatch size B, and the number of local epochs E.
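As a sketch of the server-side step of Algorithm 1, the weighted averaging at the heart of FedAvg can be written as follows. The flat parameter vectors and sample counts below are illustrative stand-ins; a real deployment aggregates full model weight tensors.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Server-side FedAvg: weight each local model by its data share n_k / n."""
    n = sum(client_sizes)
    stacked = np.stack(client_weights)                # shape: (clients, params)
    coeffs = np.array(client_sizes, dtype=float) / n  # n_k / n per client
    return coeffs @ stacked                           # weighted average

# Three clients holding different amounts of local data.
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [10, 30, 60]
agg = fedavg_aggregate(updates, sizes)
```

Clients with more local data contribute proportionally more to the aggregated model, which is what distinguishes FedAvg from a plain mean of the updates.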

Local training over mobile edge devices

Linear regression
Linear regression is simple to interpret and understand. A linear equation, easily understood and visualised, represents the relationship between each independent variable and the dependent variable. Using linear regression analysis, one can forecast a variable's value depending on the value of another variable [66]. The dependent variable must be predictable, and its value can be anticipated by using the independent variable [63]. This analysis determines the coefficients of the linear equation, using one or more independent variables, that most precisely predict the value of the dependent variable. The linear regression technique lessens the discrepancies between expected and actual output values by fitting a line or surface. Simple linear regression algorithms that find the best-fit line using the "least squares" method can be created from a collection of paired data.
Based on a given value of the independent variable x, y denotes the predicted value of the dependent variable. The intercept, or the expected value of y when x = 0, is represented by the symbol β0, while the regression coefficient β1 represents the anticipated shift in y per unit increase in the independent variable x:

y = β0 + β1x + ε    (1)

Since it is anticipated to have an impact on y, the variable x is regarded as the independent variable. The variable ε in the equation stands for the error of the estimate and quantifies the degree of variation in the regression coefficient estimation. Finding the regression coefficient β1 that minimises the model's total error e yields the best fit. Linear regression commonly employs the mean squared error (MSE) as a metric to evaluate the accuracy of the model. The MSE is computed as follows: first, the difference between the observed and predicted y-values is calculated for each associated x-value; then each of these differences is squared; finally, the mean of the squared differences is computed. Linear regression thus fits a straight line to a set of data points by finding the regression coefficients that minimise the MSE.
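The least-squares fit and MSE computation described above can be sketched as follows. The helper names and the noiseless toy data are our own; the closed-form estimates are the standard least-squares solutions for Eq. (1).

```python
import numpy as np

def fit_simple_lr(x, y):
    """Least-squares estimates of intercept b0 and slope b1 for y = b0 + b1*x."""
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # cov(x, y) / var(x)
    b0 = y.mean() - b1 * x.mean()                   # line passes through means
    return b0, b1

def mse(y_true, y_pred):
    """Mean squared error: the mean of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])      # exactly y = 1 + 2x, so MSE is zero
b0, b1 = fit_simple_lr(x, y)
```

On noiseless data the fit recovers β0 = 1 and β1 = 2 exactly; with real data the residual ε makes the MSE strictly positive.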
Although convolutional neural networks (CNNs) are frequently employed to process visual input, their computational cost is relatively high [67]. Because FL computation is done at the client side, it is preferable for edge nodes to incur lower calculation costs. In comparison to more intricate models like SVM, Random Forest, and DL [68][69][70][71], linear regression techniques have lower computational requirements [60], which makes them appropriate for situations with limited processing resources or big datasets. In the FL configuration, Random Forest (RF) takes 515 s and SVM takes 4989 s for the training cycle, whereas LR takes only 7.6 s [60]. As a result, we trained the clients' data locally using this technique. Additionally, this method yields results that are similar to CNNs. When the dependent and independent variables are both continuous, as is the case in the analysis of stock market datasets, LR is a good fit. Likewise, other studies have demonstrated the effectiveness of this approach on the FashionMNIST dataset.

Class balancing
In FL settings, the raw data of mobile edge nodes cannot be obtained, in order to protect client privacy. For this reason, the class distribution on the edge side is estimated from the clients' updated gradients by the class estimation and balancing method [72]. This class estimation is then used to even out the classes and the accompanying data through data augmentation [59]. During model training in FL, the expected squared gradients of different classes have the approximate relationship [73]:

E[‖∇Li‖²] / E[‖∇Lj‖²] ≈ ni / nj    (2)

where L denotes the cost function of the training algorithm, ni and nj are the numbers of data samples of class i and class j, i ≠ j, and i, j ∈ C. Owing to this correlation between gradients and class distribution, a class Ci obtains a class-ratio estimation [72] defined as:

Ri = (E[‖∇Li‖²])^β / Σj∈C (E[‖∇Lj‖²])^β    (3)

where β is adjusted as a hyperparameter to achieve class normalisation. It is therefore possible to establish the composition vector R = [R1,…, RC] that represents the raw data distribution. Consequently, each mobile edge node's class imbalance is evaluated by the Kullback-Leibler (KL) divergence between R and U, the uniform vector over the C classes.
After the model update during FL training, the server obtains the local model from every client device. With the class estimation method, the composition vector Rk of the selected client k can be revealed. We then define the reward for client k as the negative of its class imbalance, rk = −DKL(Rk ‖ U), so that clients with more balanced data receive a higher reward. The composition vector can be used to determine the class distribution. For instance, let Rk(t) denote the composition vector of client k at time slot t. The class ratio can then be approximated by the mean of the composition vectors over time, R̄k = (1/t) Στ=1..t Rk(τ).
With the estimated composition vector R and reward r of each client, we can design the client selection scheme with minimal class imbalance according to Algorithm 2.
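A minimal sketch of the imbalance measure and selection rule follows. The composition vectors here are hypothetical observed values; in Algorithm 2 the real compositions are instead estimated from gradients as described above.

```python
import numpy as np

def kl_to_uniform(R):
    """KL divergence D(R || U) between a composition R and the uniform U."""
    R = np.asarray(R, dtype=float)
    U = np.full_like(R, 1.0 / len(R))
    mask = R > 0                        # the 0 * log(0) terms are taken as 0
    return float(np.sum(R[mask] * np.log(R[mask] / U[mask])))

def select_clients(compositions, m):
    """Pick the m clients whose estimated class mix is closest to balanced."""
    scores = [kl_to_uniform(R) for R in compositions]
    return sorted(range(len(scores)), key=lambda k: scores[k])[:m]

comps = [
    [0.25, 0.25, 0.25, 0.25],  # perfectly balanced (KL = 0)
    [0.70, 0.10, 0.10, 0.10],  # heavily skewed
    [0.40, 0.30, 0.20, 0.10],  # mildly skewed
]
selected = select_clients(comps, 2)
```

Selecting clients with the lowest divergence from the uniform distribution keeps the aggregated training data as close to class-balanced as possible.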
Algorithm 2 Class balancing algorithm

Class estimation and data augmentation
The class estimation technique employed in [74] was adopted in this study and utilized for class estimation, after which data augmentation, as outlined in [75], was applied. Data augmentation, as elaborated upon in references [73,76,77], refers to methodologies within data analysis aimed at expanding the volume of the dataset. This is accomplished by creating new synthetic data generated from the present dataset or by adding significantly altered copies of the existing data. Data augmentation is incorporated to help prevent overfitting and to offer regularisation for machine learning models during training.
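One simple form of balancing augmentation is oversampling minority classes with perturbed copies of existing samples. The sketch below is an illustrative stand-in for the augmentation of [59,75], with our own noise scale and toy data; image augmentations would instead use transforms such as flips and crops.

```python
import numpy as np

def augment_to_balance(X, y, rng=None):
    """Oversample minority classes by adding noisy copies of their samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                      # grow every class to this size
    X_out, y_out = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n)
            noisy = X[idx] + rng.normal(scale=0.01, size=X[idx].shape)
            X_out.append(noisy)                # altered copies, not duplicates
            y_out.append(np.full(target - n, c))
    return np.concatenate(X_out), np.concatenate(y_out)

X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])               # class 1 is under-represented
Xb, yb = augment_to_balance(X, y)
```

Because the added samples are perturbed rather than exact duplicates, the augmented set also acts as a mild regulariser, in line with the overfitting argument above.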
In the context of an FL system focused on multi-class classification tasks, the system architecture includes a central server responsible for managing the global model. Accompanying the server is a collection of clients, denoted as K = {1, 2, …, K}. Each client possesses an independent local dataset, designated as Dk. During the r-th iteration of the FL process, a designated client, known as client k, is selected to participate in the learning process. This entails starting local learning with its unique local dataset Dk and the initial global model vector wr that the server provides. Following that, client k uses its local dataset Dk to create a mini-batch collection, designated as Bk. A stochastic gradient descent (SGD) optimizer is used for the subsequent local learning [68,78]. The updating mechanism for this localised learning is defined as:

wk,r ← wk,r − η (1/|Dk|) Σx∈Dk ∇fk(wk,r; x)    (6)

In the above equation, |Dk| represents the size of the dataset Dk, while fk(wk,r; x) stands for the loss function associated with the local model vector wk,r and the data instance x. The learning rate is denoted by η. The training of the local model is carried out for a predefined number of local epochs for each chosen client. The locally obtained vector is subsequently sent to the central server. The incoming local model vectors are then aggregated to update the global model vector maintained by the server. Every local model is given a unique weight during the aggregation process. These weights are determined by the fraction of the data used in local training by each client out of the total amount of data utilised by all participating clients. This weighted aggregation is mathematically expressed as:

wr+1 = Σk∈S (|D'k| / |D|) wk,r    (7)

Here, S refers to the set of clients selected by the central server to participate in the learning process, D represents the union of the local datasets from all the selected clients, and D'k denotes the data specifically used by client k for local learning, with D'k ⊂ Dk. This iterative process is repeated until a predetermined round threshold is reached, as given in Fig. 3.
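The client-side update rule can be sketched as plain minibatch SGD on a squared loss. The loss fk, batch size, learning rate, and toy data below are illustrative stand-ins, not the configuration used in the experiments.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, batch=4, epochs=20, seed=0):
    """Client-side update: minibatch SGD steps w <- w - eta * grad f_k."""
    rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(epochs):
        order = rng.permutation(len(y))            # reshuffle B_k each epoch
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]       # one minibatch from B_k
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad                         # the SGD update step
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 2))
y = X @ np.array([1.0, -1.0])      # noiseless data with true weights (1, -1)
w = local_sgd(np.zeros(2), X, y)
```

The resulting local vector w is what the client transmits to the server for the weighted aggregation step.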

Setup and implementation
The proposed framework is implemented to achieve a balanced training process, as illustrated in Fig. 2. Data augmentation is employed to augment the amount of data by either adding modified copies of existing data or generating synthetic data from the current dataset. This technique helps to reduce overfitting. We are also considering future improvements for this study, such as utilizing data synthesis to create a new dataset based on the existing one: it takes .CSV data as input and produces a synthetic dataset using DP.
The model training and aggregation are set up in a sequential workflow for Fashion MNIST and stock data, as illustrated in Fig. 4. We have automated the process of converting the collected data into federated data, with every client receiving a random distribution of the data. Next, the Bal-Fed method is applied. Flower and TF are open-source frameworks for machine learning and other computations on decentralised data. TF was developed to enable collaborative research and experimentation with FL, a machine learning technique that builds a shared global model across several clients using locally stored training data. With Flower and TF, developers may test new algorithms and simulate the integrated FL algorithms on their own models and data. TF's building blocks also allow non-learning computations, such as federated analytics, to be implemented in Python, a concise and legible programming language used here to create and simulate the algorithms.
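The automated conversion of a centralized dataset into per-client federated shards with a random distribution can be sketched as below. The shard count and toy dataset are placeholders, not the paper's actual pipeline:

```python
import numpy as np

def make_federated(X, y, num_clients=20, seed=0):
    """Randomly shuffle the dataset and split it into one shard per client,
    mimicking the automated federated-data conversion step."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    shards = np.array_split(idx, num_clients)
    return [(X[s], y[s]) for s in shards]

# Toy stand-in for the real dataset: 1,000 samples with 4 features each.
X = np.arange(1000 * 4).reshape(1000, 4).astype(float)
y = np.arange(1000)
clients = make_federated(X, y)
print(len(clients), sum(len(cx) for cx, _ in clients))  # 20 1000
```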

Results and discussion
This section encapsulates the findings obtained throughout this research. We present numerical results that clearly demonstrate the effectiveness of the proposed algorithms. The developed methodology was subjected to rigorous evaluation using Fashion MNIST, a well-known benchmark dataset, and real-time stock data.

Discussion on the results of fashion MNIST (images) data
Fashion MNIST is widely used for testing and benchmarking machine learning algorithms, especially in the context of image classification and deep learning. Its similarity in size and structure to MNIST makes it an ideal substitute when researchers want to experiment with a more complex dataset without significantly increasing the computational requirements.
The Fashion MNIST dataset consists of 70,000 images of Zalando's clothing articles. It includes various types of clothing such as T-shirts, shoes, pants, and tracksuits. Each type is assigned a numerical value: 0 for T-shirt/top, 1 for Trouser, 2 for Pullover, 3 for Dress, 4 for Coat, 5 for Sandal, 6 for Shirt, 7 for Sneaker, 8 for Bag, and 9 for Ankle boot. Images are in grayscale with a size of 28 × 28 pixels, as shown in Fig. 5. It is worth noting that while Fashion MNIST has been widely used, more challenging datasets with higher complexity and diversity have emerged in recent years to push the boundaries of machine learning performance in image recognition tasks. An algorithm's performance is typically assessed using a confusion matrix that shows the mistakes that were made. This matrix displays the number of predicted test results that belong to the right class and the number that are assigned to the wrong class, and the information in it is used to compute the algorithms' evaluation metrics. The widely used criterion of accuracy is defined as

Accuracy (Acc) = (TP + TN) / N, (9)

where N = TP + TN + FP + FN and the confusion-matrix quantities are TN (true negatives), FN (false negatives), TP (true positives), and FP (false positives).
Accuracy refers to the ratio of the number of correct predictions and classifications to the total number of correct and incorrect predictions. Accuracy can be misleading in several instances: while classification accuracy can provide useful insight into the performance of a model, it may be necessary to choose a model with lower accuracy because of the greater predictive power it provides in a specific situation. It is therefore a good idea to utilize alternative performance metrics, such as the F1 measure, given in Eq. 10. To simulate the class distribution of the entire dataset, we performed an automated conversion of the dataset into a federated dataset, so that each client has a random distribution of classes and data samples; in other words, the distribution of data varies depending on the client. Our model is developed using linear regression [79], with 122,570 parameters in total. A commonly used classifier such as this fulfills our requirement to validate the effectiveness of our Bal-Fed approach.
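The accuracy of Eq. 9 and the F1 measure of Eq. 10 can be computed directly from the confusion-matrix counts. The sketch below uses the standard F1 definition (the harmonic mean of precision and sensitivity) on a small hypothetical binary example:

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts, treating one class as positive."""
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Eq. 9: correct predictions over all N = TP + TN + FP + FN predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Eq. 10: harmonic mean of precision and sensitivity (recall)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn))  # 0.75 0.75
```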
For this experiment, we used TF and Google Colab, with a standard SGD optimizer for the classifier. The selected 20 clients trained their local models for 20 epochs in each of 100 training rounds. At each training epoch, the client selects 10 batches with a batch size of 10. Of this data, 60,000 images are treated as the training set and 10,000 as the test set, as recommended. After implementing the Bal-Fed setting, the data is loaded onto 20 clients to obtain the results for the evaluation measures. Compared with the FL technique of [79], which gives 85% accuracy, using Bal-Fed increases the accuracy to 92% for the same dataset (as shown in Table 2).

Discussion on the results of stock data
Real-time data for this experimental investigation is gathered from the Y-finance API. The data comes from two significant stock-market businesses, Amazon (AMZN) and Booking Inc. (BKNG). The retrieved data spans the period from January 1, 2013 to February 1, 2023. For every stock there were 2,517 records, organised into several CSV files; each record comprises the date, closing price, and projection. An automated method is used to transform this data into a federated dataset.
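The conversion of a per-stock CSV file into a federated dataset can be sketched as below. To keep the example self-contained, a synthetic stand-in file replaces the actual Y-finance download; the file name, column subset, and linear price series are hypothetical:

```python
import numpy as np
import pandas as pd

def load_stock_csv(path, num_clients=20, seed=0):
    """Read a per-stock CSV with Date and Close columns and split its
    records randomly across clients, as in the automated conversion step."""
    df = pd.read_csv(path, parse_dates=["Date"])
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    return [df.iloc[s].reset_index(drop=True)
            for s in np.array_split(idx, num_clients)]

# Hypothetical stand-in for a retrieved AMZN file with 2,517 daily records.
dates = pd.date_range("2013-01-01", periods=2517, freq="B")
demo = pd.DataFrame({"Date": dates, "Close": np.linspace(100, 150, 2517)})
demo.to_csv("amzn_demo.csv", index=False)

shards = load_stock_csv("amzn_demo.csv")
print(len(shards), sum(len(s) for s in shards))  # 20 2517
```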
Metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared, and MSE are commonly used in regression analysis to assess predictive error rates and model performance. The MAE, the difference between the predicted and original values, is obtained by averaging the absolute differences over the data set. The MSE represents the variation between the original and predicted values by squaring the average difference across the data set, and RMSE is its square root:

MAE = (1/n) Σ_i |P_i − P̂_i|,  MSE = (1/n) Σ_i (P_i − P̂_i)²,  RMSE = √MSE, (11)

where P and P̂ represent the true price and predicted price, respectively, and n denotes the number of samples in the test dataset. This model is developed using linear regression [72] in the Flower framework [80]; using this framework and Scikit-Learn, Bal-Fed is implemented. The framework's evaluation function is the Mean Squared Error (iteration-by-iteration results are provided in Appendix A). The minimum number of edge nodes is set to 20. Each node trains a linear regression algorithm using data from a single province before sending the gradient of loss from its model to the server. The server updates the model parameters using FedAvg, and the new parameters are delivered to the worker nodes to update each local model. Until the application requirements are satisfied, this process is iteratively performed in a convergent manner without sharing any data. We chose a 90%/10% data split for training and testing.
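The four evaluation metrics can be computed from the true and predicted prices as below; the toy price vectors are hypothetical, used only to exercise the formulas:

```python
import numpy as np

def regression_metrics(p_true, p_pred):
    """MAE, MSE, RMSE and R-squared over n test samples."""
    err = p_true - p_pred
    mae = np.mean(np.abs(err))            # average absolute error
    mse = np.mean(err ** 2)               # squared errors weight outliers more
    rmse = np.sqrt(mse)                   # same units as the target variable
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((p_true - p_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot              # share of variance explained
    return mae, mse, rmse, r2

p = np.array([100.0, 102.0, 104.0, 103.0])   # hypothetical true prices
q = np.array([101.0, 101.0, 105.0, 103.0])   # hypothetical predictions
mae, mse, rmse, r2 = regression_metrics(p, q)
print(mae, mse)  # 0.75 0.75
```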
After performing a linear regression analysis, line, residual, and scatter plots are important tools for assessing the quality of the regression model and identifying potential issues or patterns in the data. In the context of linear regression, a line graph represents the relationship between the independent variable (X-axis) and the dependent variable (Y-axis) based on the predictions of the model. However, a line graph is not typically used directly to represent the outcome of a linear regression analysis; it is more commonly used to demonstrate the trend or pattern in the data before and after applying the regression model. To address this, a scatter plot is utilized to validate the model and visually represent the actual data points. The obtained data frame had columns named Date, Open, and Close. The prediction results of the proposed technique for the stock data are plotted in a line graph, as shown in Fig. 6; the predicted values are consistent with the actual values. The resultant graph and the fitted model are shown in the scatter graph and residual graphs in Figs. 7 and 8, respectively. For this dataset, 20 clients are selected for 100 communication rounds, with 5 epochs in each communication round. Fewer epochs are used with this dataset because it achieved 95% accuracy with only 5 epochs; this accuracy would likely increase with more epochs, but that can lead to more communication time on the client side. The prediction values resulted in an accuracy of 95% with minimal data loss, as shown in Table 3. The values of Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared are shown in Tables 4 and 5 of Appendix A for AMZN and BKNG, respectively.
The accuracy of the predicted prices of the stocks and of the Fashion MNIST data is sufficient with the Bal-Fed model, and it is better than the results reported in the literature, which give 85% accuracy for the federated setting [79].

Analysis on the performance of bal-fed
This study presents a model created using linear regression inside the Flower framework. Metrics including R-squared, MSE, MAE, and RMSE are used in the evaluation; Table 1 provides a thorough breakdown of the results. Each edge node individually trains a linear regression algorithm with class estimation and balancing (Algorithm 2), with a minimum of 20 edge nodes; the gradient of loss is then sent to the server by each node. For every local model update, the FL server distributes the modified parameters to the edge nodes using the FedAvg algorithm. This iterative process proceeds in a convergent manner, without sharing any data, until the training termination requirement of 100 rounds of local data training is satisfied. The model goes through several iterations to attain an optimal state, after which additional training has little effect on its performance. Using a decentralised or distributed training strategy, each client independently processes its own local data, with 90% of the data partitioned for training and 10% for testing.
Twenty clients and one hundred communication rounds, each with five epochs, were used for this dataset. The dataset's remarkable 92% accuracy within 75 rounds was a factor in the decision to reduce the number of communication rounds, and the termination conditions were met in 80 rounds (see Tables 4 and 5 in Appendices A and B, respectively). As a result, a reduction in communication expense and learning time was effectively achieved.
The mean absolute error (MAE) is the mean absolute difference between the expected and actual values; it indicates the average size of errors, and lower values are preferable. Greater weight is assigned to larger errors in the Mean Squared Error (MSE) calculation, where lower MSE values denote superior performance. RMSE, the square root of MSE, measures the average size of errors in the same units as the target variable and offers an additional evaluation of predictive ability. R-squared, which ranges from 0 to 1, with 1 indicating a perfect fit, evaluates how well the model's predictions account for variance in the actual data.
In Tables 4 and 5 in Appendices A and B, the R-squared values stand out as notably high (e.g., 0.94, 0.90) across the majority of cases, suggesting that the model adeptly captures variations in stock prices and demonstrates strong performance. Taking AMZN stock as an example, during the 80th training iteration the R² is 0.93, the MAE is 10.75, the MSE is 153.5, and the RMSE is 12.9. The R-squared value indicates a good fit, while the low MAE, MSE, and RMSE values show that the prediction errors have relatively small magnitudes; together, these metrics point to a well-fitting model for AMZN stock during the 80th training cycle. Comparable trends are noted for BKNG stock. In the case of the image data, the accuracy is high and the data loss is minimal, indicating a strong fit of the model to the image data. However, further exploration is needed, particularly when dealing with data featuring more complex features.

Implications
FL presents a promising avenue for training on decentralized data residing on local client devices, thereby enhancing efficiency and safeguarding privacy. Nonetheless, the distribution and volume of training data at the clients' end can engender substantial challenges, including class imbalance and the presence of non-IID data, which can exert a pronounced influence on the performance of the shared model. Despite concerted efforts to facilitate the convergence of FL models in the face of non-IID data, the issue of data imbalance remains inadequately addressed. As FL training entails the exchange of encrypted gradients, rendering the training data partially concealed from both clients and servers, conventional methods for addressing class imbalance exhibit suboptimal performance within the FL paradigm. Hence, the development of novel techniques to detect and alleviate class imbalance in the FL context assumes paramount significance. This study introduces Bal-Fed, a method capable of inferring the composition of training data for each FL iteration, thereby mitigating the adverse effects of imbalance. Through experimental validation, we underscore the significance of class estimation and the implementation of client-side strategies in FL training. The efficacy of our proposed approach in reducing the impact of imbalance is vividly demonstrated: our method markedly surpasses prior approaches while concurrently upholding client privacy, achieving accuracy rates of 92% for image data and 95% for stock price data. Addressing imbalanced data in federated learning can have notable positive impacts across various applications.
In essence, addressing imbalanced data with Bal-Fed in federated learning can lead to more robust and accurate models with optimized communication cost, particularly in scenarios where certain outcomes or events are infrequent but crucial for decision-making, such as in finance, telecommunications, environmental monitoring, and retail. In finance, it can identify rare fraudulent transactions in financial datasets and enhance risk-prediction models by addressing imbalances in data related to high-risk scenarios. Similarly, in telecommunications, it supports the detection of rare security threats or attacks on networks and improves quality-of-service (QoS) prediction models for rare instances of service degradation. It can also help improve models for identifying infrequent environmental events, such as natural disasters or unusual phenomena. Lastly, Bal-Fed can help enhance models that identify patterns in customer behaviour for targeted marketing strategies while ensuring user data privacy.

Conclusions and future work
In traditional centralized machine learning, all local data is uploaded to a single server, which raises privacy concerns. FL is a machine learning methodology in which users' data is used to train a model without being shared with the cloud server; only the results or trained models are uploaded. This approach is more efficient in terms of generalization, privacy, and system correctness. However, a major challenge in FL is data imbalance, which occurs when the distribution of classes in the local data of different clients is significantly different. To address this issue, we propose a data balancing technique based on data augmentation.
This technique, implemented in the TF and Flower frameworks, utilizes various deep learning (DL) algorithms to reduce data imbalance prior to the training of local models. We also address the problem of client selection caused by imbalanced FL data. We propose a method to manage the class distribution by automatically generating augmented data for each client during local model training, without requiring any information about the clients' data. Additionally, we propose a combination of client selection and data balancing techniques to further mitigate the impact of data imbalance.
Our numerical results show that the proposed technique can select a well-balanced client set and improve the convergence performance of the global model. We applied the technique to the Fashion MNIST dataset and stock price data, and it achieved good results in terms of accuracy and F1 measures. Extensive experiments demonstrate that our method can significantly outperform previous solutions for imbalanced data, achieving sufficient accuracy on the predicted stock prices and Fashion MNIST data with the Bal-Fed model and improving on the 85% accuracy reported in the literature for the federated setting [79].
In the future, we plan to enhance this research further and improve the prediction capabilities of Bal-Fed, and to apply the technique to applications such as medical imaging and human activity recognition. We intend to develop a mobile application that can help users predict the value of stocks, and to use this mobile app for diagnosis through medical images.

Fig. 3
Fig. 3 Workflow of the Bal-Fed technique for client-side data augmentation with the implementation of FedAvg Algorithm

Fig. 4
Fig. 4 Schematic workflow sequence used for stock data applied for comparing the prediction performance of BalFed

Fig. 5
Fig. 5 28 × 28 images from fashion MNIST

The scatter plot is utilized to validate our model and visually represent the actual data points; each point on the plot corresponds to a pair of values from the actual and predicted stock prices. The scatter plot, presented in Fig. 6, showcases the actual data points and overlays the regression line, which represents the line of best fit generated by the linear regression model. It is worth noting that the model's predictions align with the actual data. Residuals are the differences between the actual/observed values and the predicted values generated by the LR model; in other words, they represent the errors or discrepancies between the model's predictions and the actual data points. Residuals provide insight into how well the model captures the underlying relationships in the data. A well-fitted model should have residuals that are randomly distributed around zero, with no discernible patterns. Patterns in the residuals can indicate issues such as underfitting or overfitting, heteroscedasticity (varying variance), or omitted variable bias.

Fig. 7
Fig. 7 Prediction of stock prices using Bal-Fed

Fig. 8
Fig. 8 Results of the BalFed model for stock price prediction in residual graphs

Table 1
Summary of the literature

F1-Score (Eq. 10): F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity). The coefficient of determination, R-squared, indicates how well the predictions fit the original values; it is a number between 0 and 1, interpreted as a percentage. In the error metrics, P and P̂ represent the true price and predicted price, respectively, and n denotes the number of samples in the test dataset. Training time is measured to study the latency of ML model training; it increases with the complexity of the model, the size of the dataset, and the performance of the processing framework, and reducing training delay offers real-time prediction/classification benefits. Table 2 shows the MSE results of the LR model executed on various ML frameworks.

Table 2
Results of fashion MNIST applied to evaluate the prediction performance of BalFed

Table 3
Results of stock data applied to evaluate the prediction performance of Bal-Fed

Table 4
Subset of the variations of R Squared, MAE and MSE w.r.t training iteration for Amazon stock data

Table 5
Subset of the variations of R Squared, MAE and MSE w.r.t training iteration for booking stock data