
Advances, Systems and Applications

FedEem: a fairness-based asynchronous federated learning mechanism


Federated learning is a mechanism for model training in distributed systems, aiming to protect data privacy while achieving collective intelligence. In traditional synchronous federated learning, all participants must update the model synchronously, which may result in a decrease in the overall model update frequency due to lagging participants. In order to solve this problem, asynchronous federated learning introduces an asynchronous aggregation mechanism, allowing participants to update models at their own time and rate, and then aggregate each updated edge model on the cloud, thus speeding up the training process. However, under the asynchronous aggregation mechanism, federated learning faces new challenges such as convergence difficulties and unfair model accuracy. This paper first proposes a fairness-based asynchronous federated learning mechanism, which reduces the adverse effects of device and data heterogeneity on the convergence process by using outdatedness and interference-aware weight aggregation, and promotes model personalization and fairness through an early exit mechanism. Mathematical analysis derives the upper bound of convergence speed and the necessary conditions for hyperparameters. Experimental results demonstrate the advantages of the proposed method compared to baseline algorithms, indicating the effectiveness of the proposed method in promoting convergence speed and fairness in federated learning.


In the context of edge computing, edge devices use local data to train local models and upload them to the cloud, where they are aggregated to update the global model. Practice has repeatedly shown that such data is not independent and identically distributed [1,2,3,4]. Federated learning and traditional distributed machine learning share a common research objective: minimizing training time, as measured by the wall-clock time needed to achieve the desired accuracy. Most existing literature on federated learning, including the pioneering work on the Federated Averaging (FedAvg) algorithm, assumes that communication between clients and the server is fully synchronous: the server waits for all selected clients to finish their local training and report their trained models before aggregation takes place [5]. This straightforward design has been widely adopted and resembles the bulk synchronous parallel (BSP) mechanism used in distributed machine learning within a single cluster. However, with heterogeneous clients, where edge devices with varying computing abilities act as clients, local training performance can differ significantly. In fact, the training time for the same amount of computation may exhibit a heavy-tailed distribution. If some clients train much more slowly than others, the synchronous mechanism suffers, because the server has to wait for these stragglers, which significantly reduces system parallelism.

In this scenario, introducing an asynchronous communication mechanism is an effective solution [6, 7]. In asynchronous federated learning, the server is not required to wait for all selected clients to report their model updates; instead, it continues the aggregation process immediately when a client’s model update arrives. The asynchronous mechanism has advantages over synchronous FL. In synchronous federated learning, the number of active clients fluctuates throughout each round as clients join and leave the queue, with a decrease towards the end due to stragglers. In contrast, asynchronous federated learning maintains a relatively stable number of active clients over time. As clients complete their training and upload their model updates, their positions are replaced by newly selected clients, increasing the parallelism of the asynchronous system.

Although the asynchronous mechanism enhances system parallelism, issues can arise when client computing speeds follow a heavy-tailed distribution. In a pathological scenario, fast clients repeatedly update the global model by completing their local training quickly, while slower clients make minimal progress based on outdated global models. In traditional distributed machine learning with parameter servers, this scenario is prevented by bounding staleness in the stale synchronous parallel (SSP) mechanism [8]. In FedAsync, a staleness function is used to compute a mixing hyperparameter \(\alpha\) for model aggregation. Intuitively, the weight assigned to a client’s model update when aggregating into the global model decreases as the client becomes more "stale". With this simple design, together with a regularized local objective, FedAsync ensures convergence, and a similar approach is adopted in the asynchronous aggregation mechanism proposed in this paper.
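The staleness-weighted mixing described above can be sketched as follows. This is a minimal illustration, not the FedAsync reference implementation: it assumes the polynomial staleness function \((t-\tau+1)^{-a}\) and represents models as plain lists of floats; the function names and the exponent `a` are chosen here for illustration only.

```python
from typing import List


def polynomial_staleness(staleness: int, a: float = 0.5) -> float:
    """Down-weighting factor that shrinks as the update gets staler (assumed form)."""
    return (staleness + 1) ** (-a)


def fedasync_mix(global_model: List[float], client_model: List[float],
                 alpha: float, staleness: int, a: float = 0.5) -> List[float]:
    """Mix a client model into the global model with a staleness-scaled weight."""
    alpha_t = alpha * polynomial_staleness(staleness, a)
    return [(1.0 - alpha_t) * g + alpha_t * c
            for g, c in zip(global_model, client_model)]


# A fresh update (staleness 0) is mixed in with the full weight alpha,
# a stale one (staleness 3) with a reduced weight.
fresh = fedasync_mix([0.0, 0.0], [1.0, 2.0], alpha=0.6, staleness=0)
stale = fedasync_mix([0.0, 0.0], [1.0, 2.0], alpha=0.6, staleness=3)
```

With these inputs the stale update moves the global model only half as far as the fresh one, which is the intended effect of the staleness function.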

Another potential issue arising from this process is fairness [9]. Fast clients are selected for local training far more frequently throughout the federated learning process, as they quickly complete their local training and return to the waiting-to-be-selected state. This is unfair to fast clients. Specifically, fast clients may expend much more computational power than slow clients, only to end up with the same model. Worse, the models obtained by fast clients, who contribute more, may have lower accuracy on their local test sets than the models obtained by clients who contribute less. This fairness issue deserves further investigation, as unfair mechanisms may discourage clients, especially fast ones, from joining the federation for distributed training. This, in turn, may reduce the number of fast clients in the system and thereby decrease the overall performance of the federation.

In summary, the heterogeneity of clients and the introduction of asynchronous mechanisms make the entire system more complex and complicate the trade-off between fairness and model performance. To address this, this paper proposes an adaptive asynchronous federated learning aggregation mechanism, referred to as FedEem, with the following two main improvements.

  • We propose an aggregation algorithm that judges the degree of obsolescence and the degree of gradient drift of client models, effectively reducing the impact of system failures and allowing clients to perform different numbers of local update rounds, instead of the globally synchronized number of local updates commonly used.

  • We propose an early exit mechanism that reduces the fairness issue caused by the over-selection of fast clients while ensuring that the convergence speed of the system does not significantly decrease.

This paper first introduces and analyzes the necessity and effectiveness of these two mechanisms in detail. Then, experiments demonstrate the superiority of the proposed method over FedBuff, FedAsync, and FedAvg. Finally, a detailed mathematical analysis of the convergence is provided.

Related work

The related work can be summarized into the following three points: Asynchronous Federated Learning, Personalized Federated Learning and Fairness Issues in Federated Learning.

Asynchronous federated learning

In the classical federated learning paradigm, synchronous aggregation strategies struggle to use limited resources effectively, particularly on heterogeneous devices, because they have to wait for slower devices to complete their computations before aggregating in each training round. Additionally, the heterogeneity of data distribution in real-world mobile edge computing scenarios can significantly impact the accuracy of the model. Hence, some research works have used asynchronous model updates to improve efficiency, performance, privacy, and security. Xie et al. proposed an asynchronous federated learning paradigm called FedAsync, which solves a regularized local problem to ensure convergence and then updates the global model using staleness-weighted averaging; they demonstrated near-linear convergence for both strongly convex problems and a restricted family of non-convex problems. Chen et al. introduced Asynchronous Online Federated Learning (ASO-Fed) as an extension of FedAsync. They proposed online optimization policies to tackle three potential training challenges: 1) the data on local devices can grow over time, leading to changing correlations among clients in an online setting; 2) due to network constraints, mobile devices may frequently go offline or have poor communication bandwidth, making synchronous federated learning frameworks highly sluggish; 3) edge devices may experience delays or even drop out of the training process due to factors such as data heterogeneity or system heterogeneity, which introduces inconsistencies and hinders the smooth progress of federated learning [10]. Nguyen et al. introduced a model aggregation scheme called FedBuff, which aims to combine the benefits of synchronous and asynchronous federated learning. In FedBuff, the server aggregates client updates in a dedicated buffer, allowing for more flexible and efficient aggregation [11]. This approach demonstrates improved convergence speed compared to FedAsync and is compatible with existing secure aggregation and privacy techniques. Su et al. enhanced FedBuff by dynamically adjusting aggregation weights according to the staleness and divergence of model updates. They carefully selected operating points in each dimension of the design space and provided verified convergence guarantees [12].

Personalized federated learning

Data heterogeneity poses a significant challenge in current federated learning approaches. Research findings suggest that the accuracy of FedAvg decreases significantly when it is trained on data that is not independent and identically distributed [13]. Additionally, fully synchronized model updates result in a lack of personalized solutions. Users from diverse scenarios may exhibit varying usage patterns due to subtle distinctions in their environments and requirements [14]. In such scenarios, more personalized predictions are needed, for example to provide users with more meaningful word suggestions. This challenge not only affects the training of the global model but also impacts its performance on the local data of specific clients. Consequently, it may discourage the participation of affected clients in the federated learning process.

Personalized federated learning offers a promising solution to tackle the issue at hand. By training customized local models for each user, it effectively addresses the data heterogeneity among clients [15]. Presently, personalized federated learning methods primarily concentrate on optimizing from both data-based and model-based standpoints. Data-based approaches strive to minimize the statistical heterogeneity of client data distribution through techniques such as data augmentation [14] and node selection [16]. Model-based approaches, on the other hand, focus on learning a robust global model that can be further personalized for each client or enhance the adaptability of local models. Common practices include adding regularization terms [17], meta-learning [18], and transfer learning [19]. Some research attempts to enhance the robustness and generalization of federated learning through methods like clustering [20], multitask learning [21], model interpolation, and knowledge distillation [22]. In this paper, we utilize meta-learning for client initialization during training.

Fairness issues in federated learning

In the federated learning system, when clients participate in federated learning, they inevitably consume resources on their devices, including computational resources, communication resources, and power resources. Without sufficient rewards, clients may be unwilling to participate or share their trained models. Hence, creating a fair, rewarding, and secure environment for federated learning becomes imperative to encourage a substantial client participation.

Zhou et al. classify fairness in federated learning into three categories: performance fairness, collaboration fairness, and model fairness [23]. For performance fairness, most schemes aim to promote a consistent accuracy distribution among participants and achieve reasonable resource allocation in heterogeneous systems through joint optimization. In terms of collaboration fairness, current research primarily focuses on ensuring that each participant receives a fair representation of the rewards they contribute to the federated system [24], thus establishing a sound incentive mechanism. Incentive mechanisms in federated learning mainly attempt to construct a contribution model for each participant and provide corresponding rewards. Currently, contribution models are mainly based on the value of client data, which is evaluated from the perspectives of data quality and data quantity. Evaluation methods based on data quality employ metrics such as Shapley value [25], auction mechanisms [26], contractual theory, etc. Evaluation methods based on data quantity adjust the size of participating data to fully consider the rewards and energy costs obtained by each client. Furthermore, Zhan et al. introduce a novel approach that integrates game theory and deep reinforcement learning. In this approach, the parameter server functions as a deep reinforcement learning agent, enabling it to determine optimal payments without the requirement of accurately assessing each client’s contributions or obtaining their private information beforehand [27].

Regarding model fairness, Du et al. proposed reweighting the objective function under fairness constraints [28]. Liang et al. attempts to reduce the impact of variance in data distribution by locally learning representations on each client while jointly learning the global model [29]. It is important to acknowledge the trade-off between performance fairness and model fairness. Performance fairness prioritizes achieving a balanced accuracy of the global model, whereas model fairness focuses on the performance of the model on local data. Collaboration fairness relies on an executable and sound incentive mechanism. In the context of mobile edge computing, most traditional federated incentive mechanisms are ineffective because most clients (such as mobile phone users and IoT devices [30,31,32,33,34]) do not expect to gain economic benefits through federated learning. Their primary concern lies in determining whether federated learning can enhance the accuracy of the model on their respective local data, which is known as model fairness.

In summary, the existing work has the following shortcomings: 1) The heterogeneity of the clients and the introduction of asynchronous mechanisms make the entire system more complex. This may lead to imbalanced resource utilization and slower model convergence speed. 2) Balancing fairness and model performance in federated learning is often challenging. This paper aims to improve the fairness of the model while ensuring that the convergence speed of the model does not significantly decrease.


To address the issue of slow model updates, an increasing number of federated learning approaches have adopted asynchronous aggregation in recent years. FedEem also uses asynchronous aggregation, which allows clients to upload models at different points in time and updates the global model by merging these models. FedEem extends the aggregation mechanism with an obsolescence discount, a dissimilarity discount, and an early exit mechanism.

Asynchronous aggregation mechanism

In an asynchronous federated learning system, clients that receive the global model from the server several rounds ago may become outdated, resulting in lower quality model updates during the aggregation process. This can disrupt the approximate consensus of the majority of other clients and impede the convergence process. It is intuitive to reduce the weight assigned to these outdated clients during the aggregation process [6]. To measure these effects, the following evaluation metrics are proposed in this section.

The obsolescence discount quantifies how outdated a client is in an asynchronous federated learning system. The obsolescence of a client is the number of global update rounds that have passed since the client last received the global model from the server. It is reasonable to assume that the more obsolete a client is, the lower its aggregation weight should be. According to [11], the obsolescence of clients must be bounded; otherwise the convergence of the model cannot be guaranteed. Let \(\tau\) represent the current training round on the server, and let \(\tau _{k}\) represent the training round in which client k last received the global model from the server. The obsolescence \(S^{k}\) of client k is then \({\tau } - \tau _{k}\). The following obsolescence function is used to calculate the obsolescence discount \(s_\tau ^k\), which discounts the aggregation weight:

$$\begin{aligned} s_\tau ^k=\alpha \cdot \frac{\Omega }{S^k+\Omega }, \end{aligned}$$

Where \(\Omega\) represents the upper bound of obsolescence. Since \(0 \le S^k \le \Omega\), the discount \(s_\tau ^k\) lies between \(0.5\alpha\) (when \(S^k=\Omega\)) and \(\alpha\) (when \(S^k=0\)), where \(\alpha\) is a hyperparameter that controls the importance of the obsolescence discount in the aggregation process.
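As a minimal sketch of the obsolescence function above, the following evaluates \(\alpha \cdot \Omega /(S^k+\Omega )\) for a given server round and client round. The function and argument names are illustrative assumptions, and the range check simply encodes the bounded-obsolescence requirement stated in the text.

```python
def obsolescence_discount(server_round: int, client_round: int,
                          omega: int, alpha: float) -> float:
    """Return alpha * omega / (S + omega), with S = server_round - client_round."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    staleness = server_round - client_round
    if not 0 <= staleness <= omega:
        # The text requires bounded obsolescence; reject anything outside [0, omega].
        raise ValueError("staleness must lie in [0, omega]")
    return alpha * omega / (staleness + omega)


# Evaluate the discount at the two extremes of the allowed staleness range.
freshest = obsolescence_discount(server_round=10, client_round=10, omega=5, alpha=0.8)
stalest = obsolescence_discount(server_round=10, client_round=5, omega=5, alpha=0.8)
```

The two calls correspond to a client that has just received the global model and a client at the maximum allowed obsolescence.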

To measure how much a client update interferes with the global optimization direction, the difference between the local accumulated gradients and the global aggregated gradients can be used. Let \(w_i-w_{i-1}\) represent the difference between the models obtained from the two most recent rounds of server aggregation, where \(w_i\) denotes the parameters of the global model in the i-th round. In round i, client k uploads the weight update obtained from training, denoted \(\Delta _i^k\). If \(\Delta _i^k\) significantly disrupts the general consensus \(w_i-w_{i-1}\), the update from client k may not contribute to global optimization and should be discounted during aggregation. The interference is quantified by the cosine similarity between \(\Delta _i^k\) and \(w_i-w_{i-1}\); a lower value indicates less similarity between the two vectors. The dissimilarity discount \(\theta _i^k\) is then defined as follows:

$$\begin{aligned} \theta _i^k=\beta \cdot Similarity\left( \Delta _i^k, w_i-w_{i-1}\right) +1, \end{aligned}$$

Where Similarity(X, Y) represents the cosine similarity between vectors X and Y. Similar to the obsolescence discount, a hyperparameter \(\beta\) is introduced here to control the magnitude of the dissimilarity discount. Taking these two influencing factors into account, the aggregation weight can be defined as follows:

$$\begin{aligned} p_i^k=\frac{\left| T_k\right| }{|T|}\left( s_i^k+\theta _i^k\right) , \end{aligned}$$

Where \(T_k\) represents the dataset of client k and \(T\) the combined dataset of all clients.
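The dissimilarity discount and the resulting aggregation weight can be sketched as below. This is an illustration under stated assumptions: vectors are plain lists of floats, a zero vector is treated as having similarity 0 (the text does not specify this case), and the obsolescence discount is passed in as a precomputed number. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between x and y; 0.0 if either vector is zero (assumption)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def dissimilarity_discount(client_update: Sequence[float],
                           global_step: Sequence[float], beta: float) -> float:
    """theta = beta * Similarity(client_update, global_step) + 1."""
    return beta * cosine_similarity(client_update, global_step) + 1.0


def aggregation_weight(client_samples: int, total_samples: int,
                       obsolescence: float, dissimilarity: float) -> float:
    """p = (|T_k| / |T|) * (s + theta)."""
    return (client_samples / total_samples) * (obsolescence + dissimilarity)


global_step = [1.0, 0.0]                      # stands in for w_i - w_{i-1}
aligned = dissimilarity_discount([2.0, 0.0], global_step, beta=0.5)
opposed = dissimilarity_discount([-1.0, 0.0], global_step, beta=0.5)
weight = aggregation_weight(100, 1000, obsolescence=0.4, dissimilarity=aligned)
```

An update pointing along the recent global step receives a larger discount factor than one pointing against it, so it contributes more to the aggregate.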

Early exit mechanism

Another mechanism introduced in this section is the early exit mechanism. In previous works such as FedAvg and FedAsync, clients were not allowed to stop early during the training process and were required to complete all training rounds. This requirement is reasonable in synchronous federated learning, where clients are sampled with equal probability, resulting in the same expected number of rounds \(\mathcal {E}\left( n_k\right) =S T / K\) for all clients over T total training rounds. However, the asynchronous mechanism breaks this balanced expectation. In an asynchronous federated learning system, even if clients are still sampled with equal probability per round, the expected number of local training rounds \(\mathcal {E}\left( n_k\right)\) differs between fast and slow clients, and this difference grows with the gap in computational speed among clients. This unfairness discourages fast clients from participating: although federated learning promotes collaboration among clients in a distributed manner, the unequal treatment would deter users with high-performance computing devices from engaging in it.
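The imbalance in completed rounds can be illustrated with a small deterministic simulation. It is a simplified sketch, not the paper's experimental setup: it assumes every client is re-dispatched immediately after reporting (rather than sampled at random) and that each client has a fixed, hypothetical training time per round.

```python
import heapq
from typing import List


def count_completed_rounds(train_times: List[float], horizon: float) -> List[int]:
    """Count how many local rounds each client completes within `horizon` time units."""
    # Event queue of (finish_time, client_id); every client starts at time 0.
    events = [(t, k) for k, t in enumerate(train_times)]
    heapq.heapify(events)
    completed = [0] * len(train_times)
    while events:
        finish, k = heapq.heappop(events)
        if finish > horizon:
            continue  # this client cannot finish another round in time
        completed[k] += 1
        # Assumption: the client starts its next round immediately.
        heapq.heappush(events, (finish + train_times[k], k))
    return completed


# Three clients whose per-round training times are 1, 2 and 8 time units.
counts = count_completed_rounds([1.0, 2.0, 8.0], horizon=16.0)
```

Under these assumptions the fastest client completes eight times as many rounds as the slowest one over the same wall-clock interval, which is the kind of gap the early exit mechanism is meant to limit.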

Furthermore, it is important to acknowledge that the non-independent and non-identically distributed data introduces variations in objectives among different clients. In traditional federated learning, the objective is to achieve optimal performance of the global model across all clients. However, from an individual client’s perspective, the objective is to attain excellent local performance. Please note the difference between these two objectives, as the latter allows for inconsistent models across different local clients.

If a client has achieved sufficient performance in its local training after participating in several rounds of federated updates, it naturally tries to exit the federated system. However, allowing clients to exit quickly may cause problems. After a client is selected, it undergoes local training and uploads its model for aggregation. In return, the client receives the global model. While this global model may perform well on the client’s local dataset, it may generalize poorly, because the global model has just been pulled towards the client’s objective by the previous round of updates. Another, more critical reason is that when the accuracy of the aggregated global model becomes high (e.g., 92\(\%\)), there is a high probability that a client immediately achieves its local training goal (e.g., 95\(\%\)). If clients are allowed to exit at this point, a large number of clients may withdraw within a few rounds. Only a few clients are then left and continuously selected, leading to a severe deviation between their optimization direction and the global optimum. Consequently, the global model may end up with poor generalization performance.

As shown in Fig. 1, clients a, b, and c are participating in asynchronous federated learning. In rounds t and t + 1, client b is performing gradient descent with clients a and c. At this point, client b is overly involved, and the global parameter \(w^{t+2}\) is already sufficiently close to the global optimum \(w^*\) and to its local target \(w_b^*\), while the local optima of clients a and c are relatively far away. If we insist on client b’s participation in the federated learning process at this stage, it may lead to a significant deviation in the global gradient update direction compared to the dominant client’s gradient direction, and affect the convergence process. Additionally, some clients may be unwilling to continue contributing computational resources after being selected too many times.

Fig. 1 Simulation results for the network

This article explores a reliable early exit mechanism: (1) setting a lower bound on the number of training rounds for all clients, \(t_{bnd}\), which requires clients to be selected at least \(t_{bnd}\) times before being allowed to exit early; (2) setting an additional number of training rounds for all clients, \(t_{ext}\), which requires clients to be selected at least \(t_{ext}\) times after the model reaches an accuracy target before being allowed to exit early; (3) setting a lower bound on the number of remaining rounds for all clients, \(t_{stay}\), which requires clients to remain in the client pool for at least \(t_{stay}\) rounds after achieving the target accuracy on their own dataset before being allowed to exit early.
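The three conditions above can be sketched as a single eligibility check. This is an illustrative reading, not the paper's implementation: it assumes that all three conditions must hold simultaneously, and the `ClientState` fields are hypothetical bookkeeping counters that a server would have to maintain.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    times_selected: int = 0          # total number of times the client was selected
    selected_since_target: int = 0   # selections after the accuracy target was reached
    rounds_since_target: int = 0     # global rounds spent in the pool after reaching it
    reached_target: bool = False     # whether the local accuracy target has been met


def may_exit_early(client: ClientState, t_bnd: int, t_ext: int, t_stay: int) -> bool:
    """Return True only if all three early-exit conditions are met (assumed conjunction)."""
    return (
        client.reached_target
        and client.times_selected >= t_bnd
        and client.selected_since_target >= t_ext
        and client.rounds_since_target >= t_stay
    )


# A client that met its target but has not yet stayed long enough in the pool.
too_soon = ClientState(times_selected=12, selected_since_target=3,
                       rounds_since_target=4, reached_target=True)
# The same client after more global rounds have passed.
ready = ClientState(times_selected=12, selected_since_target=3,
                    rounds_since_target=20, reached_target=True)

can_leave_now = may_exit_early(too_soon, t_bnd=10, t_ext=2, t_stay=15)
can_leave_later = may_exit_early(ready, t_bnd=10, t_ext=2, t_stay=15)
```

Keeping the three thresholds as separate parameters makes it possible to tune how aggressively fast clients are released without changing the aggregation logic.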

Convergence analysis

In order to analyze the convergence performance of FedEem, in combination with previous convergence proof methods for federated learning, the following settings are considered. In each round of global update \(\tau \in T\), where T represents the total number of rounds of global updates, the server selects k clients from the client pool. Each client first receives the global model \(w_{\tau ^k}^k\) from the server, and then performs \(\epsilon\) rounds of training based on its own data. For the j-th round of local training, with a data size of B and a learning rate of \(\eta _l^j\), the local model is optimized using SGD. This can be formulated as \(\varvec{w}_{\tau ^k, j+1}^k=\varvec{w}_{\tau ^k, j}^k-\eta _l^j g\left( \varvec{w}_{\tau ^k, j}^k\right)\), where the gradient \(g\left( \varvec{w}_{\tau ^k, j}^k\right) =\nabla f_k\left( \varvec{w}_{\tau ^k, j}^k, D^k\right)\). Once all selected clients have reported, the server starts the aggregation process. To provide a better explanation, we have made use of some common assumptions and listed all the parameters used in Table 1 [11].
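The local update rule \(\varvec{w}_{j+1}=\varvec{w}_{j}-\eta _l^j g(\varvec{w}_{j})\) can be sketched as a short loop. The example is a simplification: it uses a hypothetical quadratic local objective with an exact gradient instead of a stochastic mini-batch gradient, and the names are chosen here for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def local_sgd(w: Vector, grad_fn: Callable[[Vector], Vector],
              learning_rates: Sequence[float]) -> Vector:
    """Apply one gradient step per entry of `learning_rates` (one per local round j)."""
    for eta in learning_rates:
        g = grad_fn(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical local objective f_k(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = [1.0, -2.0]


def quadratic_grad(w: Vector) -> Vector:
    return [wi - ti for wi, ti in zip(w, target)]


# Three local rounds starting from the received global model [0, 0].
w_local = local_sgd([0.0, 0.0], quadratic_grad, learning_rates=[0.5, 0.5, 0.5])
```

Passing the learning rates as a sequence mirrors the per-round rate \(\eta _l^j\) used in the analysis and lets different clients run different numbers of local rounds.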

Table 1 Experimental parameters

Assumption 1

The objective function \(f_k\) of each client k is \(L-\)smooth, which means its gradient is \(L-\)Lipschitz continuous, resulting in \(\left\| \nabla f_k(\varvec{w})-\nabla f_k(\varvec{w}^{\prime })\right\| \le L\left\| \varvec{w}-\varvec{w}^{\prime }\right\|\).

Assumption 2

The stochastic gradient is an unbiased estimate of the true gradient: \(\mathbb {E}_{\xi }\left[ \nabla f_k(\varvec{w}, \xi )\right] =\nabla f_k(\varvec{w})\), where w represents the trainable parameters.

Assumption 3

The expected square norm of the stochastic gradient is uniformly bounded, i.e.,\(\mathbb {E}\left\| \nabla f_k(w, \xi )\right\| ^2 \le G^2\) for \(k=1, \ldots , K\).

Assumption 4

Assume \(\xi\) is uniformly sampled from the local data of the k-th client device. The variance of the stochastic gradient on each device is bounded, i.e., \(\mathbb {E}_{\xi }\left\| \nabla f_k(\varvec{w}, \xi )-\nabla f_k(\varvec{w})\right\| ^2 \le \sigma _k^2\) for \(k=1, \ldots , K\). Then, we define \(\sigma _l^2:=\sum _{k=1}^K \frac{\left| D^k\right| }{|D|} \sigma _k^2\).

Assumption 5

For any client k and parameter w, we define \(\delta _k\) as the upper bound on the difference between the gradient of the local objective function and that of the global objective function, i.e., \(\left\| \nabla f_k(\varvec{w})-\nabla f(\varvec{w})\right\| ^2 \le \delta _k^2\). Furthermore, we define \(\delta _g^2:=\sum _{k=1}^K \frac{\left| D^k\right| }{|D|} \delta _k^2\).

Based on adaptive weight gradient aggregation, Lemma 1 can be obtained.

Lemma 1

Given hyperparameters \(\alpha\) and \(\beta\) for the obsolescence and dissimilarity discounts, the aggregation weight \(p_\tau ^k\) for each gradient is bounded: \(p_\tau ^k \in \left[ \frac{\alpha }{2} d_k,(\alpha +\beta ) d_k\right]\), where \(d_k=\frac{\left| D^k\right| }{|D|}\).

Lemma 2

\(\mathbb {E}\left[ \left\| g_k\right\| ^2\right] \le 3\left( \sigma _l^2+\delta _g^2+G^2\right)\), where the expectation \(\mathbb {E}[\cdot ]\) takes into account the random participation of clients and each client’s stochastic gradient.

To simplify the analysis without compromising the proof of convergence, we can disregard the denominator term in Lemma 1. Based on this, we can derive the convergence rate of FedEem as follows:

Theorem 1

Based on Assumptions 1, 2, 3 and 4 and Lemma 1, the convergence rate of FedEem can be expressed as:

$$\begin{aligned} \frac{1}{T} \sum _{\tau =0}^{T-1} \mathbb {E} \left\| \nabla f\left( \varvec{w}_\tau \right) \right\| ^2 \le 2 \frac{\left( f\left( \varvec{w}_0\right) -f\left( \varvec{w}^*\right) \right) }{\phi (E) T K}&+6 K(\alpha +\beta )^2 \lambda (d) L^2 E \psi (E)\left( K^2 \Omega ^2+1\right) \sigma ^2\nonumber \\&+L \frac{\psi (E)}{K \phi (E)}(\alpha +\beta ) \sigma _l^2, \end{aligned}$$

Where \(\lambda (d)=\sum _{i=1}^K d_i^2, \phi (E)=\sum _{j=1}^E \eta _l^j, \psi (E)=\sum _{j=1}^E\left( \eta _l^j\right) ^2\). To simplify the expression, let \(\sigma ^2=(\alpha +\beta ) \sigma _l^2+(\alpha +\beta ) \delta _g^2+G^2\). In addition, in order to ensure the convergence upper bound, K and \(\eta _l^j\) must satisfy the following relationship:

$$\begin{aligned} \frac{4(\alpha +\beta )}{\alpha ^2 \lambda (d)} K \eta _l^j \le \frac{1}{L}. \end{aligned}$$
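Rearranging the condition above gives an upper limit on the local learning rate, \(\eta _l^j \le \alpha ^2 \lambda (d) / \left( 4(\alpha +\beta ) K L\right)\). The following sketch evaluates that limit; the function names and the example values for \(\alpha\), \(\beta\), K, and L are assumptions chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def lambda_d(data_sizes: Sequence[int]) -> float:
    """lambda(d) = sum of squared data fractions d_i = |D^i| / |D|."""
    total = sum(data_sizes)
    return sum((n / total) ** 2 for n in data_sizes)


def max_local_learning_rate(alpha: float, beta: float, data_sizes: Sequence[int],
                            num_clients: int, lipschitz: float) -> float:
    """Largest eta satisfying 4(alpha+beta) / (alpha^2 * lambda(d)) * K * eta <= 1 / L."""
    return alpha ** 2 * lambda_d(data_sizes) / (
        4.0 * (alpha + beta) * num_clients * lipschitz
    )


# Four clients with equal data sizes, so lambda(d) = 4 * (1/4)^2 = 0.25.
sizes = [100, 100, 100, 100]
eta_max = max_local_learning_rate(alpha=1.0, beta=1.0, data_sizes=sizes,
                                  num_clients=4, lipschitz=1.0)
```

In this example the bound tightens as more clients are selected or as the smoothness constant grows, which matches the direction of the inequality.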


The update process of FedEem can be described as:

$$\begin{aligned} w^{t+1} = w^t+\eta _g \bar{\Delta }^t =w^t+\eta _g \frac{1}{K} \sum _{k \in S^t}\left( -\eta _l \sum _{q=1}^{\epsilon _k} g_k\left( y_{k, q}^{t-\tau _k(t)}\right) \right) , \end{aligned}$$

Where \(S^t\) represents selected clients in the t-th global update.
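The server-side step \(w^{t+1}=w^t+\eta _g \bar{\Delta }^t\) can be sketched as below, where \(\bar{\Delta }^t\) is the plain average of the reported client updates. This is a minimal illustration with models as lists of floats; it omits the staleness-dependent weights and assumes each client has already computed its accumulated update.

```python
from typing import List, Sequence

Vector = List[float]


def server_update(w: Vector, client_deltas: Sequence[Vector], eta_g: float) -> Vector:
    """Return w + eta_g * mean(client_deltas), averaged over the K reporting clients."""
    k = len(client_deltas)
    if k == 0:
        return list(w)  # nothing reported, keep the current model
    mean_delta = [sum(delta[i] for delta in client_deltas) / k for i in range(len(w))]
    return [wi + eta_g * di for wi, di in zip(w, mean_delta)]


# Two hypothetical client updates, each equal to -eta_l times its summed local gradients.
deltas = [[-0.2, 0.0], [-0.4, 0.2]]
w_next = server_update([1.0, 1.0], deltas, eta_g=1.0)
```

Each entry of `deltas` plays the role of \(-\eta _l \sum _q g_k(\cdot )\) for one client in \(S^t\).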

Unlike previous federated learning proofs, due to data heterogeneity and device heterogeneity, it cannot simply be assumed that \(S^t\) is a uniformly sampled subset, because the probability of a client participating differs across rounds. Specifically, in the early rounds, fast clients are more likely to participate in more rounds due to faster updates, while in the later rounds, the situation is reversed as fast clients drop out early.

Following the common convergence proof procedure used in federated learning methods, the proof of the convergence rate for the non-convex objective function proposed in [35] starts from the smoothness Assumption 1. It follows that:

$$\begin{aligned} f\left( w^{t+1}\right) \le f\left( w^t\right) + T_1(t)+T_2(t), \end{aligned}$$
$$\begin{aligned} T_1(t)= -\eta _g \sum _{k \in S^t} p_k^t\left\langle \nabla \tilde{f}\left( w^t\right) , \Delta _k^{t-\tau _k}\right\rangle , \end{aligned}$$
$$\begin{aligned} T_2(t)= \frac{L \eta _g^2}{2}\left\| \sum _{k \in S^t} p_k^t \Delta _k^{t-\tau _k}\right\| ^2 . \end{aligned}$$

Where \(\Delta _k^{t-\tau _k}\) is the parameter update made by client k after receiving the global model parameters before the \(t-\tau _k\) global update, and \(\nabla \tilde{f}\left( w^t\right)\) is the global gradient at global update round t. Upper bounds are then computed for \(T_1\) and \(T_2\):

$$\begin{aligned} T_1(t)&=-\eta _g \sum _{k \in S^t} p_k^t\left\langle \nabla \tilde{f}\left( w^t\right) , \sum _{q=1}^{\epsilon _k} \eta _l^{(q)} g_k\left( y_{k, q}^{t-\tau _k}\right) \right\rangle \nonumber \\&=-\frac{\eta _g}{K} \sum _{k \in S^t} \sum _{q=1}^{\epsilon _k} \eta _l^{(q)} p_k^t\left\langle \nabla \tilde{f}\left( w^t\right) , g_k\left( y_{k, q}^{t-\tau _k}\right) \right\rangle . \end{aligned}$$

By utilizing conditional expectations, it is possible to represent the expectation operator in a more concise manner:

$$\begin{aligned} \mathbb {E}[\cdot ]:=\mathbb {E}_\pi \mathbb {E}_{i \sim \left[ m_t\right] } \mathbb {E}_{g_i, p_i^t \mid i, \pi }[\cdot ], \end{aligned}$$

Where \(\mathbb {E}_\pi\) is the expectation with respect to all client policies, \(\pi =\left\{ \pi _1, \ldots , \pi _N\right\}\) represents the collection of all client policies participating in federated learning, and \(\mathbb {E}_{i \sim \left[ m_t\right] }\) is the expectation over the randomness of selecting client \(i \sim \left[ m_t\right]\) at global round t. Note that \(m_t\) is not a fixed value, because clients may drop out through the early exit mechanism. The inner expectation refers to one step of stochastic gradient descent on the client. Therefore, under the unbiased estimation assumption, we have

$$\begin{aligned} \mathbb {E}\left[ T_1(t)\right]&=-\mathbb {E}\left[ \frac{\eta _g}{K} \sum _{k \in S_i} \sum _{q=1}^{\epsilon _k} \eta _l p_k^t\left\langle \nabla \tilde{f}\left( w^t\right) , g_k\left( y_{k, q}^{t-\tau _k}\right) \right\rangle \right] \nonumber \\&=-\eta _g \mathbb {E}_\pi \left[ \frac{1}{m} \sum _{i=1}^m \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} \mathbb {E}_{g_i \mid i \sim [m]}\left\langle \nabla \tilde{f}\left( w^t\right) , g_i\left( y_{i, q}^{t-\tau _i}\right) \right\rangle \right] \nonumber \\&=-\frac{\eta _g}{m_t} \mathbb {E}_\pi \left[ \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _k} \eta _l\left\langle \nabla \tilde{f}\left( w^t\right) , p_i^t \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\rangle \right] \nonumber \\&=-\eta _g \bar{\eta }_l \mathbb {E}_\pi \left[ \left\langle \nabla \tilde{f}\left( w^t\right) ,\frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _k} p_i^t \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\rangle \right] . \end{aligned}$$

Furthermore, using the identity \(-\langle a, b\rangle =\frac{1}{2}\left( -\Vert a\Vert ^2-\Vert b\Vert ^2+\Vert a-b\Vert ^2\right)\), \(\mathbb {E}\left[ T_1(t)\right]\) can be written as:

$$\begin{aligned} \mathbb {E}\left[ T_1(t)\right] =-\frac{\eta _g \bar{\eta }_l}{2} \mathbb {E}\left( \left\| \nabla \tilde{f}\left( w^t\right) \right\| ^2\right)&+\frac{\eta _g \bar{\eta }_l}{2}\left( -\mathbb {E}_\pi \left\| \frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2\right. \nonumber \\&\left. +\mathbb {E}_\pi \underbrace{\left\| \nabla \tilde{f}\left( w^t\right) -\frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2}_{T_3(t)}\right) . \end{aligned}$$
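The rewriting of the inner product above relies on the identity \(-\langle a, b\rangle =\frac{1}{2}\left( -\Vert a\Vert ^2-\Vert b\Vert ^2+\Vert a-b\Vert ^2\right)\). The following is a minimal numerical sanity check of that identity, not part of the original proof; the helper names and the random vectors are illustrative assumptions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_norm(a):
    """Squared Euclidean norm of a vector."""
    return dot(a, a)


def polarization_gap(a, b):
    """Return lhs - rhs of  -<a, b> = 0.5 * (-|a|^2 - |b|^2 + |a - b|^2)."""
    diff = [x - y for x, y in zip(a, b)]
    lhs = -dot(a, b)
    rhs = 0.5 * (-sq_norm(a) - sq_norm(b) + sq_norm(diff))
    return lhs - rhs


# Hypothetical stand-ins: `a` plays the role of the global gradient and
# `b` the role of the averaged client gradients in the equation above.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(8)]
b = [rng.gauss(0, 1) for _ in range(8)]
print("gap:", polarization_gap(a, b))  # zero up to floating-point rounding
```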

For \(T_3\), the definition of \(f\left( w^t\right)\) gives

$$\begin{aligned} \mathbb {E}_\pi \left[ T_3(t)\right]&=\mathbb {E}_\pi \left\| \frac{1}{m_t} \sum _{i=1}^{m_t} \nabla F_i\left( w^t\right) -\frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2\nonumber \\&\le \frac{1}{m_t} \sum _{i=1}^{m_t} \mathbb {E}_\pi \left\| \sum _{q=1}^{\epsilon _i} p_i^k\left[ \nabla F_i\left( w^t\right) -\nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right] \right\| ^2\nonumber \\&\le \frac{1}{m_t} \sum _{i=1}^{m_t} \mathbb {E}_\pi \sum _{q=1}^{\epsilon _i} p_i^k\left\| \left[ \nabla F_i\left( w^t\right) -\nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right] \right\| ^2. \end{aligned}$$

Defining \(\gamma _i(t)=\sum _{q=1}^{\epsilon _i} p_i^k\), \(T_3\) can be bounded by the errors caused by obsolescence and local drift:

$$\begin{aligned} \mathbb {E}\left[ T_3(t)\right]&\le \frac{2}{m} \sum _{i=1}^m \mathbb {E}_\pi \gamma _i(t)(O_e+C_d)\nonumber \\&\le \frac{2}{m} \sum _{i=1}^m\left( L^2 \mathbb {E}_\pi \gamma _i(t)\left\| w^t-w^{t-\tau _i}\right\| ^2+L^2 \mathbb {E}_\pi \gamma _i(t)\left\| w^{t-\tau _i}-y_{i, q}^{t-\tau _i}\right\| ^2\right) . \end{aligned}$$
$$\begin{aligned} O_e=\left\| \nabla F_i\left( w^t\right) -\nabla F_i\left( w^{t-\tau _i}\right) \right\| ^2 \end{aligned}$$
$$\begin{aligned} C_d=\left\| \nabla F_i\left( w^{t-\tau _i}\right) -\nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2 \end{aligned}$$

where \(O_e\) is the obsolescence error and \(C_d\) is the client drift error.
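The factor 2 in the bound on \(T_3\) comes from a step that the text leaves implicit. As a sketch, assuming each \(F_i\) is \(L\)-smooth (which the factor \(L^2\) in the bound suggests), the gradient gap is split at the stale global model \(w^{t-\tau _i}\):

$$\begin{aligned} \left\| \nabla F_i\left( w^t\right) -\nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2&=\left\| \left[ \nabla F_i\left( w^t\right) -\nabla F_i\left( w^{t-\tau _i}\right) \right] +\left[ \nabla F_i\left( w^{t-\tau _i}\right) -\nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right] \right\| ^2\nonumber \\&\le 2 O_e+2 C_d\nonumber \\&\le 2 L^2\left\| w^t-w^{t-\tau _i}\right\| ^2+2 L^2\left\| w^{t-\tau _i}-y_{i, q}^{t-\tau _i}\right\| ^2. \end{aligned}$$

The first inequality uses \(\Vert a+b\Vert ^2 \le 2\Vert a\Vert ^2+2\Vert b\Vert ^2\), and the second uses the assumed \(L\)-smoothness of \(F_i\).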

The obsolescence error can be bounded by writing the model difference as a telescoping sum of the global model updates between rounds:

$$\begin{aligned} \left\| w^t-w^{t-\tau _i}\right\| ^2&=\left\| \sum _{\rho =t-\tau _i}^{t-1}\left( w^{\rho +1}-w^\rho \right) \right\| ^2\nonumber \\&=\left\| \sum _{\rho =t-\tau _i}^{t-1} \frac{\eta _g}{K} \sum _{j_\rho \in S_\rho } \Delta _{j_\rho }^\rho \right\| ^2\nonumber \\&=\frac{\eta _g^2}{K^2}\left\| \sum _{\rho =t-\tau _i}^{t-1} \sum _{j_\rho \in S_\rho } \sum _{l=0}^{Q-1} \eta _l^{(l)} g_{j_\rho }\left( y_{j_\rho , l}^\rho \right) \right\| ^2. \end{aligned}$$
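A squared norm of a sum like the one above is commonly bounded with \(\Vert \sum _{i=1}^n x_i\Vert ^2 \le n \sum _{i=1}^n \Vert x_i\Vert ^2\), which is where factors such as \(\tau _i\) and \(\epsilon\) in the subsequent bound would come from. The sketch below checks this inequality numerically; the function names, the staleness value, and the random "updates" are assumptions made for illustration only.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def sum_bound_holds(vectors, tol=1e-12):
    """Check ||sum_i x_i||^2 <= n * sum_i ||x_i||^2 for a list of n vectors."""
    n = len(vectors)
    # Coordinate-wise sum of all vectors.
    total = [sum(coords) for coords in zip(*vectors)]
    return sq_norm(total) <= n * sum(sq_norm(v) for v in vectors) + tol


# Hypothetical example: tau stale rounds, each contributing one update vector.
rng = random.Random(1)
tau = 4
updates = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(tau)]
print(sum_bound_holds(updates))
```

The bound is tight when all vectors are identical, which is why stale rounds that push in the same direction are the worst case.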

Taking expectations gives the upper bound

$$\begin{aligned} \gamma _i(t) \mathbb {E}_\pi \left\| w^t-w^{t-\tau _i}\right\| ^2&\le \frac{\eta _g^2 \tau _i}{K} \mathbb {E}_\pi \left( \gamma _i(t) \epsilon _i\right) \sum _{\rho =t-\tau _i}^{t-1} \sum _{j_\rho \in S_\rho } \sum _{l=0}^\epsilon \left( \eta _l^{(l)}\right) ^2 \mathbb {E}\left\| g_{j_\rho }\left( y_{j_\rho , l}^\rho \right) \right\| ^2\nonumber \\&\le 3 \eta _g^2 \mathbb {E}_\pi \left( \gamma _i(t) \epsilon _i\right) \max _{\tau _i} \tau _i^2\left( \sum _{l=1}^{\epsilon _l}\left( \eta _l^{(l)}\right) ^2\right) \left( \sigma _l^2+\sigma _g^2+G\right) \nonumber \\&\le 3 \eta _g^2 \eta _l^2 \mathbb {E}_\pi \left( \gamma _i(t) \epsilon ^2\right) \tau _{\max , K}^2\left( \sigma _l^2+\sigma _g^2+G\right) . \end{aligned}$$

The second inequality uses Lemma 2, and the last inequality uses \(\left( \mathbb {E}[X]\right) ^2 \le \mathbb {E}\left[ X^2\right]\). The expected error caused by local drift can be bounded similarly, which gives:

$$\begin{aligned} \mathbb {E}\left[ T_3\right]&\le 6\left( L^2 \eta _g^2 \eta _l^2 \mathbb {E}_\pi \left( \gamma _i(t) \epsilon ^2\right) \tau _{\max , K}^2\left( \sigma _l^2+\sigma _g^2+G\right) +L^2 q\left( \sum _{i=0}^{g-1}\left( \eta _l^{(i)}\right) ^2\right) \left( \sigma _l^2+\sigma _g^2+G\right) \right) \nonumber \\&\le 6 L^2 \mathbb {E}_\pi \left( \gamma _i(t) \epsilon ^2\right) \left( \eta _g^2 \tau _{\max , K}^2+\frac{1}{2}\right) \left( \sigma _l^2+\sigma _g^2+G\right) , \end{aligned}$$

Substituting the bound on \(T_3\) back into \(T_1\) yields:

$$\begin{aligned} \mathbb {E}\left[ T_1\right] \le -\frac{\eta _g \eta _l}{2}\left\| \nabla f\left( w^t\right) \right\| ^2 +\frac{\eta _g \eta _l}{2} \mathbb {E}\left[ T_3\right] -\mathbb {E}_\pi \left\| \frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2. \end{aligned}$$

Let \(\beta (Q):=\sum _{q=0}^{Q-1}\left( \eta _{\ell }^{(q)}\right) ^2\). Therefore, we have:

$$\begin{aligned} \mathbb {E}\left[ T_1\right]&\le -\frac{\eta _g \eta _l}{2}\left\| \nabla f\left( w^t\right) \right\| ^2 +3 L^2 \mathbb {E}_\pi \left( \gamma _i(t) \epsilon ^2\right) \left( \eta _g^2 \tau _{\max , K}^2+\frac{1}{2}\right) \left( \sigma _l^2+\sigma _g^2+G\right) \nonumber \\&\quad-\underbrace{\mathbb {E}_\pi \left\| \frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2}_{T_4} . \end{aligned}$$

For the expected value of \(T_2\), we have:

$$\begin{aligned} \mathbb {E}\left[ T_2(t)\right]&=\mathbb {E}\left[ \frac{L \eta _g^2}{2 K^2}\left\| \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k} \eta _{\ell }^{(q)} p_k^t g_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^2\right] \nonumber \\&=\mathbb {E}\left[ \frac{L \eta _g^2}{2 K^2}\left\| \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k} \eta _{\ell }^{(q)} p_k^t\left( g_k\left( y_{k, q}^{t-\tau _k}\right) -\nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right) +\sum _{k \in S_t} \sum _{q=1}^{\epsilon _k} \eta _{\ell }^{(q)} p_k^t \nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^2\right] \nonumber \\&=\frac{L \eta _g^2}{2 K^2} \mathbb {E}\left\| \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k} \eta _{\ell }^{(q)} p_k^t\left( g_k\left( y_{k, q}^{t-\tau _k}\right) -\nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right) \right\| ^2+\frac{L \eta _g^2}{2 K^2} \mathbb {E}\left\| \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k} \eta _{\ell }^{(q)} p_k^t \nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^{2}\nonumber \\&=\frac{L \eta _g^2}{2} \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k}\left( \eta _{\ell }^{(q)} p_k^t\right) ^2 \mathbb {E}\left\| g_k\left( y_{k, q}^{t-\tau _k}\right) -\nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^{2} \nonumber \\&\quad +\frac{L \eta _g^2}{2 K^2} \mathbb {E}_\pi \bar{\epsilon } \mathbb {E} \sum _{k \in S_t} \sum _{q=1}^{\epsilon _k}\left\| \eta _{\ell }^{(q)} p_k^t \nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^{2} \nonumber \\&\le \frac{L \eta _g^2 \eta _l^2 \zeta (t) \sigma _{\ell }^2}{2}+\frac{L \eta _g^2 \mathbb {E}_\pi \bar{\epsilon }}{2 K} \sum _{k \in S_t}\left( \eta _{\ell }^{(q)}\right) ^2 \mathbb {E}_\pi \mathbb {E}_{k \sim \left[ m_t\right] \pi } p_k^t \left\| \sum _{q=1}^{\epsilon _k} \nabla F_k\left( y_{k, q}^{t-\tau _k}\right) \right\| ^2 \nonumber \\&=\frac{L \eta _g^2 \eta _l^2 \zeta (t) \sigma _{\ell }^2}{2}+\underbrace{\frac{L \eta _g^2 \mathbb {E}_\pi \bar{\epsilon }}{2 K} \sum _{k \in S_i}\left( \eta _{\ell }^{(q)}\right) ^2 \mathbb {E}_\pi \left[ \frac{p_i^t}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _k}\left\| \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2\right] }_{T_5} , \end{aligned}$$

where \(\zeta (t)=\mathbb {E}_\pi \sum _{q=1}^{\epsilon _k} p_k^t\). To obtain an upper bound on \(\mathbb {E}\left[ T_1+T_2\right]\), it is necessary to ensure that \(T_4+T_5 \le 0\):

$$\begin{aligned} \left( T_4+T_5\right)&=-\mathbb {E}_\pi \left\| \frac{1}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} p_i^k \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2+\frac{L \eta _g^2 \mathbb {E}_\pi \bar{\epsilon }}{2 K} \sum _{k \in S_i}\left( \eta _{\ell }^{(q)}\right) ^2 \mathbb {E}_\pi \left[ \frac{p_i^t}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _k}\left\| \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2\right] \nonumber \\&\le -\mathbb {E}_\pi \left\| \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} \frac{p_i^k}{m_t} \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2+\frac{L \eta _g^2 \mathbb {E}_\pi \bar{\epsilon }}{2 K} \sum _{k \in S_i}\left( \eta _{\ell }^{(q)}\right) ^2 \mathbb {E}_\pi \left[ \frac{p_i^t}{m_t} \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _k}\left\| \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2\right] \nonumber \\&=\left( -\eta _g-L \mathbb {E}_\pi \bar{\epsilon } \eta _g^2 \eta _l^2\right) \mathbb {E}_\pi \left\| \sum _{i=1}^{m_t} \sum _{q=1}^{\epsilon _i} \eta _l^{(q)} \frac{p_i^k}{m_t} \nabla F_i\left( y_{i, q}^{t-\tau _i}\right) \right\| ^2. \end{aligned}$$

Therefore, to ensure that \(T_4+T_5 \le 0\), it is required that \(\eta _g \eta _{\ell } \mathbb {E}_\pi \bar{\epsilon } \le \frac{1}{L}\) for all local gradient descent steps. Finally, combining \(T_1\) and \(T_2\) gives the expected improvement between two adjacent global models:

$$\begin{aligned} \mathbb {E}\left[ f\left( w^{t+1}\right) \right]&\le \mathbb {E}\left[ f\left( w^t\right) \right] -\frac{\eta _g \gamma (t)}{2}\left\| \nabla f\left( w^t\right) \right\| ^2\nonumber \\&\quad +3 \eta _g L^2 Q \gamma (t) \zeta (t)\left( \eta _g^2 \tau _{\max , K}^2+\frac{1}{2}\right) \left( \sigma _l^2+\sigma _g^2+G\right) +\frac{L}{2} \eta _g^2 \zeta (t) \sigma _l^2. \end{aligned}$$

Summing the above inequality over \(t=0, \cdots , T-1\) and telescoping gives:

$$\begin{aligned}&\sum _{t=0}^{T-1} \eta _g \gamma (t)\left\| \nabla f\left( w^t\right) \right\| ^2\nonumber \\&\quad \le \sum _{t=0}^{T-1} 2\left( \mathbb {E}\left[ f\left( w^t\right) \right] -\mathbb {E}\left[ f\left( w^{t+1}\right) \right] \right) +3 \sum _{t=0}^{T-1} \eta _g L^2 \mathbb {E}_\pi \bar{\epsilon } \gamma (t) \zeta (t)\left( \eta _g^2 \tau _{\max , K}^2+1\right) \left( \sigma _l^2+\sigma _g^2+G\right) +\frac{L}{2} \eta _g^2 \zeta (t) \sigma _l^2\nonumber \\ &\quad\le 2\left( f\left( w^0\right) -f\left( w^T\right) \right) +3 \sum _{t=0}^{T-1} \eta _g L^2 \gamma (t) \zeta (t)\left( \eta _g^2 \tau _{\max , K}^2+1 / 2\right) \left( \sigma _l^2+\sigma _g^2+G\right) +\frac{L}{2} \eta _g^2 \zeta (t) \sigma _{\ell }^2. \end{aligned}$$

Therefore, Theorem 1 can be obtained.

$$\begin{aligned} \frac{1}{T} \sum _{t=0}^{T-1}\left\| \nabla f\left( w^t\right) \right\| ^2&\le \frac{2\left( f\left( w^0\right) -f\left( w^*\right) \right) }{\eta _g \cdot \alpha (Q) T}\nonumber \\&\quad+3 L^2 Q \beta (Q)\left( \eta _g^2 \tau _{\max , K}^2+1\right) \left( \sigma _l^2+\sigma _g^2+G\right) +\frac{L}{2} \frac{\eta _g \beta (Q)}{\alpha (Q)} \sigma _l^2 \end{aligned}$$
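To get a feel for how the terms of the bound trade off, the right-hand side can be evaluated numerically. The sketch below is illustrative only: it assumes \(\alpha (Q)=\sum _{q} \eta _{\ell }^{(q)}\) (this definition is not given in this section), takes \(\beta (Q)\) as defined earlier, and uses made-up hyperparameter values. It also includes the step-size condition \(\eta _g \eta _{\ell } \mathbb {E}_\pi \bar{\epsilon } \le 1/L\) derived above.

```python
def satisfies_step_condition(eta_g, eta_l, mean_local_steps, smoothness):
    """Check eta_g * eta_l * E[eps_bar] <= 1 / L."""
    return eta_g * eta_l * mean_local_steps <= 1.0 / smoothness


def theorem1_rhs(f_gap, local_steps, eta_g, num_rounds, smoothness,
                 tau_max, sigma_l2, sigma_g2, grad_bound):
    """Evaluate the right-hand side of the Theorem 1 bound.

    local_steps: list of local learning rates eta_l^(q), q = 0..Q-1.
    ASSUMPTION: alpha(Q) is the sum of the local learning rates.
    """
    q_steps = len(local_steps)
    alpha = sum(local_steps)                    # assumed alpha(Q)
    beta = sum(s * s for s in local_steps)      # beta(Q) as defined in the text
    noise = sigma_l2 + sigma_g2 + grad_bound    # (sigma_l^2 + sigma_g^2 + G)

    optimisation_term = 2.0 * f_gap / (eta_g * alpha * num_rounds)
    staleness_term = (3.0 * smoothness ** 2 * q_steps * beta
                      * (eta_g ** 2 * tau_max ** 2 + 1.0) * noise)
    variance_term = 0.5 * smoothness * eta_g * beta / alpha * sigma_l2
    return optimisation_term + staleness_term + variance_term


# Hypothetical values, chosen only to exercise the functions.
steps = [0.01] * 5
print(satisfies_step_condition(1.0, 0.01, 5, 10.0))
print(theorem1_rhs(1.0, steps, 1.0, 100, 10.0, 3, 0.1, 0.1, 1.0))
```

Only the first term shrinks with the number of rounds \(T\); the staleness and variance terms depend on the learning rates and on \(\tau _{\max , K}\), so under this reading they set the floor that the bound approaches.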


Lemma 2\(\square\)

$$\begin{aligned} \left\| \textbf{w}_{t+1}-\textbf{w}^{\star }\right\| ^2&=\left\| \textbf{w}_t-\textbf{w}^{\star }+\eta _g \sum _{k \in S^t} p_k\left( -\eta _{\ell } \sum _{q=1}^Q g_k\left( y_{k, q}^{t-\tau _k(t)}\right) \right) \right\| ^2\nonumber \\&=\left\| \textbf{w}_t-\textbf{w}^{\star }\right\| ^2+T_1+T_2 \end{aligned}$$
$$\begin{aligned} T_1=2 \left\langle \textbf{w}_t-\textbf{w}^{\star }, \eta _g \sum _{k \in S^t} p_k\left( -\eta _{\ell } \sum _{q=1}^Q g_k\left( y_{k, q}^{t-\tau _k(t)}\right) \right) \right\rangle \end{aligned}$$
$$\begin{aligned} T_2=\eta _g^2\left\| \sum _{k \in S^t} p_k\left( -\eta _{\ell } \sum _{q=1}^Q g_k\left( y_{k, q}^{t-\tau _k(t)}\right) \right) \right\| ^2 \end{aligned}$$

Experiment and analysis

To understand the impact of various hyperparameters on the convergence process and on computational resource consumption in federated learning, this study analyzes two aspects: the influence of hyperparameters on the performance of FedEem and on the computational speed. By controlling variables and conducting a series of comparative experiments, we demonstrate the efficiency and fairness of FedEem.

Experimental setup

To ensure fairness in the experimental comparison, this paper primarily focuses on comparing the number of rounds required for the global model to achieve a specific accuracy threshold (e.g., 95\(\%\) accuracy on the MNIST dataset). The experiments involve a fixed total of 20 clients. The federated learning process is simulated using FLsim, a simulator specifically designed for experimental research [36]. FLsim utilizes JSON files to manage the configuration parameters of federated learning simulations and provides a general template along with three pre-configured simulation files for the CIFAR-10, FashionMNIST, and MNIST datasets. In this study, we implemented federated learning algorithms such as FedBuff for conducting comparative experiments.
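The comparison metric described above, the number of rounds needed to reach a target accuracy, can be computed from a per-round accuracy log. The sketch below is a minimal illustration; the function name and the accuracy values are hypothetical and are not part of FLsim or of the reported experiments.

```python
def rounds_to_threshold(accuracy_per_round, threshold=0.95):
    """Return the 1-based round at which accuracy first reaches the threshold.

    Returns None if the threshold is never reached.
    """
    for round_idx, accuracy in enumerate(accuracy_per_round, start=1):
        if accuracy >= threshold:
            return round_idx
    return None


# Made-up accuracy curve for illustration only.
history = [0.62, 0.81, 0.90, 0.94, 0.953, 0.961]
print(rounds_to_threshold(history))  # 5
```

Returning None rather than the final round keeps runs that never converge distinguishable from runs that converge in the last round.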

All simulation experiments were performed on a PC server running Ubuntu Linux 21.1.0. The server is equipped with an Intel i5-10600KF (4.10GHz) processor, 64GB RAM, and 4 NVIDIA TITAN-V GPUs. The experimental environment utilizes Python 3.9.5 and PyTorch 1.8.1.

Analysis of experimental results

Figures 2 and 3 show the performance comparison of FedEem with other state-of-the-art algorithms under the scenarios of uniform and randomly independent distributions. Due to the presence of the early exit mechanism, FedEem has a significant advantage in terms of convergence speed compared to other asynchronous federated learning algorithms. In addition, the aggregation mechanism optimized by FedEem allows for a more stable convergence process, as it is less affected by the obsolescence of model updates and interference caused by large gradient differences.
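To illustrate why down-weighting stale updates can stabilize aggregation, the sketch below uses a generic polynomial staleness discount \((1+\tau )^{-a}\), a common choice in asynchronous federated learning. This is a swapped-in, simplified technique and is not the FedEem aggregation rule, whose weights also account for interference; all names and values are assumptions for illustration.

```python
def staleness_weights(staleness, exponent=0.5):
    """Normalized weights that decay polynomially with staleness tau."""
    raw = [(1.0 + tau) ** (-exponent) for tau in staleness]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(global_model, updates, staleness, eta_g=1.0, exponent=0.5):
    """Apply a staleness-weighted average of client updates to the model.

    global_model: list of floats; updates: list of same-length update vectors.
    """
    weights = staleness_weights(staleness, exponent)
    new_model = list(global_model)
    for weight, delta in zip(weights, updates):
        for j, d in enumerate(delta):
            new_model[j] += eta_g * weight * d
    return new_model


# Hypothetical example: the second update is three rounds stale.
model = [0.0, 0.0]
updates = [[1.0, 1.0], [-4.0, 2.0]]
print(staleness_weights([0, 3]))
print(aggregate(model, updates, staleness=[0, 3]))
```

With `exponent=0` the weights reduce to a plain average, so the exponent controls how strongly obsolete updates are suppressed.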

Fig. 2

Concurrency level is 10, with each client having 120 data samples. The data is uniformly distributed with a Non-IID pattern

Fig. 3

Concurrency level is 10, with each client having 120 data samples. The data is randomly distributed with a Non-IID pattern

Figure 4 illustrates the convergence process of FedEem under different choices of regularization weights, with 30 clients and four repetitions of each experiment. Different hyperparameter choices lead to significant differences in the time and number of rounds consumed, while the variance across repetitions does not follow a consistent pattern. Therefore, hyperparameters in asynchronous federated learning need to be chosen carefully.
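A comparison of this kind, across hyperparameter settings and repeated runs, can be summarized by the mean and sample variance of the rounds needed per setting. The sketch below assumes a simple dictionary of results; the setting names and numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean, variance


def summarize_runs(rounds_by_setting):
    """Map each setting to (mean rounds, sample variance) over its repetitions."""
    summary = {}
    for name, rounds in rounds_by_setting.items():
        if len(rounds) < 2:
            raise ValueError("need at least two repetitions per setting")
        summary[name] = (mean(rounds), variance(rounds))
    return summary


# Hypothetical rounds-to-threshold for two regularization weights, four runs each.
example = {"weight=0.1": [40, 44, 42, 46], "weight=1.0": [30, 50, 28, 52]}
for name, (avg, var) in summarize_runs(example).items():
    print(name, avg, round(var, 2))
```

Reporting the variance next to the mean makes it visible when a setting is fast on average but unstable across repetitions.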

Fig. 4

Performance comparison of asynchronous federated learning under different numbers of clients


This paper investigates an optimized mechanism for asynchronous federated learning in edge computing scenarios. First, the necessity of the asynchronous mechanism in highly heterogeneous federated learning is analyzed. The paper then addresses the fairness issues in previous asynchronous federated learning algorithms and proposes an optimized mechanism called FedEem. This mechanism includes a weight aggregation mechanism that accounts for timeliness and fairness, as well as an early exit mechanism. Experimental results demonstrate that the proposed algorithm achieves significant improvements in both convergence time and fairness under various data distributions and device heterogeneity.

Availability of data and materials

All code used to support this work is available from the authors upon request.


  1. Gong B, Xing T, Liu Z, Wang J, Liu X (2022) Adaptive clustered federated learning for heterogeneous data in edge computing. Mob Netw Appl 27(4):1520–1530

  2. Xu X, Li H, Li Z, Zhou X (2022) Safe: Synergic data filtering for federated learning in cloud-edge computing. IEEE Trans Ind Inform 19(2):1655–1665

  3. Wu S, Shen S, Xu X, Chen Y, Zhou X, Liu D, Xue X, Qi L (2022) Popularity-aware and diverse web apis recommendation based on correlation graph. IEEE Trans Comput Soc Syst 10(2):771–782

  4. Wang F, Zhu H, Srivastava G, Li S, Khosravi MR, Qi L (2021) Robust collaborative filtering recommendation with user-item-trust records. IEEE Trans Comput Soc Syst 9(4):986–996

  5. Liang F, Yang Q, Liu R, Wang J, Sato K, Guo J (2022) Semi-synchronous federated learning protocol with dynamic aggregation in internet of vehicles. IEEE Trans Veh Technol 71(5):4677–4691

  6. You L, Liu S, Chang Y, Yuen C (2022) A triple-step asynchronous federated learning mechanism for client activation, interaction optimization, and aggregation enhancement. IEEE Internet Things J 9(23):24199–24211

  7. Hu CH, Chen Z, Larsson EG (2023) Scheduling and aggregation design for asynchronous federated learning over wireless networks. IEEE J Sel Areas Commun 41(4):874–886

  8. Liu Y, Zhou X, Kou H, Zhao Y, Xu X, Zhang X, Qi L (2023) Privacy-preserving point-of-interest recommendation based on simplified graph convolutional network for geological traveling. ACM Trans Intell Syst Technol.

  9. Hosseini SM, Sikaroudi M, Babaie M, Tizhoosh H (2023) Proportionally fair hospital collaborations in federated learning of histopathology images. IEEE Trans Med Imaging 42(7):1982–1995

  10. Chen Y, Ning Y, Slawski M, Rangwala H (2020) Asynchronous online federated learning for edge devices with non-iid data. In: 2020 IEEE International Conference on Big Data (Big Data). IEEE, Piscataway, p 15–24

  11. Nguyen J, Malik K, Zhan H, Yousefpour A, Rabbat M, Malek M, Huba D (2022) Federated learning with buffered asynchronous aggregation. In: International Conference on Artificial Intelligence and Statistics. PMLR, NY, p 3581–3607

  12. Su N, Li B (2022) How asynchronous can federated learning be? In: 2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS). IEEE, Piscataway, p 1–11

  13. Tan AZ, Yu H, Cui L, Yang Q (2022) Towards personalized federated learning. IEEE Trans Neural Netw Learn Syst.

  14. Li Q, Diao Y, Chen Q, He B (2022) Federated learning on non-iid data silos: An experimental study. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, Piscataway, p 965–978

  15. Qi L, Lin W, Zhang X, Dou W, Xu X, Chen J (2022) A correlation graph based approach for personalized and compatible web apis recommendation in mobile app development. IEEE Trans Knowl Data Eng 35(6):5444–5457

  16. Chai Z, Ali A, Zawad S, Truex S, Anwar A, Baracaldo N, Zhou Y, Ludwig H, Yan F, Cheng Y (2020) Tifl: A tier-based federated learning system. In: Proceedings of the 29th international symposium on high-performance parallel and distributed computing. ACM, New York, p 125–136

  17. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450

  18. Jiang Y, Konečný J, Rush K, Kannan S (2019) Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488.

  19. Yang H, He H, Zhang W, Cao X (2020) Fedsteg: A federated transfer learning framework for secure image steganalysis. IEEE Trans Netw Sci Eng 8(2):1084–1094

  20. Duan M, Liu D, Ji X, Liu R, Liang L, Chen X, Tan Y (2021) Fedgroup: Efficient federated learning via decomposed similarity-based clustering. In: 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE, Piscataway, p 228–237

  21. Huang Y, Chu L, Zhou Z, Wang L, Liu J, Pei J, Zhang Y (2021) Personalized cross-silo federated learning on non-iid data. In: Proceedings of the AAAI conference on artificial intelligence. Menlo Park, 35:7865–7873

  22. Li D, Wang J (2019) Fedmd: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.

  23. Zhou Z, Chu L, Liu C, Wang L, Pei J, Zhang Y (2021) Towards fair federated learning. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. ACM, New York, p 4100–4101

  24. Lyu L, Xu X, Wang Q, Yu H (2020) Collaborative fairness in federated learning. Federated Learn Priv Incent 12500:189–204

  25. Lim WYB, Xiong Z, Miao C, Niyato D, Yang Q, Leung C, Poor HV (2020) Hierarchical incentive mechanism design for federated machine learning in mobile networks. IEEE Internet Things J 7(10):9575–9588

  26. Zhan Y, Li P, Wang K, Guo S, Xia Y (2020) Big data analytics by crowdlearning: Architecture and mechanism design. IEEE Netw 34(3):143–147

  27. Zhan Y, Zhang J, Li P, Xia Y (2019) Crowdtraining: Architecture and incentive mechanism for deep learning training in the internet of things. IEEE Netw 33(5):89–95

  28. Du W, Xu D, Wu X, Tong H (2021) Fairness-aware agnostic federated learning. In: Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). SIAM, p 181–189

  29. Liang PP, Liu T, Ziyin L, Allen NB, Auerbach RP, Brent D, Salakhutdinov R, Morency LP (2020) Think locally, act globally: Federated learning with local and global representations. arXiv preprint arXiv:2001.01523.

  30. Xu X, Tang S, Qi L, Zhou X, Dai F, Dou W (2023) Cnn partitioning and offloading for vehicular edge networks in web3. IEEE Commun Mag 61(8):36–42

  31. Zhou X, Bilal M, Dou R, Rodrigues JJ, Zhao Q, Dai J, Xu X (2023) Edge computation offloading with content caching in 6g-enabled iov. IEEE Trans Intell Transp Syst.

  32. Xu X, Yang C, Bilal M, Li W, Wang H (2022) Computation offloading for energy and delay trade-offs with traffic flow prediction in edge computing-enabled iov. IEEE Trans Intell Transp Syst.

  33. Wu J, Zhang J, Zhang Y, Wen Y (2023) Constraint-aware and multi-objective optimization for micro-service composition in mobile edge computing. Softw Pract Exp.

  34. Qi L, Xu X, Wu X, Ni Q, Yuan Y, Zhang X (2023) Digital-twin-enabled 6g mobile network video streaming using mobile crowdsourcing. IEEE J Sel Areas Commun.

  35. Zhan Y, Zhang J, Hong Z, Wu L, Li P, Guo S (2021) A survey of incentive mechanism design for federated learning. IEEE Trans Emerg Top Comput 10(2):1035–1044

  36. Wang H, Kaplan Z, Niu D, Li B (2020) Optimizing federated learning on non-iid data with reinforcement learning. In: IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, Piscataway, p 1698–1707



We would like to express our sincere gratitude to the editors and reviewers for their invaluable feedback and comments on this paper.


Not applicable.

Author information


Author Contributions Statement: Each author has made significant contributions to the research and preparation of this manuscript. [G.] conceived the research idea, designed the experiments, and conducted the data analysis. Additionally, [G.] contributed to the literature review, data collection, and manuscript writing. [Z.] provided technical guidance, reviewed and revised the manuscript. [Z.] also contributed to the experimental design, conducted the experiments, and analyzed the results. Furthermore, [Z.] provided critical feedback and contributed to the interpretation of the findings. All authors have read and approved the final version of the manuscript and take full responsibility for its content.

Corresponding author

Correspondence to Yifan Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Gu, W., Zhang, Y. FedEem: a fairness-based asynchronous federated learning mechanism. J Cloud Comp 12, 154 (2023).
