A split-federated learning and edge-cloud based efficient and privacy-preserving large-scale item recommendation model

The combination of federated learning and recommender systems aims to solve the privacy problems of recommendation by keeping user data locally on the client device during model training. However, most existing approaches rely on user devices to fully compute the deep model designed for large-scale item recommendation, thereby imposing high computation and communication overheads on resource-constrained user devices. Consequently, achieving efficient federated recommendation across ubiquitous mobile devices remains an open research problem. To this end, in this paper we propose an efficient and privacy-preserving federated learning framework based on cloud-edge collaboration for large-scale item recommendation, called STTFedRec. In our method, to reduce the computation and communication cost of the federated two-tower model, a split learning approach is applied to migrate the item model from participants' edge devices to the computationally powerful cloud side, and item data is compressed for transmission. Meanwhile, to enhance feature representation, a Squeeze-and-Excitation network mechanism is used on the backbone model to optimize the perception of dominant features. Moreover, because the transmitted gradients contain private user information, we propose a multi-party circular secret-sharing chain for better privacy protection. Extensive experiments on two real-world datasets demonstrate that our proposed method improves the average computation time and communication cost by 23% and 49%, respectively. Furthermore, the proposed model achieves performance comparable to other state-of-the-art federated recommendation models.


INTRODUCTION
Recently, privacy-preserving recommendation has become a research hotspot due to growing concerns about user privacy [24], data security [4], and strict government regulations such as GDPR and CPRA. Federated Learning (FL) [23] is considered one of the most effective privacy-preserving machine learning paradigms [18]. Participants in FL collaborate to train a model under the coordination of a central server while keeping the training data decentralized. FL can mitigate the systematic privacy risks of traditional centralized machine learning, which fits the need for data security and privacy protection in recommendation systems (RS) [39]. Ammad-ud-din et al. [2] proposed Federated Collaborative Filtering (FCF), which validated that a collaborative filter can be federated without loss of accuracy compared to a centralized implementation. Following FCF, Flanagan et al. [10] proposed Federated Multi-view Matrix Factorization (FED-MVMF), which leverages multiple user data sources to facilitate model training, enhance model accuracy, and handle cold start problems. For specific application scenarios, Chen et al. [5] proposed PriRec for federated point-of-interest recommendation, Qi et al. [27] proposed FedNewsRec for federated news recommendation, and Zhou et al. [42] proposed a federated social recommendation framework with big data support. Most existing works assume all user devices are available and adequate to participate in FL training.
However, in practice, FL based recommendation still faces the following challenges. On the one hand, modern recommendation systems are designed for accurate prediction and built with large model sizes [1], making resource-limited mobile devices inefficient for local model training [18][3]. Many devices cannot meet the performance requirements of FL, resulting in poor model accuracy or even training failure. On the other hand, industrial-grade recommendation involves massive item data, connecting massive user bases with item corpora often in the millions to billions [37]. A centralized recommendation system can mitigate this workload under strict latency requirements with a recall mechanism. However, since the server cannot access the user's data, massive raw item data must be delivered to the mobile device, causing considerable communication and computation overhead. In summary, a significant challenge for FL based recommendation in real-world environments is: how to perform efficient and privacy-preserving FL based recommendation across resource-limited, ubiquitous mobile devices.
Specifically, we focus on two compression objectives of practical value for FL based RS: (a) server broadcast compression [20], to reduce communication costs when devices are involved in training and online prediction; and (b) local computation reduction [21][36], to reduce the computational workload of the device.
Choices of recommendation model. The two-tower model is widely used in traditional recommendation systems to recall massive numbers of items [38][34]. A unique feature of the two-tower model is that feature modeling for users and items is performed by two independent sub-networks that can be cached separately [17]. Only the similarity matching of user and item embeddings is needed at online prediction time. Therefore, it is feasible to split the two independent sub-networks so that their training and online inference can be performed by different entities, which matches our compression objectives.
Choices of privacy-enhancing techniques. Many privacy-enhancing techniques have been applied to FL, since local gradients may leak private information [18][3]; examples include Differential Privacy (DP) [7], Secure Multiparty Computation (SMPC) [22], and Homomorphic Encryption (HE) [9]. Since DP introduces random noise into the gradients, which affects model accuracy, and HE incurs too much extra computation [6][26], we choose SMPC, which provides security guarantees for gradient aggregation across multiple parties.
Dealing with computation and communication overhead. Although some methods have been proposed to improve the efficiency of FL based RS, they do not explicitly address the above issues. Muhammad et al. [25] proposed FedFast to accelerate training by building user clusters and allowing parameter sharing. Khan et al. [19] introduce a multi-arm bandit mechanism to tackle the item-dependent payloads problem. However, neither method reduces the device's workload during the training phase. Qin et al. [28] proposed PPRSF, which enables the server to use a portion of the users' explicit feedback to build a recall model, resulting in privacy exposure. Different from their work, in this paper we propose a split learning based workload compression method that effectively reduces the device's computation and communication load under privacy protection.
This paper proposes a Split Two-Tower Federated Recommendation framework (STTFedRec). Based on the unique structure of the two-tower model, we use split learning to separate the user model (user tower) and the item model (item tower) and place them on the user device and the server, respectively. The user device is only responsible for computing the user tower, achieving the goal of model compression. For online prediction, instead of complete raw item data, the server provides the user device with low-dimensional item embeddings, achieving the goal of communication compression. Finally, the user device only needs to compute the similarity between user and item embeddings for item ranking, which retains the end-to-end computing benefits of a centralized recommendation system and achieves the goal of computation efficiency optimization.
Summary of experimental results. We conduct experiments on two public benchmark datasets, i.e., MovieLens-1M and Adressa. The results demonstrate that (1) STTFedRec achieves performance comparable to existing FL based recommendation models; split learning did not affect the performance of the federated two-tower model.
(2) STTFedRec improves the average computation time and communication size of the baseline models by about 40× and 42×, respectively, in the best-case scenario. (3) The proposed multi-party circular secret sharing chain does not put additional strain on the computation and communication of user devices, and its communication volume is not affected by the number of participating members.
Contributions. We summarize our main contributions below: (1) We propose STTFedRec, a privacy-preserving and efficient cross-device recommendation framework that significantly reduces the computation and communication overhead on resource-constrained mobile devices. (2) We propose an obfuscated item request strategy and a multi-party circular secret sharing chain to further enhance privacy during the STTFedRec training phase. (3) We conduct thorough experiments on real-world datasets to verify the effectiveness and efficiency of STTFedRec.

RECOMMENDATION MODEL AND TOOLS
This section presents the selected recommendation model and the techniques related to STTFedRec.

The Two-tower Model
The two-tower model originated from the Deep Semantic Similarity Model (DSSM) [17]. Two-tower models have become a popular approach in several natural language tasks and have proven outstandingly efficient for building large-scale semantic matching systems. Since semantic matching is itself a ranking problem that coincides with the recommendation scenario, the two-tower model was naturally introduced into the recommendation domain and is widely used in advertising (Facebook) [15], information retrieval (Google) [35], and recommendation systems (YouTube) [40].
Formally, when the two-tower model is used to build a recommendation system, the goal is to retrieve a subset of items for a given user for subsequent ranking. Denote the users' feature vectors as $\{x_i\}_{i=1}^{N}$ and the items' feature vectors as $\{y_j\}_{j=1}^{M}$. Build two embedding functions, $u: \mathcal{X} \times \mathbb{R}^d \to \mathbb{R}^k$ and $v: \mathcal{Y} \times \mathbb{R}^d \to \mathbb{R}^k$, mapping user and candidate item features to a $k$-dimensional vector space. The output of the model is the inner product of the two embeddings, given by Eq. (1). The training objective is to learn the parameters $\theta_u$ and $\theta_v$ from the training set $\mathcal{D} := \{(x_i, y_i, r_i)\}_{i=1}^{T}$, where $r_i$ is the user feedback indicating whether the user clicked, watched, or rated an item:

$$s(x, y) = \langle u(x, \theta_u),\, v(y, \theta_v) \rangle \quad (1)$$

The item retrieval problem can be considered a multi-class classification problem: given a user $x$, the probability of retrieving item $y$ from the $M$ items can be calculated with the softmax function as Eq. (2), and the log-likelihood loss function is given as Eq. (3):

$$P(y \mid x; \theta_u, \theta_v) = \frac{e^{s(x, y)}}{\sum_{j=1}^{M} e^{s(x, y_j)}} \quad (2)$$

$$L(\theta_u, \theta_v) = -\frac{1}{|\mathcal{D}|} \sum_{(x, y, r) \in \mathcal{D}} r \cdot \log P(y \mid x; \theta_u, \theta_v) \quad (3)$$
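As a concrete illustration, the scoring and loss computations of Eqs. (1)~(3) can be sketched in plain Python. The linear tower, its weights, and the embedding dimensions below are illustrative stand-ins, not the paper's architecture:

```python
import math

def embed(x, W):
    # A single linear map standing in for a tower u(x) or v(y).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def score(u_emb, v_emb):
    # Eq. (1): inner product of the two tower outputs.
    return sum(a * b for a, b in zip(u_emb, v_emb))

def softmax_probs(u_emb, item_embs):
    # Eq. (2): probability of retrieving each item for the given user.
    logits = [score(u_emb, e) for e in item_embs]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def nll_loss(u_emb, item_embs, pos_idx):
    # Eq. (3): negative log-likelihood of the clicked item.
    return -math.log(softmax_probs(u_emb, item_embs)[pos_idx])

W_u = [[0.5, -0.2], [0.1, 0.4]]                 # toy user-tower weights
user = embed([1.0, 2.0], W_u)                   # user embedding
items = [[0.3, 0.1], [0.9, -0.5], [0.0, 1.0]]   # toy item embeddings
probs = softmax_probs(user, items)
loss = nll_loss(user, items, pos_idx=0)
```

The two towers never interact until the final inner product, which is exactly the property the split-learning design exploits.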
The unique feature of the two-tower model is that the user and item representations are two independent sub-networks, as shown in Figure 2(a). The two towers are cached separately, and only similarity operations are performed in memory during online prediction. Notably, this enables the training and computation of these two independent sub-networks to be placed separately on the user and server sides in an FL setting. Based on these properties, and combined with our optimization objectives, we choose the DSSM model as the basic model of STTFedRec.

Secure Multiparty Computing (SMPC)
SMPC is a cryptographic tool that enables multiple participants to jointly perform distributed computing tasks in a secure manner [22]. Formally, denote $n$ participants $\{P_i\}_{i \in [1,n]}$ who wish to jointly compute an $n$-ary function $f(x_1, x_2, \ldots, x_n)$ using their respective secret inputs $x_i$, with each participant obtaining an output $y_i$. SMPC protocols ensure that, at the end of the protocol, no participant receives any information beyond its own input and output and the information that can be derived from them.
Here, we present the addition SMPC protocol that we later use in our framework for secure gradient aggregation. A secret $x$ is split into two additive shares, $\mathrm{Shr}(x) = ([x]_1, [x]_2)$ with $x = [x]_1 + [x]_2$, distributed to two parties $P_i$, $i \in \{0, 1\}$. To add two shared secrets $x$ and $y$, each party locally calculates and returns

$$[z]_i = [x]_i + [y]_i \quad (4)$$

$$z = [z]_0 + [z]_1 = x + y \quad (5)$$

With Eq. (4) and Eq. (5), members participating in FL can hide their actual gradients and thus maintain privacy, while the server can still achieve secure gradient aggregation.
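A minimal sketch of this additive-sharing idea for two parties and scalar secrets. The share range and fixed seed are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the toy run is reproducible

def share(x):
    # Shr(x): split a secret into two additive shares, x = [x]_1 + [x]_2.
    r = random.uniform(-1e6, 1e6)
    return r, x - r

# Two parties secret-share their gradients g1 and g2.
g1, g2 = 0.25, -0.75
a1, a2 = share(g1)   # party 0 keeps a1, party 1 keeps a2
b1, b2 = share(g2)   # party 0 keeps b1, party 1 keeps b2

# Eq. (4): each party adds its shares locally; no raw value is revealed.
s0 = a1 + b1
s1 = a2 + b2

# Eq. (5): recombining the local sums reconstructs g1 + g2 only.
aggregate = s0 + s1
```

Neither `s0` nor `s1` alone reveals anything about `g1` or `g2`; only their recombination yields the aggregate.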

Split Learning
Split learning [12] is a collaborative deep learning technique. In contrast to federated learning settings, which focus on data partitioning and communication patterns, the key idea behind split learning is to divide the execution of a model layer-by-layer between the client and the server for both training and inference [31]. U-shaped split learning is a privacy-preserving variant in which neither raw data nor labels are transferred between the client and the server [33]. Formally, a client holds a deep learning task with $N$ computation layers $l_n$, $n \in [1, N]$. The client computes the forward propagation until it reaches the cut layer $l_c$, $c < N$. The output of $l_c$ is then sent to a computationally powerful server, which completes the forward propagation of layers $l_{c+1} \to l_N$. The output of $l_N$ is sent back to the client, which computes the loss function against the labels. This completes a round of forward propagation. The gradients are then backpropagated from the last layer to the cut layer along the reverse path. The cut layer's gradients are sent back to the client, where the rest of the backpropagation is completed. The above process continues until the model converges. This setup is shown in Figure 1.
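The split forward/backward round can be sketched with a toy one-dimensional model. The weights, learning rate, and squared-error loss are illustrative choices; the comments mark which quantities cross the client/server boundary:

```python
# Client holds layers up to the cut layer; server holds the rest.
# Toy model: client y1 = w1 * x ; server y2 = w2 * y1 ; loss = (y2 - t)^2.
x, t = 2.0, 1.0      # private input and label, never leave the client
w1, w2 = 0.5, 0.3    # client-side and server-side weights
lr = 0.1

# Forward: client computes up to the cut layer and sends the activation.
y1 = w1 * x          # cut-layer output, sent to the server
y2 = w2 * y1         # server completes the forward pass
# U-shaped variant: the output returns to the client, which holds the label.
loss = (y2 - t) ** 2

# Backward: client computes dL/dy2 and sends it; server backpropagates
# through its layers down to the cut layer.
d_y2 = 2 * (y2 - t)
d_w2 = d_y2 * y1     # server updates its own layer
d_y1 = d_y2 * w2     # cut-layer gradient, sent back to the client
d_w1 = d_y1 * x      # client finishes backpropagation locally

w1 -= lr * d_w1
w2 -= lr * d_w2
```

Only the cut-layer activation and its gradient are exchanged; raw input `x` and label `t` stay on the client.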
Based on split learning, we can extract part of the computation module of a recommendation model from resource-constrained mobile devices, compressing the computation overhead and offloading it to a server with superior performance.To the best of our knowledge, we propose the first FL based RS framework that uses split learning to reduce the computation and communication overhead of user devices.

STTFEDREC FRAMEWORK
This section describes the proposed STTFedRec framework in detail. We first present the problem definition, then the overall framework, followed by the training and online prediction processes. We conclude with a discussion of the scalability of STTFedRec and possible variants.

Problem Definition
We consider an FL based recommendation problem with a set of users $\mathcal{U} = \{u_1, u_2, \ldots\}$ and items $\mathcal{V} = \{v_1, v_2, \ldots\}$. For a given user $u$, his private data $\mathcal{D}_u := \{(x_i, v_i, r_i)\}_{i=1}^{n}$ is stored locally on his own device, where $x_i$ denotes the user's personalized features and $r_i$ is the data label. Here $x_i \in \mathcal{X}$ and $v_i \in \mathcal{Y}$ are both mixtures of a wide variety of features (e.g., sparse IDs and dense features).
We aim to build a two-tower model with two separately parameterized embedding functions: a user model $\mathcal{M}_U(x, \Theta_U)$ and an item model $\mathcal{M}_V(y, \Theta_V)$. The final output of the two-tower model is the similarity matching of user embeddings and item embeddings, providing a fast recall of a candidate item set for the user from a large number of items.
The goal is to learn the model parameters $\Theta_U$ and $\Theta_V$ collaboratively from the training data $\mathcal{D}$ distributed across user devices, while reducing computation and communication on participating devices and maintaining data privacy.

Framework Overview
The framework of STTFedRec is shown in Figure 2(c). There are $K$ user devices participating as clients $\mathcal{U}$. Each client device maintains the local user profile $x$ and interaction data $r$, while the recommendation server maintains the massive global item data $\mathcal{V}$. Towards the goal of computation and communication compression, and unlike the naive setting of the federated two-tower model illustrated in Figure 2(b), we split the user model and item model into independent sub-networks based on split learning. The user device is only responsible for computing the user tower, which compresses the computation overhead on the client device. The powerful recommendation server is responsible for training and computing the item tower and provides item embeddings instead of complete raw item data to the client device, which reduces the communication overhead of the user devices. During online inference, the user device only needs to perform similarity matching between cached user embeddings and item embeddings, retaining the end-to-end computing benefits of a centralized recommendation system and achieving computation efficiency optimization.
Towards the goal of privacy preservation, STTFedRec leverages FL to allow multiple user devices to participate in model training simultaneously without exposing their local raw data. When user devices request item embeddings from the server, obfuscated item requests are generated by random negative sampling so that the server cannot obtain the actual labels of user interactions. We also employ a secure aggregation server (SAS) that instructs participants to upload local gradients and losses via the SMPC protocol (Section 2.2). The SAS performs a privacy-preserving global user model update using the securely aggregated local gradients, and the securely aggregated loss is sent to the recommendation server as the backpropagation (BP) error signal for the global item model update. This setting is secure against a semi-honest adversary attempting to restore user data from fully exposed local gradients and losses.
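The chain protocol itself is specified in the training steps; as a rough sketch of the underlying masks-cancel-around-a-ring idea (the function name, mask range, and seeding are our own illustration, not the paper's exact construction):

```python
import random

def circular_masked_uploads(gradients, seed=42):
    # Each client i adds its own random mask r_i and subtracts the mask
    # received from its ring predecessor r_{i-1}. The masks telescope to
    # zero around the ring, so the server recovers only the sum.
    rng = random.Random(seed)
    n = len(gradients)
    masks = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Python's masks[i - 1] wraps around at i = 0, closing the ring.
    return [g + masks[i] - masks[i - 1] for i, g in enumerate(gradients)]

grads = [0.2, -0.5, 0.7, 0.1]
uploads = circular_masked_uploads(grads)
secure_sum = sum(uploads)   # equals sum(grads); individual values stay hidden
```

Because each device only exchanges a mask with its neighbor, the per-device communication does not grow with the number of participants, consistent with the observation in the experiments.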

Training and Online Inferring
We here present the details of the STTFedRec training process, which consists of six steps: (1) initialization, (2) obfuscated item request, (3) client local training, (4) gradients and loss secure aggregation, (5) global user model update, and (6) global item model update. Steps 1~3 are detailed in the notes accompanying Figures 1, 2, and 6; the remaining steps are as follows.

Step 4. Gradients and loss secure aggregation. At training round $t$, each client computes its masked gradient shares and mixed loss by Eq. (10) and Eq. (11) and uploads them to the aggregation server. The aggregation server computes the user model's average gradients $\nabla\Theta_U$ and average loss $\bar{L}$ by Eq. (12). The average gradients are reserved for the global user model update, and the average loss is transferred to the recommendation server as the BP error signal for the global item model update.
Step 5. Global user model update. At training round $t$, the aggregation server performs the global user model update with the FedAdam optimizer [29], computed as Eq. (13), where $\beta_1$, $\beta_2$, and $\eta$ are FedAdam's hyperparameters. The aggregation server broadcasts the updated global user model parameters $\Theta_U^{t+1}$ to all client devices for the next round of training or for online inferring.
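A minimal server-side Adam-style update consistent with the hyperparameter names used for FedAdam above; the numeric values and the flat-list parameter layout are illustrative:

```python
import math

def fedadam_step(theta, avg_grad, m, v, beta1=0.9, beta2=0.99, eta=1e-2, tau=1e-3):
    # One server-side update over a flat list of parameters, given the
    # securely aggregated average gradients from the clients.
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, avg_grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment estimate
        new_theta.append(p - eta * mi / (math.sqrt(vi) + tau))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# One update on a single toy parameter with average gradient 0.5.
theta, m, v = fedadam_step([1.0], [0.5], [0.0], [0.0])
```

The server keeps the optimizer state (`m`, `v`) across rounds, so the clients never need to store or transmit it.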
Step 6. Global item model update. At training round $t$, the recommendation server receives the average BP error signal $\bar{L}$ from the aggregation server. The average gradients $\nabla\Theta_V$ are computed by Eq. (14). We also use the FedAdam optimizer to update the global item model, computed as Eq. (15).

Online inferring. The recommendation server computes the embeddings of the massive item data with the current global item model. The item embeddings $\mathcal{I}$ are transmitted to each user device, and the user completes online inferring by computing the inner product of $\mathcal{I}$ and the locally cached user embeddings $e_u$. Compared to transferring the complete raw item data to the user and having the user device compute the corresponding embeddings, STTFedRec significantly reduces communication size and computation overhead.
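On the device, online inferring then reduces to inner products followed by a top-k selection; a sketch with toy embeddings:

```python
def rank_items(user_emb, item_embs, k=10):
    # Score every server-provided item embedding against the cached user
    # embedding (the Eq. (1) inner product), then return the indices of
    # the k highest-scoring items.
    scores = [(idx, sum(u * v for u, v in zip(user_emb, emb)))
              for idx, emb in enumerate(item_embs)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [idx for idx, _ in scores[:k]]

user = [1.0, 0.0]                                # cached user embedding
items = [[0.9, 0.0], [0.1, 0.0], [2.0, 0.0]]     # downloaded item embeddings
top2 = rank_items(user, items, k=2)              # indices of the best 2 items
```

This is the only computation the device performs at serving time, which is why the per-request cost stays flat regardless of the item model's depth.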

Model Variations
A three-layer fully connected neural network is built for user and item feature representation in the most straightforward setup of the two-tower model, i.e., DSSM. The activation function of each hidden and output layer is tanh. Subsequent studies introduce convolutional neural networks [30], recurrent neural networks [41], and attention mechanisms [32] in the representation layer to enhance semantic feature representation. The unique structure of the two-tower model allows it to implement different feature engineering and mapping functions to adapt to different recommendation scenarios [34]. Elkahky et al. [8] use a multi-view DNN model to build a recommendation system that combines rich features from multiple domains to enhance the personalized expression of users. Huang et al. [16] verified the effectiveness of the multi-view DSSM under the FL setting. Google used the two-tower model in a large-scale video streaming recommendation system to model the interaction between user-item pairs and enhance user representation through users' video click sequences [40]. Facebook also applies the two-tower structure in a large-scale social information retrieval model [15] and verifies its effectiveness in social recommendation scenarios.
This paper focuses on the computation and communication compression of STTFedRec for the user device in the FL setting.In the subsequent experiments, we select DSSM and CLSM to build STTFedRec-DSSM and STTFedRec-CLSM for performance and efficiency evaluation.

EXPERIMENTS
This section presents extensive experiments on two public benchmark datasets to evaluate the proposed STTFedRec framework. The experiments intend to answer the following research questions. RQ1: How does STTFedRec perform compared with baseline models? RQ2: Is the computation and communication overhead of STTFedRec significantly reduced compared to baseline models? RQ3: How do the hyperparameters user number $K$ and negative sampling rate $\rho$ affect the performance and efficiency of STTFedRec?

Dataset and Experiment Settings
We choose two public datasets to evaluate the performance of our proposed model, i.e., MovieLens-1M [13] and Adressa-1week [11]. The two datasets differ significantly in the size of their raw item data, which lets us observe the impact on computation efficiency and communication cost of user devices. The MovieLens dataset contains about 1 million explicit ratings for movies. We treat explicit records with a rating greater than or equal to 1 as positive implicit feedback. Adressa is a real-world online news dataset from a Norwegian news portal. Following [14], we select the features user_id, news_id, the title, and the news profile, and remove the stopwords in the news content. We also remove users with fewer than 20 clicks to ensure a sufficient local sample for each user to participate in FL training. Adressa has no user features, so we use five news titles that the user has clicked as the user feature. For each user's clicked item, we randomly sampled ten movies/news items that the user had not rated as negative feedback to generate obfuscated item requests. Both user and item features are pre-coded with letter trigrams into fixed-size input vectors. We summarize the statistics of both datasets in Table 1.

Hyperparameters. We use the Adam optimizer with an initial learning rate of 10^-4, decayed by a factor of 0.01 after every 10 rounds of training, and set $\beta_1 = 0.9$ and $\beta_2 = 0.99$. The selected user number $K$ for a single round of training is searched in {25, 50, 100, 200} and set to 50 for MovieLens and 150 for Adressa. The local batch size $b$ is set to 10. $\rho$ is searched in {0.9, 0.8, 0.7, 0.6, 0.5} and set to 0.9. The dropout rate is set to 0.2. We initialize weight and bias parameters from a Gaussian distribution with mean 0 and standard deviation 0.1. The hyperparameters of FCF and FedMVMF are consistent with Flanagan et al. [10]. For CLSM, the convolution window size is set to 3.
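The letter-trigram pre-coding can be sketched as DSSM-style word hashing into a fixed-size count vector. The CRC-based bucketing and the vector size are our own illustrative choices, not the paper's exact encoding:

```python
import zlib

def letter_trigram_vector(text, dim=512):
    # Split each word into boundary-padded letter trigrams ("cat" -> "#ca",
    # "cat", "at#") and hash each trigram into one of `dim` count buckets,
    # producing a fixed-size input vector regardless of vocabulary size.
    vec = [0] * dim
    for word in text.lower().split():
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            tri = padded[i:i + 3]
            vec[zlib.crc32(tri.encode()) % dim] += 1
    return vec

v = letter_trigram_vector("good movie", dim=128)
```

The fixed output size is what lets both towers accept arbitrary text features through the same input layer.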
The models are implemented in Python 3.7 with PyTorch. To simulate the computing power of mobile devices, we used a MacBook Pro with an M1 chip to simulate user devices and serialized the training process of each client for efficiency evaluation. A server equipped with dual NVIDIA 2080Ti GPUs was used for performance evaluation.
Metrics. We verify the performance of STTFedRec and the baseline models on Top-K recommendation, adopting AUC, Precision@10, and NDCG@10 as evaluation metrics. The AUC evaluates overall accuracy, while Precision@10 and NDCG@10 concentrate on the very top of the recommendation list. For efficiency, we report the training time (in milliseconds), online inferring time (in milliseconds), and the communication size between a device and the server (in megabytes). We repeat each experiment 5 times independently and report the average results. We compare STTFedRec with several baseline methods, including the two-tower models in their centralized versions and in the naive FL setting. The baselines are listed as follows: (1) Centralized DSSM [17], the original version of the two-tower model for information retrieval. In our experiments, we apply a three-layer fully connected neural network for both towers, with neuron distribution <256, 128, 128> and tanh as the activation function of each layer.
(2) Centralized CLSM [30], the convolutional-pooling structured two-tower model for item retrieval. In our experiments, we apply CLSM with one convolutional layer, one max-pooling layer, and one fully connected output layer; the dimension of the max-pooling layer is set to 300 and the output dimension to 128. (3) FCF [2], the FL based collaborative filtering recommendation model. (4) FedMVMF [10], the FL based multi-view matrix factorization recommendation model that also utilizes side information. (5) FL-DSSM [16], the naive FL setting of DSSM for recommendation, in which the client device is responsible for training both the user and item models; we apply the same model structure as the centralized version. (6) FL-CLSM, the naive FL setting of CLSM for recommendation, following the training approach of Huang et al. [16]; the client device is responsible for training both towers, with the same model structure as the centralized version. (7) STTFedRec-DSSM, the FL based DSSM implemented on our proposed STTFedRec framework. (8) STTFedRec-CLSM, the FL based CLSM implemented on our proposed STTFedRec framework.
Performance Comparison (RQ1)

The performance comparison results are shown in Table 2, from which we make the following observations: (1) Compared to the centralized baseline models, the effect of applying STTFedRec is within the usual performance loss of FL; the impact comes mainly from the bias introduced by gradient averaging and average-loss based backpropagation. (2) Compared to FL-DSSM and FL-CLSM under the naive FL setting, there is no performance loss in STTFedRec, i.e., using split learning and secure aggregation does not affect the performance of the federated two-tower model. (3) Compared to FCF and FedMVMF, STTFedRec is more suitable for scenarios with rich semantic information; using interacted item data in the user features and optimizing the representation layer can further improve the performance of STTFedRec.

For communication complexity, there are two communications between each client and the server at each training iteration: gradient upload and model download. For FL-DSSM, this communication covers the parameters of both the user and item towers.

Efficiency Comparison (RQ2)
In STTFedRec, user devices no longer need to download and upload the parameters of the item model. Another important communication cost comes from the online inferring phase: the clients of FL-DSSM need to download the complete raw item data, while the clients of STTFedRec only need to download the processed item embeddings computed by the recommendation server.
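As a back-of-envelope illustration of the broadcast saving: the corpus size, dense feature width, and float size below are assumptions for illustration, not the paper's measured numbers.

```python
# Per-round download for online inference, assuming dense float payloads.
n_items = 100_000        # assumed item corpus size
raw_dim = 30_000         # assumed letter-trigram input width per item
emb_dim = 128            # output embedding size used in the experiments
bytes_per_float = 4

naive_mb = n_items * raw_dim * bytes_per_float / 2**20   # raw item features
split_mb = n_items * emb_dim * bytes_per_float / 2**20   # item embeddings only
ratio = naive_mb / split_mb   # savings factor = raw_dim / emb_dim
```

Under these assumptions the download shrinks by roughly the ratio of input width to embedding width, which is why the saving grows with item feature density.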
We validate the above analysis experimentally, reporting the training time, inferring time, and communication size of STTFedRec and the naive FL settings of DSSM and CLSM in Table 3. The table shows that on the MovieLens dataset, STTFedRec improves average training time by 23× and average communication overhead by 2×. On the Adressa dataset, STTFedRec improves average training time by 40× and average communication overhead by 42×. More specifically, we observe the following: (1) The total communication cost and computation time of both STTFedRec and the naive FL setting increase with dataset size and item feature size, but the increase rate of STTFedRec is slower. (2) On a large, feature-dense dataset, STTFedRec yields excellent benefits in reducing computation and communication overhead, since the original item data is replaced by the low-dimensional item embeddings provided by the server; only similarity computation is needed to perform online inferring, which also validates the efficiency of STTFedRec for massive item recommendation scenarios. (3) STTFedRec shows better computational efficiency optimization as the computational complexity of the user and item representation models increases. The overall efficiency comparison demonstrates that our proposed STTFedRec has better scalability than the naive FL setting in terms of both running time and communication size.

Impact of Hyperparameters (RQ3)
We further investigate the effect of the hyperparameter settings ($\rho$ and $K$) on the performance and efficiency of STTFedRec on the Adressa dataset.

Effect of $\rho$. As shown in Figure 4, the AUC and NDCG@10 of both STTFedRec-DSSM and STTFedRec-CLSM increase with the negative sampling rate, i.e., the higher the negative sampling rate, the better the prediction performance of STTFedRec. This result is consistent with the observation of Shen et al. [30] in their experiments on the centralized two-tower model.

Effect of $K$. The comparison results with different $K$ are shown in Figure 5 and Figure 6. The secure aggregation time of the client device is not affected by $K$, since our proposed circular secret sharing chain only requires devices to share secrets with neighboring members; the number of parameter matrices computed does not vary with the total number of members.

RELATED WORK
Communication and computation can be major bottlenecks in federated learning [18], as end-user Internet connections are typically slow and the size of recommendation models in industry is enormous. Muhammad et al. [25] proposed FedFast to accelerate the training of federated recommendation models. FedFast uses the ActvSAMP method to build user clusters from the gradients uploaded by users, randomly selects users from different clusters in each round to participate in federated training, and updates the local models of similar users by parameter aggregation. This approach improves training efficiency globally but cannot reduce the computational effort of the user device. Khan et al. [19] introduce a multi-arm bandit mechanism to tackle the item-dependent payloads problem. However, neither method can reduce the device's workload during the training phase. Qin et al. [28] propose a privacy-preserving recommendation system framework based on federated learning (PPRSF). PPRSF uses a privacy hierarchy mechanism in which explicit user feedback records are considered public information, allowing the server to preprocess the model and complete the computationally costly recall module training. PPRSF thus improves efficiency by sacrificing user privacy.

CONCLUSION AND FUTURE WORK
This paper aims to solve the computation and communication efficiency problem of cross-device FL recommendation. To this end, we introduce split learning into two-tower recommendation models and propose STTFedRec, a privacy-preserving and efficient cross-device federated recommendation framework. STTFedRec achieves local computation reduction by moving the training and computation of the item model from user devices to a powerful server. The server hosting the item model provides low-dimensional item embeddings instead of raw item data to the user devices for local training and online inferring, achieving server broadcast compression. The user devices only need to perform similarity calculations with cached user embeddings to achieve efficient online inferring. We also propose an obfuscated item request strategy and a multi-party circular secret sharing chain to enhance the privacy protection of model training. Experiments conducted on two public datasets demonstrate that STTFedRec improves the average computation time and communication size of the baseline models by about 40× and 42× in the best-case scenario, with balanced recommendation accuracy. In the next phase of study, we will further optimize the representation structure of STTFedRec to achieve more competitive prediction performance in specific recommendation scenarios.

Figure 1: U-shaped split learning configuration [18]

Step 1. Initialization. The recommendation server initializes the training process by defining the model structures $\mathcal{M}_U(x, \Theta_U)$ and $\mathcal{M}_V(y, \Theta_V)$ and initializing the model parameters as $\Theta_U^0$ and $\Theta_V^0$. It randomly selects a subset of available clients $\mathcal{C}^0 = \{c_k\}_{k=1}^{K}$ for the first round of training, where $\mathcal{C}^0 \subseteq \mathcal{U}$ and the group size is $K = |\mathcal{C}^0|$, and sends the initialized $\mathcal{M}_U$ and $\Theta_U^0$ to each client in $\mathcal{C}^0$ and to the aggregation server. The superscript of each parameter indicates the current training round.

Step 2. Obfuscated item request. The client obtains item embeddings by initiating an obfuscated item request to the recommendation server. At training round $t$, the client $c_k$ randomly selects a subset of non-clicked items' IDs $\mathcal{V}^-$ and a subset of clicked items' IDs $\mathcal{V}^+$; the two ID sets form the obfuscated item request $\mathcal{V}_k = \{v_j\}_{j=1}^{n}$, where $n = |\mathcal{V}_k|$ is the number of items requested. The proportion of non-clicked items, $\rho = |\mathcal{V}^-|/|\mathcal{V}_k|$, is the negative sampling rate. The server returns the item embeddings $\mathcal{I}_k = \{i_j\}_{j=1}^{n}$ computed by Eq. (6).

Figure 2: Illustration of (a) the original two-tower model, (b) the naive federated two-tower recommendation framework, and (c) the split two-tower federated recommendation framework (STTFedRec)

Figure 3:

The updated global item model $\mathcal{M}_V(y, \Theta_V^{t+1})$ serves the next round of obfuscated item requests or online inferring. With $t = t + 1$, a new subset of available clients $\mathcal{C}^{t+1}$ is randomly selected, and Steps 2~6 are repeated for a new training round.

Figures 5 and 6 show the effect of $K$. The selected user size for participating in one round of training affects convergence and secure aggregation efficiency. Specifically, we have the following observations: (1) the larger the number of users participating in a single training round, the more efficient the model convergence; (2) as $K$ increases, the secure aggregation time required by the SAS grows, since the number of parameter matrices uploaded to the SAS increases; (3) in contrast, the client-side secure aggregation time is unaffected by $K$.

Figure 4 :
Figure 4: The performance by factor of negative sampling rate on Adressa

Figure 6: The efficiency by factor of K on Adressa (STTFedRec-CLSM)

Step 3. Client local training. At training round $t$, a client $c_k$ performs local training to obtain the training loss and gradients based on the batch training set $\mathcal{B}_k = \{(x_i, v_j, r_{ij})\}_{i=1}^{b}$. The user embeddings $e_u$ are computed locally by Eq. (7). The average loss $L_{\mathcal{B}}$ of the batch training set is obtained by Eq. (8). The gradients of the local user model are denoted as $\nabla\Theta_U$, computed by Eq. (9).

Step 4. Gradients and loss secure aggregation. At training round $t$, under the coordination of the aggregation server, each client $c_k$ generates two random gradient matrices $[g_k]_1$ and $[g_k]_2$ and two random numbers, and computes the mixed gradients $[g_k]^*$ and mixed loss $L_{\mathcal{B}}^*$ by Eq. (10) and Eq. (11).

Table 1 :
Overview of the datasets used in the experiment.

Table 2 :
Performance comparison results of different models in terms of AUC, Precision@10, and NDCG@10 on the MovieLens and Adressa datasets. The values denote the mean ± standard deviation across 3 different model builds.

Table 3 :
Efficiency comparison results on computation and communication cost per round of training and online inferring: running time (in milliseconds) and communication size (in megabytes) on the MovieLens and Adressa datasets.