Akin to the original Slope One CF scheme, our proposed extension also contains a pre-computation phase and a prediction phase. Pre-computation is an on-going process as users add, update or delete pair-wise ratings or deviations. The overall user-interaction diagram of our proposed model is presented in Figure 2, which shows the addition of rating data and an example CF prediction query. It illustrates two users, Alice and Bob, each submitting plaintext pair-wise deviations of item ratings through an identity anonymiser to the Software-as-a-Service (SaaS) cloud application. Another user, Carol, then queries and obtains a prediction for an arbitrary item (item k, in the example) with her encrypted query vector. The SaaS cloud application computes the prediction using homomorphic encryption and responds to Carol with an encrypted answer, which only Carol herself can decrypt.
Pre-computation
In the pre-computation phase, the plaintext deviation matrix and the plaintext cardinality matrix are computed. In the absence of full rating vectors from users and consistent user identification, the combination of the deviation and the cardinality matrices poses no privacy threat to the users’ private rating data. The rating data is collected pair-wise, after the user identity has been de-linked through known techniques such as anonymising networks, mix networks, pseudonymous group memberships, and so on. A user submits a pair of ratings, or the corresponding deviation, to the cloud application at any point in time. Thus, if the user originally rated n items, then n(n−1)/2 pair-wise ratings or deviations should be submitted. Since the user’s identity (e.g. a pseudonym or an IP address) can (rather, must) change between consecutive submissions, the cloud cannot deterministically link the rating vector to a particular user.
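As a minimal sketch of the client side of this step, the following Python fragment enumerates the pair-wise deviations that a user with n rated items would submit, one pair per (ideally unlinkable) submission. The function name `pairwise_deviations` and the example rating vector are hypothetical and serve only as illustration; the anonymised transport itself is not modelled.

```python
from itertools import combinations


def pairwise_deviations(ratings):
    """Return {(a, b): r_a - r_b} for every unordered item pair, with a < b."""
    items = sorted(ratings)
    return {(a, b): ratings[a] - ratings[b] for a, b in combinations(items, 2)}


# Hypothetical rating vector of one user (item id -> rating).
ratings = {"x": 4, "y": 2, "z": 5}
submissions = pairwise_deviations(ratings)

# n rated items yield n(n-1)/2 pair-wise submissions.
n = len(ratings)
assert len(submissions) == n * (n - 1) // 2
```

Each entry of `submissions` would be sent separately, under a fresh identity, so that the cloud cannot reassemble the full rating vector.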
Case of new ratings
In the pre-computation stage, the average deviation of ratings from item a to item b is given in equation 1. The cloud application maintains only a list of items, their pairwise deviations and cardinalities, and no other user data. The process of rating addition is described in algorithm 4.
Algorithm 4
An algorithm for the addition of new ratings.
Require: An item pair identified by a and b, ratings $r_a$ and $r_b$, or the deviation $\delta_{a,b} = r_a - r_b$ has been submitted.
1: Find the deviation $\Delta_{a,b}$ and cardinality $\phi_{a,b}$.
2: {While looking for deviations and cardinalities, also look for their inverses, i.e. $\Delta_{b,a}$ and $\phi_{b,a}$, because only the upper triangular is stored. If the inverses are retrieved then the deviation must be inverted before operating on it.}
3: if $\Delta_{a,b}$ and $\phi_{a,b}$ not found then
4: $\Delta_{a,b} = 0$ and $\phi_{a,b} = 0$.
5: end if
6: Update $\Delta_{a,b} = \Delta_{a,b} + \delta_{a,b}$ and $\phi_{a,b} = \phi_{a,b} + 1$.
7: Store $\Delta_{a,b}$ and $\phi_{a,b}$.
Ensure: While storing, write to the inverses $\Delta_{b,a}$ and $\phi_{b,a}$ if these were initially retrieved. {If the inverses were retrieved then the deviation must be inverted before storing it.}
8: Audit this add operation in the datastore, e.g. using user’s IP address as the identity. {This is a typical insider threat in the cloud.}
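The addition procedure can be sketched in Python as follows, assuming that the deviation matrix holds the running sum of submitted deviations and that only the upper triangular is stored, keyed by the ordered item pair. The in-memory dictionaries `dev` and `card` and the helper `canonical` are hypothetical stand-ins for the cloud datastore; the audit step is omitted.

```python
def canonical(a, b, delta):
    """Map a submission onto the stored upper-triangular key (smaller id first).

    If the pair arrives in the opposite order, the deviation is inverted.
    """
    return ((a, b), delta) if a < b else ((b, a), -delta)


def add_rating(dev, card, a, b, delta):
    """Add one pair-wise deviation delta = r_a - r_b to the running totals."""
    key, d = canonical(a, b, delta)
    dev[key] = dev.get(key, 0) + d      # starts from 0 if the pair is new
    card[key] = card.get(key, 0) + 1    # one more rating pair for this key


dev, card = {}, {}                      # hypothetical in-memory datastore
add_rating(dev, card, "a", "b", 2)      # delta_{a,b} = 2
add_rating(dev, card, "b", "a", 1)      # delta_{b,a} = 1, stored as delta_{a,b} = -1
```

Normalising the key on every write keeps a single copy of each pair, so the inverse lookup described in step 2 reduces to a sign change.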
Updates and deletions
Updates or deletions of existing rating data are possible. For example, say the user has previously rated items a and b. To update, the user notifies the cloud of the difference between the new pair-wise rating deviation and the previous one, and flags the submission as an update. The process of rating update is described in algorithm 5. Similarly, for the delete operation, the user sends the additive inverse of the previous deviation, i.e. $-\delta_{a,b}$, to the cloud, signifying a deletion. The process of rating deletion is also described in algorithm 5.
Algorithm 5
An algorithm for the updates or deletions of existing ratings.
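Require: In case of update, an item pair identified by a and b, and the difference $\delta'_{a,b} - \delta_{a,b}$ between the new deviation $\delta'_{a,b}$ and the previous one; or in case of deletion: an item pair identified by a and b, and $-\delta_{a,b}$.
1: Find the deviation $\Delta_{a,b}$ and cardinality $\phi_{a,b}$.
2: {While looking for deviations and cardinalities, also look for their inverses, i.e. $\Delta_{b,a}$ and $\phi_{b,a}$, because only the upper triangular is stored. If the inverses are retrieved then the deviation must be inverted before operating on it.}
3: if $\Delta_{a,b}$ and $\phi_{a,b}$ not found then
4: print error!
5: end if
6: In case of an update, $\Delta_{a,b} = \Delta_{a,b} + (\delta'_{a,b} - \delta_{a,b})$; or in case of deletion: $\Delta_{a,b} = \Delta_{a,b} - \delta_{a,b}$ and $\phi_{a,b} = \phi_{a,b} - 1$.
7: Store $\Delta_{a,b}$ and, if it was changed in case of deletion, $\phi_{a,b}$.
Ensure: While storing, write to the inverses $\Delta_{b,a}$ and $\phi_{b,a}$ if these were initially retrieved. {If the inverses were retrieved then the deviation must be inverted before storing it.}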
8: Audit this update or deletion operation in the datastore, e.g. using user’s IP address as the identity. {This is a typical insider threat in the cloud.}
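A corresponding sketch for updates and deletions is given below, under the same assumptions (summed deviations, upper-triangular storage in hypothetical dictionaries `dev` and `card`). The function names and the choice of raising `KeyError` for a missing pair are illustrative; the audit step is again omitted.

```python
def _canonical(a, b, value):
    """Return the stored key (smaller id first) and the value with matching sign."""
    return ((a, b), value) if a < b else ((b, a), -value)


def update_rating(dev, card, a, b, diff):
    """Apply diff = (new deviation - previous deviation); cardinality is unchanged."""
    key, d = _canonical(a, b, diff)
    if key not in dev or key not in card:
        raise KeyError(f"no stored deviation for pair {key}")
    dev[key] += d


def delete_rating(dev, card, a, b, neg_delta):
    """Apply neg_delta = -(previous deviation) and decrement the cardinality."""
    key, d = _canonical(a, b, neg_delta)
    if key not in dev or key not in card:
        raise KeyError(f"no stored deviation for pair {key}")
    dev[key] += d           # neg_delta is already the additive inverse
    card[key] -= 1


dev = {("a", "b"): 5}       # hypothetical stored totals
card = {("a", "b"): 2}
update_rating(dev, card, "a", "b", 1)    # a previous deviation changed by +1
delete_rating(dev, card, "b", "a", 3)    # removes a deviation delta_{b,a} = -3
```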
Prediction
In the prediction phase, the user queries the cloud with an encrypted and complete rating vector. The encryption is carried out at the user’s end with the user’s public key. The prediction query thus also includes the user’s public key, which the cloud uses to encrypt the necessary elements from the deviation matrix and to apply homomorphic multiplication according to the prediction equation defined in equation 5, where $D()$ and $E()$ are decryption and encryption operations; $\Delta_{x,a}$ is the deviation of ratings between item x and item a; $\phi_{x,a}$ is their relative cardinality; and $E(r_{u,a})$ is an encrypted rating on item a sent by user u, although the identity of the user is irrelevant in this process. Note that the final decryption is again performed at the user’s end with the user’s private key, thereby eliminating the need for any trusted third party for threshold decryption.
$$r_{u,x} = \frac{D\left(\prod_{a \mid a \neq x} E(\Delta_{x,a})\, E(r_{u,a})^{\phi_{x,a}}\right)}{\sum_{a \mid a \neq x} \phi_{x,a}} \qquad (5)$$
which is optimised by reducing the number of encryptions as follows:
$$r_{u,x} = \frac{D\left(E\left(\sum_{a \mid a \neq x} \Delta_{x,a}\right) \prod_{a \mid a \neq x} E(r_{u,a})^{\phi_{x,a}}\right)}{\sum_{a \mid a \neq x} \phi_{x,a}} \qquad (6)$$
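As a plausibility check on equations 5 and 6, assume an additively homomorphic scheme in which the product of two ciphertexts decrypts to the sum of the plaintexts and a ciphertext raised to a plaintext power decrypts to the product (this is an assumption here; the scheme is not named in this section). Taking $\Delta_{x,a}$ to be the accumulated deviation maintained in the pre-computation phase, the decrypted prediction then reduces to the weighted Slope One form:

```latex
r_{u,x}
  = \frac{\sum_{a \mid a \neq x} \left( \Delta_{x,a} + \phi_{x,a}\, r_{u,a} \right)}
         {\sum_{a \mid a \neq x} \phi_{x,a}}
% Illustrative numbers (hypothetical):
%   \Delta_{x,1} = 3,  \phi_{x,1} = 2, r_{u,1} = 4
%   \Delta_{x,2} = -1, \phi_{x,2} = 1, r_{u,2} = 3
%   r_{u,x} = \frac{(3 + 2 \cdot 4) + (-1 + 1 \cdot 3)}{2 + 1} = \frac{13}{3} \approx 4.33
```

The optimised form changes only how the numerator is assembled: the deviations are summed in plaintext and encrypted once, rather than encrypted item by item.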
The steps for the prediction are shown in algorithm 6.
Algorithm 6
An algorithm for the prediction of an item.
Require: An item x for which the prediction is to be made, a vector of encrypted ratings for other items rated by the user (i.e. each item $a \mid a \neq x$) and the public key $pk_u$ of user u.
1: total cardinality: $tc = 0$; total deviation: $td = 0$; total encrypted weight: $tew = E(0)$; total encrypted deviation: $ted = E(0)$.
2: for each item j in the encrypted rating vector do
3: Find the deviation $\Delta_{x,j}$ and cardinality $\phi_{x,j}$.
4: {While looking for deviations and cardinalities, also look for their inverses, i.e. $\Delta_{j,x}$ and $\phi_{j,x}$, because only the upper triangular is stored. If the inverses are retrieved then the deviation must be inverted before operating on it.}
5: if $\Delta_{x,j}$ and $\phi_{x,j}$ found then
6: $td = td + \Delta_{x,j}$.
7: $tc = tc + \phi_{x,j}$.
8: $tew = tew \cdot E(r_{u,j})^{\phi_{x,j}}$. {This step involves a homomorphic addition and a homomorphic multiplication.}
9: end if
10: end for
11: $ted = E(td) \cdot tew$. {This is a homomorphic addition.}
12: return $ted$ and $tc$. {User decrypts $ted$; the predicted result is $D(ted)/tc$.}
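The prediction steps can be illustrated with a small, self-contained Python sketch. It assumes a Paillier-style additively homomorphic cryptosystem, which this section does not name, and uses a toy implementation with tiny hard-coded primes that is not secure and is meant only to show the data flow. All function names (`keygen`, `encrypt`, `decrypt`, `lookup`, `cloud_predict`) and the example data are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with g = n + 1 (tiny primes: illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)                 # modular inverse, Python 3.8+
    return (n, n + 1), (lam, mu)


def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m    # map back to a signed integer


def lookup(dev, card, x, j):
    """Fetch (Delta_{x,j}, phi_{x,j}), inverting the deviation if stored as (j, x)."""
    if (x, j) in dev:
        return dev[(x, j)], card[(x, j)]
    if (j, x) in dev:
        return -dev[(j, x)], card[(j, x)]
    return None


def cloud_predict(pub, dev, card, x, enc_ratings):
    """Cloud side: returns the encrypted numerator and the plaintext cardinality."""
    n2 = pub[0] ** 2
    tc, td, tew = 0, 0, encrypt(pub, 0)
    for j, c in enc_ratings.items():
        found = lookup(dev, card, x, j)
        if found is None:
            continue
        delta, phi = found
        td += delta
        tc += phi
        tew = (tew * pow(c, phi, n2)) % n2   # adds phi * r_{u,j} under encryption
    ted = (encrypt(pub, td) * tew) % n2      # adds the summed deviations
    return ted, tc


# Hypothetical data: Delta_{x,a} = 3 (stored inverted), Delta_{x,b} = -1.
pub, priv = keygen()
dev = {("a", "x"): -3, ("b", "x"): 1}
card = {("a", "x"): 2, ("b", "x"): 1}
query = {"a": encrypt(pub, 4), "b": encrypt(pub, 3)}   # encrypted by the user
ted, tc = cloud_predict(pub, dev, card, "x", query)
prediction = decrypt(pub, priv, ted) / tc              # decrypted by the user
```

Only the holder of `priv` can recover the numerator; the cloud sees the item identifiers in the query but not the ratings. If `tc` is zero, no prediction can be made for the item.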
In the scheme described above, there is, in fact, one privacy leakage in the prediction phase: the list of items in the query vector, and hence the number of items in the user’s original rating vector. This can be addressed by computing the prediction at the user’s end with the necessary elements from the deviation and cardinality matrices obtained from the cloud. The user can then mask the actual rating vector by also asking the cloud for a number of extra, unnecessary items. The corresponding ratings can only be decrypted with the user’s private key. Furthermore, the user is free to use a different key pair for each prediction request, so that an adversary cannot use a single public key to link the queries together.
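The client-side alternative with masking can be sketched as follows. The sketch assumes that the cloud returns, for each requested item j, the pair of the accumulated deviation and the cardinality relative to the target item x; the function names, the catalogue and the number of decoys are hypothetical choices for illustration.

```python
import random


def masked_request(rated_items, catalogue, n_decoys, rng=random):
    """Return the rated items plus randomly chosen decoy items, in shuffled order."""
    pool = [i for i in catalogue if i not in rated_items]
    decoys = rng.sample(pool, min(n_decoys, len(pool)))
    request = list(rated_items) + decoys
    rng.shuffle(request)
    return request


def local_predict(ratings, entries):
    """Compute the prediction locally.

    ratings: {item: rating} held only by the user.
    entries: {item: (Delta_x_item, phi_x_item)} returned by the cloud,
             possibly including decoy items, which are simply ignored.
    """
    used = [j for j in ratings if j in entries]
    num = sum(entries[j][0] + entries[j][1] * ratings[j] for j in used)
    den = sum(entries[j][1] for j in used)
    return num / den if den else None


ratings = {"a": 4, "b": 3}                             # hypothetical user data
request = masked_request(set(ratings), ["a", "b", "c", "d", "e"], n_decoys=2)
entries = {"a": (3, 2), "b": (-1, 1), "d": (7, 5)}     # hypothetical cloud reply
prediction = local_predict(ratings, entries)
```

In this variant no ratings leave the user's device at all, and the cloud learns only a superset of the rated items.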
Vertical partition across multiple organisations
While the solution presented above works when each user knows their own ratings, it will not work as is if the ratings are held by two or more cloud applications with disjoint item and user sets (e.g. Yahoo movies, Netflix and IMDB have their own cloud applications). The PPCF solution will still work if we extend our scheme as follows. For example, assume the two applications know the values $r_a$ and $r_b$ respectively for two different items for the same user. To compute the deviation $r_a - r_b$ for that user without revealing it to either application or to any third site, assume that the first application randomly splits the value $r_a$ into two shares (thus, it randomly chooses $z_{a1} \in \mathbb{Z}$ and computes $z_{a2} \in \mathbb{Z}$ such that $r_a = z_{a1} + z_{a2}$). Similarly, the other application randomly chooses $z_{b1}$ and computes $z_{b2}$ such that $r_b = z_{b1} + z_{b2}$. The first application sends $z_{a2}$ to the other and, after receiving $-z_{b1}$ from it, computes $c_1 = z_{a1} - z_{b1}$, while the other computes $c_2 = z_{a2} - z_{b2}$. Finally, the two applications send $c_1$ and $c_2$ respectively, in an unlinkable fashion, to the cloud PPCF application, which obtains $c_1 + c_2 = z_{a1} - z_{b1} + z_{a2} - z_{b2} = r_a - r_b$.
It is possible to do this with k cloud applications as well, where each value is split into k splits, to make the process more resistant to collusion. Note that one problem with this approach is that the same deviation is split over the k splits: effectively, the cardinality is increased by k instead of by 1. One approximate fix for this is simply to increase the deviation k-fold. Thus, each application will send in $k \cdot c_i$ instead of $c_i$. However, this will only approximate the effect, since instead of adding the true deviation and 1, we end up adding k times the true deviation and k. Alternatively, the cardinalities can be fixed by exposing a reduction capability, so that anyone can reduce the cardinalities appropriately in an unlinkable fashion. Assuming semi-honest participants, this works fine. If this is not suitable, the cardinalities can also be computed correctly by flagging to the PPCF application that the submissions of all k splits $c_1, \dots, c_k$ should be treated as one submission. However, in this case, the linkage between the splits will be established and the security then relies on the lack of collusion among all k applications. We will explore this issue further in the future, and also leave the experimental results for it to future work.
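The two-party share exchange can be sketched in Python as below. For simplicity the shares are drawn from a bounded integer range, which is an assumption of this sketch; a real deployment would more likely share over a finite ring so that shares reveal nothing about the value. The function names and the `bound` parameter are hypothetical.

```python
import random


def split(value, k=2, bound=10**6, rng=random):
    """Split an integer into k additive shares that sum to value."""
    shares = [rng.randint(-bound, bound) for _ in range(k - 1)]
    shares.append(value - sum(shares))
    return shares


def two_party_deviation_shares(r_a, r_b, rng=random):
    """Return (c1, c2) with c1 + c2 == r_a - r_b.

    The first application holds r_a, the second holds r_b; neither
    learns the other's rating from the single share it receives.
    """
    za1, za2 = split(r_a, rng=rng)   # first application's shares of r_a
    zb1, zb2 = split(r_b, rng=rng)   # second application's shares of r_b
    c1 = za1 - zb1                   # first app: keeps za1, receives zb1
    c2 = za2 - zb2                   # second app: keeps zb2, receives za2
    return c1, c2


c1, c2 = two_party_deviation_shares(5, 2)   # hypothetical ratings
deviation = c1 + c2                         # what the PPCF application accumulates
```

Because `c1` and `c2` arrive as two separate submissions, the stored cardinality grows by 2 rather than 1, which is the issue the k-fold scaling and the reduction capability discussed above are meant to address.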