Attribute-based data retrieval with semantic keyword search for e-health cloud
 Yang Yang^{1, 2}
DOI: 10.1186/s13677-015-0034-8
© Yang; licensee Springer. 2015
Received: 22 November 2014
Accepted: 24 March 2015
Published: 17 May 2015
Abstract
Data retrieval over encrypted documents is an important technology in cloud storage, where sensitive data must be encrypted to protect document privacy before being outsourced to the cloud. Most existing searchable encryption schemes concentrate on the single-user scenario. In this paper, we focus on the multiple-sender, multiple-user scenario and provide a searchable encryption (SE) scheme with flexible search authorization. Attribute based encryption (ABE) is used to support fine-grained access control, and synonym keyword search is enabled in the new scheme. The new primitive is named attribute based searchable encryption with synonym keyword search function (SKABSE). A formal definition of SKABSE is given together with a concrete construction. The scheme also provides a convenient user revocation mechanism.
Keywords
Searchable encryption, Cloud computing, Access control, Synonym keyword, User revocation
Introduction
With the rapid development of cloud computing, more and more users are turning to this new computing paradigm for convenient access to a shared pool of resources. It offers ubiquitous and flexible access, on-demand configuration of computing resources and vast computation resources at a very low price. Despite these conveniences, it also carries potential risks, because data owners lose direct control over their information. Privacy concerns have become the main obstacle that hinders the adoption of cloud storage by corporations.
Data encryption is a straightforward way to protect security and privacy. However, traditional encryption methods prevent the commonly used query operations on confidential data: keyword search becomes difficult once the data are encrypted. In 2004, Boneh et al. [1] proposed the first public key encryption scheme with keyword search (PEKS) to address the issue of searching on confidential data. Since then, many efforts have been made to improve efficiency [2–4], enhance security [5–7] or provide new flexible properties [8–12].
Although these schemes provide new features for searching on encrypted data, they cannot realize flexible and fine-grained access control on outsourced data. Sahai and Waters [13] proposed the concept of attribute-based encryption (ABE) at Eurocrypt 2005, which extends the notion of identity based encryption (IBE). ABE schemes [14–18] enable flexible access policies on secret data and make data sharing quite easy. In an ABE scheme, each user has a set of attributes, and an access policy determines which users, identified by their attributes, are authorized to access the shared data.
The scheme proposed in this paper has the following features.
 – No Secret Key Sharing. A set of attributes is associated with each user and is also embedded in the user’s private key. This avoids the risk brought by sharing a secret key among multiple users. When a data sender outsources sensitive data to the cloud server, he specifies an access policy in the data encryption phase and generates a secure index for the extracted keyword. A user is permitted to query that information only if the user’s attribute set satisfies the access structure of the encrypted data.
 – Semantic Keyword Search. A novel point is that semantic keyword search is enabled in this scheme: when a user queries on a keyword, all documents that contain synonym-related keywords are returned. A concrete construction of SKABSE, built on bilinear pairings, is proposed.
 – Fine-grained Search Authorization. The scheme supports multiple data senders and multiple data users. Fine-grained search authorization is enabled and is enforced by the data owner.
 – User Revocation. User revocation can be processed efficiently, which makes the scheme suitable for e-health cloud applications. The scheme also ensures the privacy of queries and the secrecy of data contents.
The rest of the paper is organized as follows. Section “Preliminaries” reviews some preliminary notions. Section “Definitions for SKABSE scheme” defines SKABSE and gives a system model. Section “A SKABSE construction” presents the SKABSE construction and analyzes its security. Section “Conclusions” concludes this paper.
Preliminaries
Bilinear map
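Let \(G\) and \(G_{1}\) be two cyclic groups of prime order \(p\), and let \(g\) be a generator of \(G\). A bilinear map \(e:G\times G\to G_{1}\) has the following properties:
 1. Bilinearity: \(e(u^{a},v^{b})=e(u,v)^{ab}\), for all \(u,v\in G\) and \(a,b\in Z_{p}\);
 2. Non-degeneracy: \(e(g,g)\neq 1_{G_{1}}\);
 3. Computability: there exists an efficient algorithm to calculate \(e(u,v)\), for all \(u,v\in G\).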
WordNet
In traditional English dictionaries, vocabulary entries are organized in alphabetical order and the synonymous relationships between words are ignored entirely. WordNet [19] is a lexical database for the English language, which combines modern computer technology with the results of cognitive psycholinguistics research. It was created in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller starting in 1985 and has been directed in recent years by Christiane Fellbaum.
Unlike traditional dictionaries, WordNet groups English words into sets of synonyms called "synsets". It provides short definitions and usage examples and records a number of relations among the synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. It is accessible to human users via a web browser. The database and software tools are also freely available for download from the WordNet website [19]. WordNet includes the lexical categories nouns, verbs, adjectives and adverbs but ignores prepositions, determiners and other function words. In our scheme, WordNet is used to extend a keyword into its "synset" in order to support semantic keyword search.
For a keyword \(w\), WordNet is used to extend it to a synonym set \(\{w,s_{1},\cdots,s_{n}\}\), in which \(s_{1},\cdots,s_{n}\) are the synonyms of \(w\). This synonym set ("synset" for short) is then sorted in lexicographical order and denoted as \(\Gamma_{w}\).
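The keyword extension step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `TOY_SYNSETS` is a hypothetical in-memory stand-in for a WordNet lookup, and `synonym_set` is an illustrative name. Real WordNet entries are richer (a word can belong to several synsets), so a deployment would have to fix a rule for which synsets are merged.

```python
# Hypothetical stand-in for WordNet: each set groups mutually synonymous words.
TOY_SYNSETS = [
    {"illness", "sickness", "malady"},
    {"doctor", "physician", "medic"},
]


def synonym_set(keyword, synsets=TOY_SYNSETS):
    """Return Gamma_w: the keyword plus its synonyms, sorted lexicographically."""
    w = keyword.lower()
    words = {w}
    for group in synsets:
        if w in group:
            words |= group
    # Sorting gives every member of a synset the same canonical Gamma_w,
    # so a query on any synonym maps to the same value.
    return sorted(words)
```

Because the set is sorted before use, `synonym_set("illness")` and `synonym_set("sickness")` yield the same list, which is what allows a query on one synonym to match an index built from another.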
Access structure
An access structure describes an access control policy. It defines the notions of authorized and unauthorized access subsets. It works much like secret sharing: some sets of participants are able to reconstruct the shared secret while other sets cannot.
Let \(\mathcal {P}=\{P_{1},P_{2},\cdots,P_{l}\},l\in Z^{+}\) be a set of participants, and let the shared secret be denoted as \(s\). A set of participants that is capable of reconstructing \(s\) is called an authorized subset, while all other sets are called unauthorized subsets. The set of all authorized subsets is denoted as \(\Gamma\), and the set of all unauthorized subsets is denoted as \(\overline {\Gamma }=2^{\mathcal {P}}\setminus \Gamma \).
An access tree is usually used to represent the access structure (i.e., the access policy). Let \(\mathcal{T}\) be an access tree in which each internal (non-leaf) node is a threshold gate. Let \(num_{x}\) be the number of children of node \(x\) and \(k_{x}\) be the threshold value of \(x\), with \(1\leq k_{x}\leq num_{x}\). If \(k_{x}=num_{x}\), the logic gate of node \(x\) is "AND", which means that the shared secret can be recovered if and only if all attributes are satisfied. If \(k_{x}=1\), the logic gate of node \(x\) is "OR", which means that the shared secret can be recovered when one of the attributes is satisfied. Each leaf node \(l\) in the access tree is associated with an attribute \(attr_{l}\). The parent node of \(l\) in the access tree is denoted as \(parent(l)\). Each child node \(x'\) of a parent node \(x\) is labeled with a number \(index(x')\) between 1 and \(num_{x}\).

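Let \(\mathcal{T}_{x}\) be the subtree of \(\mathcal{T}\) rooted at node \(x\), and let \(\mathcal{T}_{x}(\Psi)=1\) denote that an attribute set \(\Psi\) satisfies \(\mathcal{T}_{x}\). The value \(\mathcal{T}_{x}(\Psi)\) is computed recursively as follows.

If \(x\) is not a leaf node, the algorithm calculates the value of \(\mathcal {T}_{x'}(\Psi)\) for each child node \(x'\) of \(x\). It returns 1 if and only if at least \(k_{x}\) of the values \(\mathcal {T}_{x'}(\Psi)\) equal 1. Otherwise, it returns 0.

If \(x\) is a leaf node, the algorithm returns 1 if and only if \(attr_{x}\in \Psi\). Otherwise, it returns 0.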
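The recursive evaluation above can be sketched in a few lines. The `Node` class, the `AND`/`OR`/`leaf` helpers and the attribute names are illustrative assumptions and do not come from the paper; only the threshold rule itself does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    attr: Optional[str] = None            # attribute name, set for leaf nodes only
    threshold: int = 0                    # k_x, used for internal nodes only
    children: List["Node"] = field(default_factory=list)


def satisfies(node: Node, attrs: Set[str]) -> int:
    """Return T_x(Psi): 1 if attrs satisfies the subtree rooted at node, else 0."""
    if not node.children:                 # leaf: check attribute membership
        return 1 if node.attr in attrs else 0
    hits = sum(satisfies(child, attrs) for child in node.children)
    return 1 if hits >= node.threshold else 0


def leaf(name: str) -> Node:
    return Node(attr=name)


def AND(*children: Node) -> Node:         # k_x = num_x
    return Node(threshold=len(children), children=list(children))


def OR(*children: Node) -> Node:          # k_x = 1
    return Node(threshold=1, children=list(children))


# Hypothetical policy: cardiology AND (doctor OR nurse)
policy = AND(leaf("cardiology"), OR(leaf("doctor"), leaf("nurse")))
```

With this policy, a user holding `{"cardiology", "nurse"}` is accepted, while a user holding only `{"doctor"}` is rejected because the "AND" gate at the root is not met.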
Definitions for SKABSE scheme
The system works as follows. The key distribution center (KDC) is a third party fully trusted by all entities in the system. KDC first generates the system global parameters and distributes attribute-related private keys to users. The data sender (DS) is responsible for generating encrypted files and extracting keywords to create secure indexes, both of which are outsourced to the cloud storage center (CSC). A distinct access policy is specified for each document before uploading. The outsourced health records can be shared with authorized users. CSC is deemed semi-trusted: it honestly follows the operations specified by the scheme, but also strives to learn as much information as possible from the encrypted EHR content and the data retrieval requests. The data user (DU) can execute keyword queries on encrypted EHR files. If DU has attributes that satisfy the access policy defined by the data sender for the encrypted documents, DU is able to retrieve those files. DU is able to generate trapdoors for keyword search and to decrypt EHR files. KDC also has the authority to add or revoke the access rights of participants. A revocation list of users is provided to the cloud center.
In our system, the user set is denoted as \(\mathcal {U}=\{U_{1},U_{2},\cdots,U_{l}\}\) and the attribute set as \(\Phi=\{\varphi_{1},\varphi_{2},\cdots,\varphi_{n}\}\). \(\Psi_{i}\) denotes the attribute set of user \(U_{i}\), and \(\mathbb{A}\) is the access policy on the ciphertext.
SKABSE consists of the following algorithms.
 – Setup. The setup algorithm is run by KDC. It takes as input a security parameter \(k\) and outputs the global parameter GP and the master secret key MSK for the system. GP is made public and MSK is kept secret. It is described as \(Setup(k)\to (GP,MSK)\).
 – KeyGen. The key generation algorithm is also executed by KDC. The public input to the algorithm consists of the system global parameter GP, the user’s identity \(U_{i}\) and the attribute set \(\Psi_{i}\) of user \(U_{i}\). The private input of KDC is the system master secret key MSK. The output of this algorithm is the public key \(pk_{U_{i}}\) and the private key \(sk_{U_{i}}\) of user \(U_{i}\). The private key \(sk_{U_{i}}\) is confidentially sent to user \(U_{i}\). This algorithm is described as \(KeyGen(GP,MSK,U_{i},\Psi _{i}) \to (pk_{U_{i}}, sk_{U_{i}})\).
 – Encrypt. The encryption algorithm is run by the data sender. It takes as input the system global parameter GP, a message \(M\), the public key \(pk_{U_{i}}\) of user \(U_{i}\), a keyword \(w\) and an access structure \(\mathbb{A}\). It extends the keyword \(w\) to its synonym set and then outputs a ciphertext CT and a secure index \(I_{w}\). The tuple \((CT,I_{w},\mathbb {A})\) is outsourced to the cloud. This algorithm is described as \(Encrypt(GP,pk_{U_{i}},M,w,\mathbb {A}) \to (CT,I_{w})\).
 – Trapdoor. The keyword trapdoor generation algorithm is run by the data user. On input the system global parameter GP, a keyword \(w\) and the private key \(sk_{U_{i}}\) of user \(U_{i}\), this algorithm outputs a trapdoor \(T_{w}\) for keyword \(w\). This algorithm is described as \(Trapdoor(GP,sk_{U_{i}},w)\to T_{w}\).
 – Retrieve. The data retrieval algorithm is run by CSC. It takes as input the system global parameter GP, the keyword secure index \(I_{w}\), the trapdoor \(T_{w}\) corresponding to keyword \(w\), the attribute set \(\Psi_{i}\) of user \(U_{i}\) and the access structure \(\mathbb{A}\). It outputs 1 if \(\Psi_{i}\) satisfies \(\mathbb{A}\) and the trapdoor \(T_{w}\) matches \(I_{w}\), and 0 otherwise. This algorithm is described as \(Retrieve(GP,I_{w},T_{w},{\Psi _{i}},\mathbb {A}) \to 1\;or\;0\).
 – Decrypt. The decryption algorithm is run by the data user. It takes as input the system global parameter GP, the ciphertext CT, the attribute set \(\Psi_{i}\) of user \(U_{i}\), the access structure \(\mathbb{A}\) and the private key \(sk_{U_{i}}\) of user \(U_{i}\). It outputs the plaintext \(M\) if the attribute set \(\Psi_{i}\) of user \(U_{i}\) satisfies the access structure \(\mathbb{A}\) associated with the ciphertext CT. This algorithm is described as \(Decrypt(GP,CT,sk_{U_{i}},\Psi _{i},\mathbb {A})\to M\).
Security Requirements.
In this system, the confidentiality of data should be guaranteed. Since EHR files and their indexes are outsourced to the cloud, they can easily be analyzed by CSC and by an adversary. The scheme should ensure that neither of them can obtain the secret data in the outsourced files. Moreover, the privacy of keyword queries must be protected. The extracted keywords are usually the core information of the EHR files. A keyword query can be eavesdropped on by attackers and cryptanalyzed by CSC, so the keyword trapdoor should be secure enough to resist those attacks. In this multiple-user system, undeniability of requests should also be satisfied: no adversary is able to construct a valid keyword trapdoor, and an authorized user cannot deny his request.
A SKABSE construction
Construction

\(Setup(k)\to (GP,MSK)\).

\(KeyGen(GP,MSK,U_{i},\Psi _{i}) \to (pk_{U_{i}},sk_{U_{i}})\).

Select random \(r,\eta\) from \(Z_{p}^{*}\).

Compute \(d_{0}=g^{\alpha -r},\;d_{0}'=\eta,\;pk_{U_{i}}=g^{\eta }\).

For each attribute \(\psi_{j}\) in set \(\Psi_{i}\), KDC computes \(d_{j}=g^{r\cdot t_{j}^{-1}}\).

The public key \(pk_{U_{i}}\) of user \(U_{i}\) is published in the system.

The secret key \(sk_{U_{i}}=(d_{0},d_{0}',d_{j}\;\forall \;\psi _{j}\in \Psi _{i})\) is confidentially sent to user \(U_{i}\).

\(Encrypt(GP,pk_{U_{i}},M,w,\mathbb {A}) \to (CT,I_{w})\).
 – Choose a random \(s\in Z_{p}^{*}\) and compute \(C_{0}=g^{s}\) and \(C_{1}=M\cdot y^{s}\).
 – Let \(\mathcal{T}\) be the access tree corresponding to access structure \(\mathbb{A}\). The root of the access tree is denoted as \(R\) and is assigned the value \(s\). Values are then distributed from each parent node to its child nodes according to the “AND” or “OR” relationship between them.
 ∘ If the relationship between the parent node and its child nodes is “OR”, the value of each child node is set to \(s\).
 ∘ If the relationship between the parent node and its child nodes is “AND” and there are \(t\) child nodes, DS chooses random numbers \(s_{1},s_{2},\cdots,s_{t-1}\in Z_{p}^{*}\) and computes \(s_{t}=s-\sum _{i=1}^{t-1}s_{i}\). DS then assigns these values to the \(t\) child nodes.
 – For each node \(\varphi _{j,i}\in \mathcal {T}\), compute \(C_{j,i}=T_{j}^{s_{i}}\), where \(s_{i}\) is the value assigned to that node.
 – The synonym set \(\Gamma_{w}\) of keyword \(w\) is constructed by using WordNet. Note that the words in \(\Gamma_{w}\) are sorted in lexicographical order.
 – Select a random \(\tau \in Z_{p}^{*}\) and compute \(A=g^{\tau }, B=e(g^{H(\Gamma _{w})},(pk_{U_{i}})^{\tau })\).
 – The ciphertext of message \(M\) is \(CT=(\mathcal {T},C_{0},C_{1},C_{j,i}\;\forall\;\varphi _{j,i}\in \mathcal {T})\). The secure index of keyword \(w\) is \(I_{w}=(A,B)\).
 – DS sends the ciphertext CT together with the secure index \(I_{w}\) and the access structure \(\mathbb{A}\) to the cloud server for sharing.
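The top-down distribution of the secret value over an AND/OR access tree can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple-based tree encoding, the function name `distribute` and the modulus `P` (a Mersenne prime standing in for the group order \(p\)) are all hypothetical, and the "AND" rule is read as \(s_{t}=s-\sum_{i=1}^{t-1}s_{i} \bmod p\).

```python
import secrets

P = 2**127 - 1  # toy prime standing in for the group order p (assumption)


def distribute(node, s, p=P):
    """Assign value s to `node` and push values down to the leaves.

    node is ("LEAF", attr) or (gate, [children]) with gate "AND" or "OR".
    Returns a list of (attr, value) pairs, one per leaf.
    """
    kind, body = node
    if kind == "LEAF":
        return [(body, s)]
    if kind == "OR":
        # every child receives the parent's value unchanged
        shares = [s] * len(body)
    elif kind == "AND":
        # t-1 random values from Z_p^*; the last one makes the sum equal s mod p
        shares = [secrets.randbelow(p - 1) + 1 for _ in body[:-1]]
        shares.append((s - sum(shares)) % p)
    else:
        raise ValueError(f"unknown gate: {kind}")
    leaves = []
    for child, share in zip(body, shares):
        leaves.extend(distribute(child, share, p))
    return leaves
```

For the tree `("AND", [("LEAF", "a"), ("OR", [("LEAF", "b"), ("LEAF", "c")])])`, the values at `a` and `b` (or at `a` and `c`) add up to the root value modulo `P`, which is the property decryption relies on.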

\(Trapdoor(GP,sk_{U_{i}},w)\to T_{w}\).

\(Retrieve(GP,I_{w},T_{w},{\Psi _{i}},\mathbb {A}) \to 1\;or\;0\).
 – DU initiates a keyword retrieval request by sending to CSC a trapdoor \(T_{w}\) for the keyword \(w\) and the attribute set \(\Psi_{i}\) related to user \(U_{i}\)’s private key \(sk_{U_{i}}\).
 – CSC searches the tuples \((CT,I_{w},\mathbb {A})\) in user \(U_{i}\)’s cloud storage to find the secure index \(I_{w}\) with the designated keyword \(w\). CSC should verify whether the attribute set \(\Psi_{i}\) satisfies the access structure \(\mathbb{A}\) associated with \(I_{w}\).
 – CSC tests whether the following equation holds for \(I_{w}=(A,B)\) and \(T_{w}=(T_{w,1},T_{w,2})\):$$e(A,g^{T_{w,2}})=B^{T_{w,1}}.$$
If the equation holds and \(\Psi_{i}\) satisfies \(\mathbb{A}\), CSC outputs 1. Otherwise, it outputs 0.
 – The set of all matched ciphertexts \({\mathcal {CT}}=\{CT_{1},CT_{2},\cdots \}\) is sent to DU.
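The matching test \(e(A,g^{T_{w,2}})=B^{T_{w,1}}\) can be exercised in a toy model. The sketch below is insecure and for illustration only: group elements are represented directly by their exponents modulo a toy prime `Q`, so the pairing reduces to a multiplication of exponents. The trapdoor construction is not reproduced in this text, so the shape used here, \(T_{w}=(\sigma, H(\Gamma_{w})\cdot\eta\cdot\sigma)\) with random \(\sigma\), is an assumption chosen only because it satisfies the test equation. `H`, `pair`, `build_index`, `build_trapdoor` and `matches` are hypothetical names.

```python
import hashlib
import secrets

Q = 2**127 - 1  # toy prime group order (assumption)


def H(gamma_w):
    """Hash a sorted synonym set into Z_Q (illustrative choice of H)."""
    data = "|".join(gamma_w).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def pair(x, y):
    """Toy pairing: e(g^x, g^y) = e(g,g)^(x*y), tracked by its exponent."""
    return (x * y) % Q


def build_index(gamma_w, pk):
    """I_w = (A, B) with A = g^tau and B = e(g^H(Gamma_w), pk^tau).

    In this exponent model pk = g^eta is represented by eta itself.
    """
    tau = secrets.randbelow(Q - 1) + 1
    return tau, pair(H(gamma_w), pk * tau)


def build_trapdoor(gamma_w, eta):
    """ASSUMED trapdoor shape: (sigma, H(Gamma_w) * eta * sigma)."""
    sigma = secrets.randbelow(Q - 1) + 1
    return sigma, (H(gamma_w) * eta * sigma) % Q


def matches(index, trapdoor):
    """Check e(A, g^{T_w2}) == B^{T_w1} in the exponent model."""
    A, B = index
    t1, t2 = trapdoor
    return pair(A, t2) == (B * t1) % Q
```

In this model both sides of the test reduce to \(\tau\cdot H(\Gamma_{w})\cdot\eta\cdot\sigma\) when index and trapdoor are built from the same synonym set, and they differ when the synonym sets hash to different values.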

\(Decrypt(GP,CT,sk_{U_{i}},\Psi _{i},\mathbb {A})\to M\).
 – Choose the minimum set \(\Psi_{i}'\subseteq \Psi_{i}\) that satisfies access structure \(\mathbb{A}\).
 – Recover the message \(M\) by computing:$$C_{1}/[e(C_{0},d_{0})\cdot\prod_{\psi_{j}\in\Psi_{i}'}e(C_{j,i},d_{j})]=M.$$
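The decryption equation can be checked term by term. The Setup output is not reproduced in this text, so the derivation below assumes \(T_{j}=g^{t_{j}}\) and \(y=e(g,g)^{\alpha}\), and reads the key components as \(d_{0}=g^{\alpha-r}\) and \(d_{j}=g^{r\cdot t_{j}^{-1}}\). It also uses the fact that the values \(s_{i}\) assigned along a minimal satisfying set add up to \(s\).

```latex
\begin{aligned}
e(C_{j,i},d_{j}) &= e\big(g^{t_{j}s_{i}},\,g^{r t_{j}^{-1}}\big) = e(g,g)^{r s_{i}},\\
\prod_{\psi_{j}\in\Psi_{i}'} e(C_{j,i},d_{j}) &= e(g,g)^{r\sum_{i}s_{i}} = e(g,g)^{rs},\\
e(C_{0},d_{0}) &= e\big(g^{s},\,g^{\alpha-r}\big) = e(g,g)^{\alpha s-rs},\\
e(C_{0},d_{0})\cdot\prod_{\psi_{j}\in\Psi_{i}'} e(C_{j,i},d_{j}) &= e(g,g)^{\alpha s} = y^{s},\\
C_{1}/y^{s} &= M\cdot y^{s}/y^{s} = M.
\end{aligned}
```

Under these assumptions the blinding factor \(e(g,g)^{rs}\) contributed by the attribute components cancels against the \(-rs\) term from \(d_{0}\), leaving exactly the mask \(y^{s}\) used in \(C_{1}\).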

Access Right Revocation.

KDC generates a revocation certificate \(Revoke_{U_{i}}=\{U_{i},Date,Sig_{\textit {MSK}}(U_{i},Date)\}\), which consists of the user name \(U_{i}\), the revocation date \(Date\) and a signature on this information generated with the master secret key MSK.

KDC sends the revocation certificate to CSC for storage.

CSC adds the certificate to the revocation list. A data retrieval request is rejected if the requesting user is found in the revocation list.
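The revocation flow can be sketched as follows. This is an illustration under stated assumptions, not the paper's mechanism: an HMAC from the Python standard library is swapped in for the signature \(Sig_{MSK}\), which means only a holder of the key can check the tag, whereas a public-key signature would let CSC verify the certificate itself. All function and class names are hypothetical.

```python
import hashlib
import hmac


def make_certificate(msk: bytes, user: str, date: str) -> dict:
    """KDC side: build {U_i, Date, tag}; HMAC stands in for Sig_MSK (assumption)."""
    msg = f"{user}|{date}".encode("utf-8")
    tag = hmac.new(msk, msg, hashlib.sha256).hexdigest()
    return {"user": user, "date": date, "sig": tag}


def verify_certificate(msk: bytes, cert: dict) -> bool:
    """Key-holder check that the tag matches the certificate fields."""
    msg = f"{cert['user']}|{cert['date']}".encode("utf-8")
    expected = hmac.new(msk, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["sig"])


class RevocationList:
    """CSC side: stores certificates and answers membership queries."""

    def __init__(self):
        self._revoked = {}

    def add(self, cert: dict) -> None:
        self._revoked[cert["user"]] = cert

    def is_revoked(self, user: str) -> bool:
        return user in self._revoked


def handle_request(user: str, revocation_list: RevocationList) -> str:
    """Reject a retrieval request from a revoked user before any matching is done."""
    return "rejected" if revocation_list.is_revoked(user) else "accepted"
```

Checking the revocation list before running the index test keeps revocation cheap: no keys need to be reissued and no ciphertexts need to be re-encrypted for the remaining users.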
Security analysis
Proposition 1.
The proposed SKABSE scheme is correct.
Proof.
Thus, we have \(e(A,g^{T_{w,2}})=B^{T_{w,1}}\).

Confidentiality of Data.
In the proposed scheme, plaintext documents are encrypted into ciphertexts controlled by an access policy before they are outsourced to CSC. Neither CSC nor an adversary can obtain the private information in those ciphertexts.

Privacy of Query.

Undeniability of Request.
This system supports queries from multiple users. In a traditional multi-user system, a secret key is shared by multiple users. For a query request, neither CSC nor KDC can distinguish a request generated by one user from that of another, so non-repudiation of query requests is not supported. In the multi-user system designed in this paper, however, each user is given a different private key, so that a submitted query request \(T_{w}\) contains information from the user’s private key. Neither an adversary nor CSC can forge the trapdoor generated by an authorized user. At the same time, users cannot deny their requests. Therefore, the query request phase satisfies undeniability.
Conclusions
Searchable encryption is a technology that provides encryption and ciphertext retrieval at the same time. To solve the problems in existing multiple-user SE schemes, a novel SE scheme is designed that supports fine-grained access control policies and semantic keyword search. A concrete construction based on bilinear pairings is provided. Security analysis shows that the scheme guarantees the privacy of data and keywords and has the advantage of non-repudiation.
Abbreviations
 ABE: Attribute based encryption
 IBE: Identity based encryption
 SKABSE: Attribute based searchable encryption with synonym keyword search
 PEKS: Public key encryption scheme with keyword search
 SE: Searchable encryption
 EHR: Electronic health record
 KDC: Key distribution center
 CSC: Cloud storage center
 DS: Data sender
 DU: Data user
Declarations
Acknowledgements
This research is supported by National Natural Science Foundation of China (61402112, 61472307, 61472309).
References
 1. Boneh D, Di Crescenzo G, Ostrovsky R, Persiano G (2004) Public Key Encryption with Keyword Search. In: EUROCRYPT, 506–522. Springer, Heidelberg.
 2. Gu C, Zhu Y, Zhang Y (2007) Efficient public key encryption with keyword search schemes from pairings. In: INSCRYPT, 372–83. Springer, Heidelberg.
 3. Long B, Gu D, Ding N, Lu H (2012) On Improving the Performance of Public Key Encryption with Keyword Search. In: CSC, 143–147. IEEE, Piscataway, N.J., USA.
 4. Chen Z, Wu C, Wang D, Li S (2012) Conjunctive keywords searchable encryption with efficient pairing, constant ciphertext and short trapdoor. In: PAISI, 176–189. Springer, Heidelberg.
 5. Xu P, Jin H, Wu Q, Wang W (2013) Public-Key Encryption with Fuzzy Keyword Search: A Provably Secure Scheme under Keyword Guessing Attack. In: IEEE Transactions on Computers, 2266–2277. IEEE, Piscataway, N.J., USA.
 6. Wang B, Yu S, Lou W, Hou T (2014) Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud. In: INFOCOM’14, 2112–2120. IEEE, Piscataway, N.J., USA.
 7. Cao N, Wang C, Li M, Ren K, Lou W (2014) Privacy-preserving multi-keyword ranked search over encrypted cloud data. In: IEEE Transactions on Parallel and Distributed Systems, 222–233. IEEE, Piscataway, N.J., USA.
 8. Attrapadung N, Furukawa J, Imai H (2006) Forward-Secure and Searchable Broadcast Encryption with Short Ciphertexts and Private Keys. In: ASIACRYPT 2006, LNCS 4284, 161–177. Springer, Heidelberg.
 9. Bosch C, Brinkman R, Hartel P, Jonker W (2011) Conjunctive wildcard search over encrypted data. In: International Conference on Secure Data Management, LNCS 6933, 114–127. Springer, Heidelberg.
 10. Li J, Wang Q, Wang C, Cao N, Ren K, Lou W (2010) Fuzzy keyword search over encrypted data in cloud computing. In: Proceedings of the 29th IEEE International Conference on Computer Communications (INFOCOM10), 441–445. IEEE, Piscataway, N.J., USA.
 11. Hu C, He P, Liu P (2012) Public Key Encryption with Multi-keyword Search. In: NCIS2012, 568–576. Springer, Heidelberg.
 12. Hwang M, Hsu S, Lee C (2014) A New Public Key Encryption with Conjunctive Field Keyword Search Scheme. In: Information Technology and Control, 277–288. Kaunas Univ. Technology, Kaunas, Lithuania.
 13. Sahai A, Waters B (2005) Fuzzy Identity Based Encryption. In: EUROCRYPT’05, LNCS 3494, 457–473. Springer, Heidelberg.
 14. Waters B (2011) Ciphertext-Policy Attribute-Based Encryption: An Expressive, Efficient, and Provably Secure Realization. In: PKC’11, LNCS 6571, 53–70. Springer, Heidelberg.
 15. Wang C, Luo J (2013) An Efficient Key-Policy Attribute-Based Encryption Scheme with Constant Ciphertext Length. Mathematical Problems in Engineering, volume 2013, Article ID 810969, http://dx.doi.org/10.1155/2013/810969.
 16. Qin B, Deng H, Wu Q, Domingo-Ferrer J, Naccache D, Zhou Y (2015) Flexible attribute-based encryption applicable to secure e-healthcare records. International Journal of Information Security, vol. 14, no. 1, Springer, Heidelberg, pp 1–13.
 17. Hohenberger S, Waters B (2014) Online/offline attribute-based encryption. In: PKC’14, 293–310. Springer, Heidelberg.
 18. Han J, Susilo W, Mu Y, Zhou J, Au MH (2014) PPDCP-ABE: Privacy-Preserving Decentralized Ciphertext-Policy Attribute-Based Encryption. In: ESORICS’14, 73–90. Springer, Heidelberg.
 19. WordNet Documentation. http://wordnet.princeton.edu/wordnet/documentation/.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.