WANG Yunling received a master's degree in electronics and communication engineering from Xidian University, China, in 2015. She is now a Ph.D. candidate in the area of cryptography at Xidian University, China. Her research interests include cloud computing and applied cryptography.
WANG Jianfeng received an M.S. degree in mathematics and a Ph.D. degree in cryptography from Xidian University, in 2013 and 2016, respectively. He currently works at Xidian University. His research interests include applied cryptography and secure outsourced storage.
CHEN Xiaofeng received B.S. and M.S. degrees in mathematics from Northwest University, Xi’an, China, in 1998 and 2000, respectively, and a Ph.D. degree in cryptography from Xidian University, Xi’an, in 2003, where he is currently a Professor. His research interests include applied cryptography and cloud computing security. He has authored over 100 research papers in refereed international conferences and journals. His work has been cited over 5,300 times according to Google Scholar. He is on the editorial boards of IEEE Transactions on Dependable and Secure Computing, Security and Communication Networks, Telecommunication Systems, etc. He has served as the program/general chair or a program committee member for over 30 international conferences.
Cloud computing facilitates convenient, on-demand network access to a centralized pool of resources. Many users now prefer to outsource data to the cloud to reduce the burden of local storage. However, storing sensitive data on remote servers poses privacy challenges and remains a source of concern. SE (Searchable Encryption) is a promising way to protect users' sensitive data while preserving search ability on the server side. SE allows the server to search encrypted data without leaking information about the plaintext data. The two main branches of SE are SSE (Searchable Symmetric Encryption) and PEKS (Public key Encryption with Keyword Search). SSE allows only private key holders to produce ciphertexts and to create trapdoors for search, whereas PEKS enables any user who knows the public key to produce ciphertexts but allows only the private key holder to create trapdoors. This article surveys the two main techniques of SE: SSE and PEKS. Different SE schemes are categorized and compared in terms of functionality, efficiency, and security. Moreover, we point out some valuable directions for future work on SE schemes.
With the rapid development of cloud computing, cloud storage has enabled the provision of high data availability, easy access to data, and reduced infrastructure costs from outsourcing of data to remote servers. Many users prefer cloud storage services to relieve the burden of maintenance costs as well as the overhead of storing data locally. Moreover, users are able to access their data from anywhere and at any time instead of having to use dedicated machines.
Although cloud storage offers many advantages to users, there are still various security concerns. A remote server cannot be fully trusted because it may not only be curious about the users' data but also abuse the data. When users outsource their data to a remote server, they lose physical access to the data, and the administration of the data is delegated to the server as well. Thus, it is necessary to guarantee the privacy of users' sensitive data. The most common way of achieving privacy is to encrypt the data before outsourcing them. This approach provides end-to-end data privacy from the moment the data leave the users' possession. While such a solution guarantees the privacy of sensitive data, it also makes it difficult for the server to perform any meaningful function, especially search, on the encrypted data.
Consider a search function on plaintexts. A user sends query keywords to the server in order to retrieve the corresponding documents. After searching, the server returns the search results to the user. However, during the search process, both the contents stored on the server and the query keywords are exposed to the semi-trusted server. Encryption protects the privacy of users' data, but at the same time it disables search functionality. A trivial way to search is to download all the ciphertexts, decrypt them, and then search on the plaintexts. However, this is impractical for large data collections. Consequently, a method that provides data confidentiality while preserving search functionality is needed.
SE has been proposed to address this problem. It is not only an encryption scheme but also supports keyword search on encrypted data. In SE schemes, a user can outsource a collection of encrypted data to the server while maintaining the ability to search them. From the aspect of security, the privacy of documents and keywords is maintained. The two main branches of SE are SSE and PEKS. SSE is a private key primitive. It allows only the private key holder to produce ciphertexts and to create trapdoors for search. PEKS, on the other hand, is a public key primitive. It enables any user who knows the public key to produce ciphertexts but only allows the private key holder to create trapdoors for search.
This article surveys the practical techniques of SE. The main contributions of this paper are (1) a review of the most meaningful SE approaches, mainly focusing on SSE and PEKS, and (2) analysis and classification of these approaches. The similarities and differences of these schemes are also examined and the outstanding issues for further studies discussed. The remainder of this paper is organized as follows: the model and security requirements for SE are given in Section 2. A review of SSE techniques is given in Section 3. A review of PEKS techniques is given in Section 4. Finally, conclusions and valuable issues for further work are given in Section 5.
A searchable encryption scheme includes three parties: a trusted data owner O, a semi-trusted server S, and a collection of users who are authorized to search. The task for each party is as follows:
• Data owner: A data owner would like to outsource a collection of documents D={D1, D2, …, Dn} together with some keywords. The data owner needs to encrypt the documents and keywords in a particular manner, in order to easily search them afterward, then sends the ciphertexts to the server.
• Data user: If an authorized user wants to search the documents that contain a particular keyword, she/he has to submit the trapdoor of this query keyword to the server. After searching, the server returns the documents that contain this keyword to the user.
• Server: A server performs search tasks. When the server receives a trapdoor of a query keyword from a user, it searches over ciphertexts and then returns related documents to the user. We assume that the server is honest-but-curious. This means the server will follow the protocol correctly, but it may analyze the data received and attempt to obtain some additional information.
In a searchable encryption scheme, the security of the documents and keywords stored on the server should be guaranteed. In addition, the security of query keywords should also be assured. Moreover, the following two security items should also be protected[
• Search pattern: Search pattern is defined as any information that can be derived from knowledge of whether two search results are from the same keyword.
• Access pattern: Access pattern is defined as a sequence of search results (D(w1), …, D(wn)), where D(wi) is the search result for wi. In other words, D(wi) is the collection of documents in D that contain the keyword wi.
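The two leakage notions above can be illustrated from the server's point of view. The following is a minimal sketch, assuming deterministic trapdoors represented as opaque byte strings and an index stored as a plain dictionary; the names `search_pattern`, `access_pattern`, `index`, and `queries` are hypothetical and do not come from any surveyed scheme.

```python
def search_pattern(trapdoors):
    """Entry [i][j] is True when queries i and j used the same trapdoor,
    i.e. (for deterministic trapdoors) the same underlying keyword."""
    return [[a == b for b in trapdoors] for a in trapdoors]


def access_pattern(index, trapdoors):
    """The sequence (D(w1), ..., D(wn)) of result sets the server returns."""
    return [sorted(index.get(t, ())) for t in trapdoors]


# Hypothetical server-side view: opaque trapdoors mapped to document ids.
index = {b"t-cloud": {"D1", "D2"}, b"t-search": {"D3"}}
queries = [b"t-cloud", b"t-search", b"t-cloud"]

print(search_pattern(queries))
print(access_pattern(index, queries))
```

Even though the server never sees a keyword in the clear, it can tell that the first and third queries are repeats and which documents each query touches.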
Consider the following scenario: A user, Alice, wants to store a set of documents on a server because of limited storage resources. As the server is semi-trusted, Alice has to encrypt the documents before outsourcing them. If Alice needs some documents containing a particular keyword, she will need to submit some information in terms of query keywords to the server. Then, the server will search the ciphertext to determine which document contains the query keyword.
A general searchable symmetric encryption scheme includes four polynomial-time algorithms:
• Keygen(1^k): a key generation algorithm run by the data owner. It takes a security parameter k as input, and outputs a secret key K.
• BuildIndex(K, D): a keyword index generation algorithm run by the data owner. It takes a secret key K and a set of documents D as inputs, and outputs a keyword index I.
• Trapdoor(K, w): a keyword trapdoor generation algorithm run by the user. It takes a secret key K and a query keyword w as inputs, and outputs the trapdoor Tw for the keyword w.
• Search(I, Tw): a keyword search algorithm run by the server. It takes a keyword index I and a trapdoor Tw as inputs, and outputs a set of documents D(w) that contains query keyword w.
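The four algorithms can be sketched as a toy keyword index. This is only an illustration under simplifying assumptions, not the construction of any cited scheme: HMAC-SHA256 stands in for the pseudorandom function, document identifiers are stored unencrypted in the index, and the documents themselves are assumed to be encrypted separately. The function names mirror the algorithm names above; the sample documents are hypothetical.

```python
import hashlib
import hmac
import secrets


def keygen(k: int = 32) -> bytes:
    """Keygen(1^k): return a random secret key K of k bytes."""
    return secrets.token_bytes(k)


def trapdoor(key: bytes, w: str) -> bytes:
    """Trapdoor(K, w): a deterministic PRF value of the keyword w."""
    return hmac.new(key, w.encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, docs: dict) -> dict:
    """BuildIndex(K, D): map each keyword's PRF value to matching doc ids."""
    index = {}
    for doc_id, keywords in docs.items():
        for w in keywords:
            index.setdefault(trapdoor(key, w), set()).add(doc_id)
    return index


def search(index: dict, t_w: bytes) -> set:
    """Search(I, Tw): run by the server, which never sees w itself."""
    return index.get(t_w, set())


# Hypothetical document collection: id -> set of keywords.
K = keygen()
docs = {"D1": {"cloud", "privacy"}, "D2": {"cloud"}, "D3": {"search"}}
I = build_index(K, docs)
print(sorted(search(I, trapdoor(K, "cloud"))))
```

Because the trapdoor is deterministic, this sketch leaks both the search pattern and the access pattern, which matches the leakage most practical SSE schemes accept in exchange for efficiency.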
A searchable symmetric encryption scheme should satisfy various security requirements. The privacy of documents, search index, and query keywords should be protected, as well as the search pattern and access pattern. Song, et al. [
3. 3. 1 Single keyword search
1) SSE schemes with sequential scan. Song, et al. [
The scheme comprises three steps: encryption, search, and decryption.
Firstly, encryption is performed by the user, Alice. Suppose Alice wants to encrypt a document containing a sequence of keywords W1, …, Wl. Each keyword Wi is encrypted as follows. First, Alice encrypts Wi using a function E with key k″ and obtains the n-bit ciphertext Xi = Ek″(Wi). Then Xi is split into a left part Li and a right part Ri, where Li is the first (n-m) bits and Ri is the last m bits of Xi. Alice then generates a sequence of pseudorandom values S1, …, Sl, where each Si is (n-m) bits long. To encrypt the n-bit Xi, Alice takes the value Si, sets Ti = <Si, Fki(Si)>, and outputs the ciphertext Ci = Xi ⊕ Ti, where ki = fk′(Li) and F is a keyed pseudorandom function with m-bit output. Finally, Alice sends all the ciphertexts Ci to the server.
Secondly, search is performed by the server. When Alice wants to search for documents containing the keyword W, she computes X = Ek″(W) and k = fk′(L), where L is the first (n-m) bits of X, and sends <X, k> to the server. The server then searches for X in the ciphertext by checking whether Ci ⊕ X has the form <s, Fk(s)> for some s.
Finally, Alice decrypts the ciphertext. For each Ci in the ciphertext, Alice regenerates Si using the pseudorandom generator and XORs Si with the first (n-m) bits of Ci to obtain Li. Knowing Li, Alice can compute ki, recover Ri from the remaining m bits, and eventually recover Wi.
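The three steps can be sketched in code. This is a toy illustration only, assuming the published form of the sequential-scan construction, in which each ciphertext block is Ci = Xi ⊕ <Si, Fki(Si)>. HMAC-SHA256 stands in for the pseudorandom functions f and F and for the pseudorandom generator, and a four-round Feistel network stands in for the deterministic cipher E; these choices, the byte-level block sizes, and all function names are assumptions made for the example.

```python
import hashlib
import hmac

N, M = 32, 16        # block length n and check length m (bytes, not bits)
HALF = N - M         # length of L_i and S_i


def prf(key: bytes, msg: bytes, out_len: int = 16) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(key: bytes, word: str) -> bytes:
    """Toy deterministic, invertible E: Feistel network on a padded block."""
    block = word.encode("utf-8").ljust(N, b"\0")
    assert len(block) == N, "keyword too long for this toy block size"
    left, right = block[:16], block[16:]
    for rnd in range(4):
        left, right = right, xor(left, prf(key, bytes([rnd]) + right))
    return left + right


def D(key: bytes, block: bytes) -> str:
    left, right = block[:16], block[16:]
    for rnd in reversed(range(4)):
        left, right = xor(right, prf(key, bytes([rnd]) + left)), left
    return (left + right).rstrip(b"\0").decode("utf-8")


def encrypt(keys, words):
    k_e, k_f, seed = keys
    cipher = []
    for i, w in enumerate(words):
        x = E(k_e, w)                                # X_i = E_k''(W_i)
        s_i = prf(seed, i.to_bytes(8, "big"), HALF)  # stream value S_i
        k_i = prf(k_f, x[:HALF], 32)                 # k_i = f_k'(L_i)
        t_i = s_i + prf(k_i, s_i, M)                 # T_i = <S_i, F_ki(S_i)>
        cipher.append(xor(x, t_i))                   # C_i = X_i xor T_i
    return cipher


def query(keys, w):
    k_e, k_f, _ = keys
    x = E(k_e, w)
    return x, prf(k_f, x[:HALF], 32)                 # <X, k>


def server_search(cipher, x, k):
    """Linear scan: C_i xor X must have the form <s, F_k(s)>."""
    hits = []
    for i, c in enumerate(cipher):
        t = xor(c, x)
        if hmac.compare_digest(t[HALF:], prf(k, t[:HALF], M)):
            hits.append(i)
    return hits


def decrypt(keys, cipher):
    k_e, k_f, seed = keys
    words = []
    for i, c in enumerate(cipher):
        s_i = prf(seed, i.to_bytes(8, "big"), HALF)
        l_i = xor(c[:HALF], s_i)                     # recover L_i
        k_i = prf(k_f, l_i, 32)
        r_i = xor(c[HALF:], prf(k_i, s_i, M))        # recover R_i
        words.append(D(k_e, l_i + r_i))
    return words


keys = (b"e" * 32, b"f" * 32, b"s" * 32)   # fixed demo keys, not secure
words = ["cloud", "search", "cloud", "privacy"]
cipher = encrypt(keys, words)
x, k = query(keys, "cloud")
print(server_search(cipher, x, k))
```

The server loop touches every ciphertext block, which is why the search time grows linearly with the total length of the stored documents.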
In this scheme, the privacy of plaintext and the query keyword is maintained. However, it has a low search efficiency and the search time is linear in the length of the document collection, because the server needs to scan the entire ciphertext of a document when it determines whether a certain keyword is contained in the document. In addition, the plaintext is vulnerable to statistical attack according to the frequency of the query keyword occurring in the document.
2) SSE schemes with secure index. Document-based Index: To improve search efficiency, Goh[
Unlike the scheme proposed by Song, et al. [
Keyword-based Index: Another kind of secure index, called a keyword-based secure index, was proposed by Curtmola, et al. [
3) Dynamic SSE scheme. Van Liesdonk, et al. [
Cash, et al. [
3. 3. 2 Fuzzy keyword search
In the SSE scheme, a user submits the trapdoor of a query keyword to the server, and the server returns the documents containing the query keyword. However, if the query keyword does not exactly match a preset keyword, for example when “campus” is mistyped as “compus”, the keyword search will fail. Fuzzy keyword search addresses this problem because it tolerates minor typos and formatting inconsistencies. Li, et al. [
Comparison of several SSE schemes
scheme | search time | index size | security | dynamism |
Song, et al.[2] | O(n/p) | N/A | CPA | static |
Goh[3] | O(n/p) | O(n) | IND1-CKA | dynamic |
Curtmola, et al.[1](SSE-1) | O(r) | O(m+n) | CKA1 | static |
Curtmola, et al.[1](SSE-2) | O(r) | O(mn) | CKA2 | static |
Van Liesdonk, et al.[5] | O(r) | O(mn) | CKA2 | dynamic |
Kamara, et al.[6] | O(r) | O(m+n) | CKA2 | dynamic |
Kamara, et al.[7] | O((r/p)logn) | O(mn) | CKA2 | dynamic |
3. 3. 3 Conjunctive keyword search
Conjunctive keyword search allows a user to obtain documents containing several keywords during a single query. It is more efficient and suitable for real applications than single keyword search. A trivial procedure is to perform single keyword search for each keyword separately and then deal with the results. However, it is inefficient and leaks some information to the server. Golle, et al. [
3. 3. 4 Ranked and verifiable keyword search
Ranked keyword search can optimize search results by returning the most relevant documents. This can reduce network traffic and enhance system usability. Swaminathan, et al. [
Verifiable keyword search can detect whether the search results are complete and correct. This makes it possible to detect inaccurate search results caused by software or hardware failure, storage corruption, or even malicious behavior by a semi-honest server trying to save computation resources. Studies have also been conducted on verifiable keyword search[
Consider the following scenario: Bob sends an email with corresponding keywords to Alice. In order to protect the contents of the email and keywords, both are encrypted with Alice's public key. However, in this case the email server cannot make a routing decision according to the keywords. Therefore, it is necessary to give the email server the ability to decide whether a certain keyword is contained in an email or not. Meanwhile, the email server should not learn anything about the contents of the email and keywords. To achieve this goal, Boneh, et al. [
When Bob wants to send Alice an email with a number of keywords W1, …, Wk, he sends the following ciphertext: EApub(M), PEKS(Apub, W1), …, PEKS(Apub, Wk), where M is the content of the email, Apub is Alice's public key, and PEKS is an algorithm supporting keyword search. Then, Alice produces a trapdoor TW for a keyword W and sends TW to the gateway. After searching, the gateway returns the emails containing W to Alice. A general PEKS scheme contains four polynomial-time algorithms:
• KeyGen(1^k): a key generation algorithm run by Alice. It takes a security parameter k as input, and outputs a public/private key pair Apub, Apriv.
• PEKS(Apub, W): a public key encryption algorithm preserving search ability that is run by Bob. It takes Alice's public key Apub and a keyword W as input, and outputs the ciphertext S of W.
• Trapdoor(Apriv, W): a keyword trapdoor generation algorithm run by Alice. It takes Alice's private key Apriv and a query keyword W as input, and outputs the trapdoor TW of the query keyword W.
• Test(Apub, S, TW): a test algorithm run by the mail server. It takes Alice's public key Apub, a ciphertext S of a keyword W′, and a trapdoor TW of a query keyword W. If W=W′, this algorithm outputs “yes”; otherwise, it outputs “no”.
In PEKS schemes, the security of the keyword ciphertexts (the output of the PEKS algorithm) should be guaranteed. The ciphertexts should not leak any information about a keyword, even under an adaptive chosen keyword attack. In this attack model, an active attacker can obtain the trapdoor TW for any keyword W except the challenge keywords. The attacker chooses two challenge keywords W0 and W1 and sends them to a challenger. The challenger randomly chooses b ∈ {0, 1} and sends the ciphertext of Wb to the attacker. The attacker must then determine b using the trapdoors obtained for other keywords. We refer to this security model as PK-CKA2 security: a scheme is secure if the attacker cannot determine whether the ciphertext is from W0 or W1 with probability non-negligibly better than guessing. For a detailed definition of PK-CKA2 security, please see Ref. [46]. This security definition is predominantly used in the remainder of this paper.
4. 3. 1 Single keyword search
The first PEKS scheme was proposed by Boneh, et al. [
The scheme requires two groups G1 and G2 of prime order p, where p is determined by a security parameter, and a bilinear map e: G1×G1→G2. In addition, the scheme requires two hash functions H1: {0, 1}*→G1 and H2: G2→{0, 1}^(log p). The detailed algorithm is as follows:
• KeyGen(1^k): The input is a security parameter k that is used to determine the size, p, of the two groups G1 and G2. The algorithm picks a random element α∈Zp* and a generator g of G1. Finally, it outputs a public key Apub=[g, h=g^α] and a private key Apriv=α.
• PEKS(Apub, W): It first computes t=e(H1(W), h^r)∈G2, where r is a random element of Zp*. Then, it outputs PEKS(Apub, W)=[g^r, H2(t)].
• Trapdoor(Apriv, W): It outputs a trapdoor TW of a certain keyword W, TW=H1(W)^α∈G1.
• Test(Apub, S, TW): Given S=[g^r, H2(t)], it tests whether H2(e(TW, g^r))=H2(t). If so, it outputs “yes”; otherwise, it outputs “no”.
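The correctness of the Test algorithm follows from the bilinearity of e. As a short worked check, using only the symbols defined above, a trapdoor for the same keyword W that was encrypted satisfies:

```latex
\begin{align*}
e(T_W, g^{r}) &= e\bigl(H_1(W)^{\alpha}, g^{r}\bigr) \\
              &= e\bigl(H_1(W), g\bigr)^{\alpha r} \\
              &= e\bigl(H_1(W), g^{\alpha r}\bigr) \\
              &= e\bigl(H_1(W), h^{r}\bigr) = t .
\end{align*}
```

Hence H2(e(TW, g^r)) equals the second ciphertext component H2(t). For a trapdoor of a different keyword W′, the pairing value is e(H1(W′), h^r), which matches t only if H1 or H2 collides.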
This scheme has been proven PK-CKA2 secure in the random oracle model under the hardness of the Bilinear Diffie-Hellman problem. However, trapdoors must be transmitted over a secure channel to ensure that only the server receives them. Furthermore, trapdoors are generated deterministically, so the server can store them for later use. In terms of efficiency, the scheme is costly: the PEKS algorithm requires one pairing and two exponentiations, the test algorithm requires one pairing, and the search complexity is linear in the number of keywords per document.
Many studies have been conducted on methods of constructing PEKS schemes. Some typical methods are introduced and classified into three categories based on their security below.
1) Traditional PEKS: Abdalla, et al. [
Di Crescenzo and Saraswat[
Khader[
2) Secure Channel Free PEKS: Boneh, et al. ’s scheme[
3) Against Keyword Guessing Attack: Byun, et al. [
For inside KGA, Jeong, et al. [
Comparison of several PEKS schemes
scheme | search time | index size | security | assumption |
Boneh, et al.[46] | nvp | nv(2e+p) | PK-CKA2 | BDH |
Baek, et al.[53](PEKS-1) | nvp | nv(3e+p) | PK-CKA2 | CDH |
Baek, et al.[53](PEKS-2) | nvp | nv(e+2p) | PK-CKA2 | BDH |
Crescenzo and Saraswat[50] | 4nlJ | 4nvlJ | PK-CKA2 | QIP |
Khader[52] | 4nve | 5+3nve | PK-CKA2 | DDH |
Rhee, et al.[54] | nv(e + p) | 2nve+(7+nv)p | PK-CKA2 | BDH, 1-BDHI |
Rhee, et al.[59] | nv(2e + p) | 2nve+nvp | PK-CKA2 | BDH, 1-BDHI |
4. 3. 2 Conjunctive keyword search
Park, et al. [
4. 3. 3 Fuzzy keyword search
Bringer, et al. [
4. 3. 4 Verifiable keyword search
The server is assumed semi-honest such that it may just return a part of the search results or even inaccurate results. Zheng, et al. [
Since the proposal of SSE and PEKS in 2000 and 2004, respectively, the searchable encryption research field has received significant attention. Progress has been made in the following three main directions.
1) Query Expressiveness. Much research has been conducted on extension of query expressiveness. To make schemes more practical, not only exact single keyword search, but also fuzzy keyword search, range search, and subset search are supported. Query results have also been optimized. For example, ranked keyword search finds the most closely related results and verifiable keyword search verifies the correctness and completeness of the results. However, many schemes improve query expressiveness at the expense of efficiency or security. Therefore, future research should pay attention to the tradeoff between query expressiveness and efficiency or security.
2) Efficiency. From the aspect of SSE, the search complexity of some schemes is linear in the number of documents stored on the server. Other schemes achieve sublinear search time, in which the search complexity is logarithmic in the number of keywords in all documents. In addition, some schemes achieve optimal search time, in which the search complexity is linear in the number of documents containing the query keywords. With the advent of the big data era, large-scale data now need to be stored on servers. Thus, the question of how to deal with large-scale data efficiently is a direction for further work. Moreover, in many schemes the documents cannot be flexibly updated because the search index is tied to the keywords. Hence, the question of how to construct an efficient dynamic SSE scheme is another direction for future work. From the aspect of PEKS, a large number of schemes are based on pairings. As a result, these schemes are inefficient because pairing computations are expensive. Thus, construction of practical PEKS schemes is also a direction for future work.
3) Security. Although virtually all SE schemes are provably secure, they do not use a common security model. That is, different schemes use different security models under different assumptions. Hence, it is difficult to compare their security. Thus, proposal of a standard security model for SE schemes is a direction for future work. Further, most schemes leak the search pattern and access pattern. Thus, construction of an efficient scheme that does not leak the search pattern and access pattern is another direction for future work.
Future work should also focus on the question of how to apply the ideas underlying SE to deal with other kinds of data. For example, how to search encrypted media data containing image data or video data; how to search an encrypted database containing relational database or non-relational database; and how to search structured social network data.
The authors have declared that no competing interests exist.