<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JIS</journal-id><journal-title-group><journal-title>Journal of Information Security</journal-title></journal-title-group><issn pub-type="epub">2153-1234</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jis.2017.81005</article-id><article-id pub-id-type="publisher-id">JIS-73522</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Web Search Query Privacy, an End-User Perspective
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kato</surname><given-names>Mivule</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>Department of Computer Science, Norfolk State University, Norfolk, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>kmivule@gmail.com</email></corresp></author-notes><pub-date pub-type="epub"><day>07</day><month>12</month><year>2016</year></pub-date><volume>08</volume><issue>01</issue><fpage>56</fpage><lpage>74</lpage><history><date date-type="received"><day>28</day><month>12</month><year>2016</year></date><date date-type="rev-recd"><day>14</day><month>01</month><year>2017</year></date><date date-type="accepted"><day>17</day><month>01</month><year>2017</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  While search engines have become vital tools for finding information on the Internet, privacy remains a growing concern because search engines are technically able to retain user search logs. Although such capabilities might provide enhanced personalized search results, the confidentiality of user intent remains uncertain. Even with web search query obfuscation techniques, another challenge remains: reusing the same obfuscation methods is problematic, given that search engines have enormous computation and storage resources for query disambiguation. A number of web search query privacy procedures involve the cooperation of the search engine, a non-trusted entity in such cases, making query obfuscation even more challenging. In this study, we first provide a review of how search engines work with regard to web search queries and user intent. Secondly, this study reviews material in a manner accessible to those outside computer science, with the intent of introducing knowledge of web search engines so that non-computer scientists can approach web search query privacy innovatively. As a contribution, we identify and highlight areas open for further investigative and innovative research with regard to end-user personalized web search privacy, that is, methods that can be executed on the user side without the involvement of third parties such as search engines. The goal is to motivate future web search obfuscation heuristics that give users control over their personal search privacy.
 
</p></abstract><kwd-group><kwd>Web Queries</kwd><kwd>Web Search Privacy</kwd><kwd>User Profile Privacy</kwd><kwd>User Intent Privacy</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Search engines have become a useful part of the daily routine of searching for information on the Internet. However, privacy remains a major concern because search engines are able to retain user search logs. While search engine query log retention abilities might offer better personalized search results, user privacy is never guaranteed. Another challenge is that, due to the enormous computation and storage power of search engines, query disambiguation keeps improving, making it problematic for users to reuse the same obfuscation techniques over time. Although web search query obfuscation has gained the attention of researchers [<xref ref-type="bibr" rid="scirp.73522-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref5">5</xref>], studies have noted that web search query confidentiality continues to be a difficult problem, mainly due to the monetization of search results by search engines [<xref ref-type="bibr" rid="scirp.73522-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref7">7</xref>]. For instance, search engine companies offer the following rationale for retaining user web search query logs [<xref ref-type="bibr" rid="scirp.73522-ref8">8</xref>]: (i) enhancing ranking algorithms, (ii) query fine-tuning, (iii) improving personalized query results, (iv) combating fraud and abuse, (v) enabling shared data for research, and (vi) enabling shared data for marketing and other commercial purposes. 
It is interesting to note that each of the mentioned reasons for retaining user search query logs is a privacy concern. Even when organizations claim to privatize web search query logs, errors can still be made, as was the case with the 2006 AOL scandal in which a user was re-identified and traced to their geo-location after an anonymized set of web search query logs was published [<xref ref-type="bibr" rid="scirp.73522-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref10">10</xref>].</p><p>Therefore, user-based privatization techniques that do not require third party intermediaries are urgently needed as another layer of protection. Moreover, from a policy point of view, researchers have highlighted a number of issues relevant to gauging privacy guarantees when implementing web search query obfuscation methods. For example, Cooper (2008) noted that web search query obfuscation techniques could be judged using the following criteria [<xref ref-type="bibr" rid="scirp.73522-ref8">8</xref>]: (i) effectiveness of the method in protecting user privacy, (ii) effectiveness of the procedure in conserving the usefulness of query results, and (iii) how effectively the user can control the implementation of the privacy technique. As noted by Cooper (2008), the reasons given by search engines for retaining user web search query logs are often wanting, with no user confidentiality guarantees. As a contribution, we identify and highlight areas open for further investigative and innovative research with regard to personalized web search privacy, that is, methods that can be executed on the user side without the involvement of third parties such as search engines. The goal is to motivate future web search obfuscation heuristics that give users control over their personal search privacy. 
The central question asked by this study is whether it is possible to devise web search query obfuscation methods that can be executed on the user side without third party collaboration. For trusted privacy, users require techniques executed on the user side of the machine without involving untrusted third parties, such as search engine providers. This study reviews material in a manner accessible to those outside computer science, with the intent of introducing knowledge of web search engines so that non-computer scientists can approach web search query privacy innovatively. To reach readers outside computer science but interested in web search privacy, we have broken this article down into a review of web search engine mechanisms and suggested areas for further study. Therefore, this study reviews web search engines with respect to web search queries and user intent privacy. While a number of cryptographic and anonymous web browsing techniques, such as Tor, have been suggested [<xref ref-type="bibr" rid="scirp.73522-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref14">14</xref>], this article emphasizes privatized web search querying techniques that do not necessarily involve cryptographic and other third party anonymous web browsing methods. The rest of the paper is organized as follows. Section 2 reviews search engines and web search queries. Section 3 identifies areas for further research. Section 4 concludes the article.</p></sec><sec id="s2"><title>2. The Web Search Engine</title><p>This section presents an overview of how search engines work, an essential foundation for formulating end-user personalized web search privacy techniques. 
A basic understanding of web search engine mechanisms is necessary, for example, for data privacy curators to formulate new web search query obfuscation methods with enhanced query result usability when compared with existing methods.</p><sec id="s2_1"><title>2.1. How Web Search Engines Work</title><p>A web search engine is a software tool that searches for information on various subjects on the Internet and returns the most relevant search results to the user. A typical search engine provides the functionality listed below, as illustrated in <xref ref-type="fig" rid="fig1">Figure 1</xref> [<xref ref-type="bibr" rid="scirp.73522-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref17">17</xref>]. It is important to note that each component of a search engine is a privacy concern, as search engines maintain logs of user interactions with these components. While the explicit mechanics of how search engines work are beyond the scope of this paper, we endeavor to cover the web search query functionality of the search engine in greater detail, given that our concern is the privacy of search queries:</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> Typical search engine functionalities</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/5-7800425x2.png"/></fig><p> Web crawling―Web crawling involves a software agent that is given starting URLs and downloads every webpage by following each URL link. The URL of every downloaded webpage is stored and the document is saved in a repository.</p><p> Indexing web pages―Every downloaded webpage is indexed and stored in a repository. 
Each word on the downloaded page is stored, given a word identifier, and sorted; the webpage itself is given a document identifier.</p><p> Web search querying―Search terms from the user are converted to word identifiers and the indexed documents are searched until a match occurs.</p><p> Relevant search results―The search engine employs techniques and metrics, such as user profiling, user intent estimation, and PageRank, to return the most relevant search results to the user. A rank computation of the documents matching the query is performed, and the top-k document results are returned, where k represents the number of documents returned.</p></sec><sec id="s2_2"><title>2.2. Web Search Engine Data Structures</title><p>Web search engines include the following data structures for enhanced functionality, a key consideration for researchers when creating query obfuscation techniques to implement privacy [<xref ref-type="bibr" rid="scirp.73522-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref19">19</xref>]:</p><p> Repository―The repository contains the full compressed HTML of every webpage, with its URL and a given document identifier.</p><p> Document Index―The document index contains every indexed webpage or document, sorted by document identifier, together with a list of the corresponding URLs.</p><p> Lexicon―A lexicon is maintained with a list of every known word; Google’s lexicon contained 14 million words by 1999.</p><p> Hit Lists―Hit lists keep track of every occurrence of a word in a document.</p></sec><sec id="s2_3"><title>2.3. Web Search Engine Navigation</title><p>Another area of concern with regard to end-user privacy is web search navigation. Search engines seek to profile users so as to correctly target advertisements and maximize revenue. 
When a user enters a search query, the search engine returns results that can be categorized into the following two groups [<xref ref-type="bibr" rid="scirp.73522-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref21">21</xref>]:</p><p> Paid search results―These are targeted results based on user profiles, intent, and query search terms, generated and paid for by advertisers, as illustrated in <xref ref-type="fig" rid="fig2">Figure 2</xref>. User profiles in this context are behavioral profiles of the user generated from web browsing history and patterns over time, and are thus a major privacy concern since they would collectively reveal a user’s intent [<xref ref-type="bibr" rid="scirp.73522-ref22">22</xref>].</p><p> Organic search results―These are general results returned by the search engine based on the user query, as shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>Organic search results can be further divided into these categories [<xref ref-type="bibr" rid="scirp.73522-ref20">20</xref>]:</p><p> Meta-search results―These are a combination of search results collected from a group of search engines.</p><p> Grouped search results―These are results retrieved after result clustering and classification.</p><p> Personalized search results―These search results are generated using user search query logs, browsing history, user profiles, and usage records.</p><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> Paid search and organic search results</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/5-7800425x3.png"/></fig><p> Natural Language search results―These search results involve question answering in a natural language.</p><p> Image search results―These search results return images and multimedia in response to search queries.</p></sec><sec id="s2_4"><title>2.4. 
Paid Search Engine Results Process</title><p>When a user executes a web search query, search engines provide advertising in the form of texts and images on the search results pages, as in <xref ref-type="fig" rid="fig2">Figure 2</xref>. Paid search results are often driven by user query inputs. For example, if a user searches for “Toyota”, paid search results would include ads about where to buy a Toyota. Paid search results generally depend on the following [<xref ref-type="bibr" rid="scirp.73522-ref23">23</xref>]:</p><p> The advertiser―These are entities that provide the advertisement source. The ads are usually temporal and thematic in nature; for example, a Samsung ad during the Christmas holidays would include Christmas sales.</p><p> The search engine―The search engine acts as a middleman, targeting advertiser ads to specific audiences and users, based on data collected from user profiles, queries, and browsing history. If a user’s search history includes a large number of queries on cars, then the search engine would target “Honda” ads when the same user searches for “Honda”.</p><p> Users―Users are visitors who utilize the search engine to query for information; search engines largely seek to understand user intent so as to correctly target advertisements to the right audience.</p><p>Paid search results could be used as one aspect in monitoring the effect of implemented web search query privacy, since advertisements are always targeted to specific users based on their profiles and search queries.</p></sec><sec id="s2_5"><title>2.5. Ranking Factors</title><p>Search engines return user query results based on relevance. Relevance is affected by the search results’ importance. 
Factors that might influence search results’ importance include the following [<xref ref-type="bibr" rid="scirp.73522-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref24">24</xref>]:</p><p> The webpage―how frequently a word is used on a specific page.</p><p> The website―the worth and authority of the website, e.g., the New York Times vs. a blogger page.</p><p> Click-through data―the number of clicks on a page (higher is better).</p><p> Social reference―how frequently a website is mentioned on social media.</p><p> Geographical Location―search engines tend to return local answers for local questions.</p><p>Factors such as click-through data and geographical location are used in the generation of user profiles by search engines, and therefore need to be given consideration when formulating end-user web search obfuscation methods.</p></sec><sec id="s2_6"><title>2.6. User Profile Generation</title><p>Another essential aspect of search engines that requires privacy research attention is the generation of user profiles, mainly used for targeting advertisements and improving the relevance of search results. Search engines generate user profiles by employing these two methods [<xref ref-type="bibr" rid="scirp.73522-ref25">25</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref26">26</xref>]:</p><p> Click logging―the search engine tracks each URL that the user clicks in the search query results. Click logging is used by the search engine to build a user web search profile and to infer user intent, so as to reduce irrelevant results. The method works best when a user repeats the same query a number of times or refines the query.</p><p> Profile generation―search engines employ user personal information, query history, browsing history, click-through history, bookmarks, download history, etc., in the generation of a user web profile.</p></sec><sec id="s2_7"><title>2.7. 
Web Search Queries</title><p>The purpose of search engines is to provide functionality for users to query for information on the Internet. Insight into the workings of web search queries is fundamental for developing effective obfuscation techniques. Web search queries are words, phrases, or descriptions that a user inputs in a search engine and that are matched against documents indexed and stored by the search engine, to return results relevant to the user [<xref ref-type="bibr" rid="scirp.73522-ref27">27</xref>]. Web search queries can be in the form of hyperlinks, but differ from basic database search queries in that they do not use strict syntax rules, as in SQL [<xref ref-type="bibr" rid="scirp.73522-ref28">28</xref>]. There are three main categories of web search queries [<xref ref-type="bibr" rid="scirp.73522-ref29">29</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref30">30</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref31">31</xref>]:</p><p> Informational queries―web search queries concerned with broad general topics that return a large number of related results, e.g., “cars” and “travel”.</p><p> Navigational queries―these queries are concerned with finding a single website or webpage of particular interest, e.g., “Twitter” and “Yahoo Movies”.</p><p> Transactional queries―these query types indicate that the user seeks to perform an online action such as purchasing an item, downloading music, or viewing a movie, e.g., “buy a Toyota 2014”.</p></sec><sec id="s2_8"><title>2.8. Ranking Factors</title><p>The three main categories of web search queries could further be broken down into the following types of queries:</p><p> Boolean queries: Boolean queries involve the use of logical operators in the query construction. The two common Boolean logical operators are the AND operator, which returns restrictive and exclusive search results, and the inclusive OR operator. 
Boolean search queries have been found to be non-intuitive and difficult for users to employ [<xref ref-type="bibr" rid="scirp.73522-ref27">27</xref>].</p><p> Faceted queries―Another category of web search queries consists of faceted queries, in which the query is divided into subjects in combination with Boolean operators, with the goal of the user viewing all the documents. An example would be {Car AND Fuel}, {Toyota OR Honda}, {Camry AND Civic} [<xref ref-type="bibr" rid="scirp.73522-ref27">27</xref>].</p><p> Concept-based search query―Concept-based search queries employ semantic concepts and ideas rather than keywords in the composition of queries to retrieve documents in the same concept area [<xref ref-type="bibr" rid="scirp.73522-ref31">31</xref>].</p><p> Single and Multi word sense queries―Single word sense queries are composed of a single word with wide-ranging contextual meanings. Multi word sense queries are made up of multiple single-word sense queries, providing better context to meaning than a single word query [<xref ref-type="bibr" rid="scirp.73522-ref33">33</xref>].</p><p> Keyword-based queries―Keyword queries are composed of words or terms that tend to be short, ambiguous, and focused on a particular entity or subject. Examples include {Toyota, Honda, Sales}. However, keyword-based queries can have several acceptable inferences and various interpretations, generating a large set of possible meanings for that particular query [<xref ref-type="bibr" rid="scirp.73522-ref34">34</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref35">35</xref>].</p><p> Single and multi word queries―Single word queries are composed of only one term, while multi word queries have several search terms. 
Single and multi word search queries might not be as entity-specific as keyword-based queries [<xref ref-type="bibr" rid="scirp.73522-ref36">36</xref>].</p><p> Context queries―these are queries in which the search uses contextual information, such as the user’s profile, search history, browsing habits, query features, user background, user interests, etc., to disambiguate the query and return results relevant to the user [<xref ref-type="bibr" rid="scirp.73522-ref37">37</xref>].</p><p> Natural language queries―natural language queries are composed of search terms in the form of real questions in the user’s natural language, without the use of query syntax and special formats, e.g., a user could query, “what is the circumference of the Earth?” [<xref ref-type="bibr" rid="scirp.73522-ref27">27</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref38">38</xref>].</p><p>Web search queries tend to be brief and vague, often consist of subtopics, and can be generalized into two major groups, faceted and ambiguous queries [<xref ref-type="bibr" rid="scirp.73522-ref30">30</xref>]:</p><p> Faceted queries: Faceted queries can be composed of subtopics, but are unambiguous and clear, and return precise and relevant results to the user.</p><p> Ambiguous queries: These queries usually have more than one meaning, and so the search engine returns results that might not be relevant to the user.</p><p>A categorization of ambiguous queries is needed to obfuscate web search queries, as noted in Song’s taxonomy (2007) [<xref ref-type="bibr" rid="scirp.73522-ref39">39</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref40">40</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref41">41</xref>]:</p><p> Category A: Ambiguous Queries―queries consisting of one or more search terms where each term has multiple meanings. Ambiguity results from the multiple meanings, e.g., 
“bank” could be a financial institution or a riverbank.</p><p> Category B: Broad Queries―these query types are composed of different subtopics a user might search for. An example is “race cars”. Ambiguity is caused by the vagueness of the topic and subtopics. For example, a user might have differing meanings for the keywords “race” and “cars”.</p><p> Category C: Clear Queries―these are queries composed of keywords with a very narrow and specific meaning. An example is “Harvard University”. Clear queries return numerous search query results with a higher degree of quality than ambiguous and broad queries.</p><p>Under the natural language processing (NLP) queries category, Wu et al. (2014) further categorized question retrieval queries into two major groups [<xref ref-type="bibr" rid="scirp.73522-ref42">42</xref>]:</p><p> Short queries: these are NLP type queries in which the questions being asked by the user are very short, unclear, and ambiguous, thus making it difficult to correctly pinpoint user intent.</p><p> Long queries: in this group of NLP queries, the question asked by the user is long and complete, often resulting in precise and relevant search results.</p></sec><sec id="s2_9"><title>2.9. Semantic Search</title><p>An important aspect of web search queries lies in semantic search. Semantic search is concerned with meaning, that is, the understanding of an expression, insinuation, and/or inference, highlighting the relationship between similar keywords and phrases in a web search query [<xref ref-type="bibr" rid="scirp.73522-ref43">43</xref>]. Search engine organizations spend considerable effort in employing semantic search techniques to efficiently pinpoint user intent, return relevant results, and better target advertisements. Thus, for effective web search privacy and query obfuscation, proposed frameworks must take into consideration query semantics to enhance user privacy. 
The following are some of the relevant semantic search characteristics:</p><p> Disambiguation: Semantic search applies disambiguation to resolve ambiguity among multiple word meanings and return the most probable meaning of a search term [<xref ref-type="bibr" rid="scirp.73522-ref44">44</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref45">45</xref>].</p><p> Generation of relevant results: To generate relevant web search results, semantic search systems take into consideration the context of the search, location, intent, word alternatives, word substitutes, and generalized and specific word concept equivalents [<xref ref-type="bibr" rid="scirp.73522-ref46">46</xref>].</p><p> Natural Language Processing: In semantic search, Natural Language Processing linguistic components, such as homonymy and synonymy, are employed in efforts to better understand the meaning of a user query, to accurately predict user intent, and to return relevant query search results [<xref ref-type="bibr" rid="scirp.73522-ref44">44</xref>].</p></sec><sec id="s2_10"><title>2.10. Word Sense Disambiguation</title><p>Word sense disambiguation is a computational process of finding the contextual meaning of words, a problem that researchers have noted to be intractable, as are other difficult problems in artificial intelligence [<xref ref-type="bibr" rid="scirp.73522-ref47">47</xref>]. 
Word sense disambiguation requires external repositories of knowledge, such as thesauri, ontologies, and corpora, to get the right context and sense of words [<xref ref-type="bibr" rid="scirp.73522-ref47">47</xref>]:</p><p> Thesaurus―a text repository that offers synonyms, which are words with similar meanings, and antonyms, which are words with opposite meanings.</p><p> Ontologies and Lexicons―these are descriptions of concepts of particular subjects of interest, taxonomies, and semantic relations, such as WordNet.</p><p> Corpora―these are collections of texts for studying language representations.</p><p>External knowledge repositories are crucial to effectively understand user intent and enhance personalized search results. However, the use of word sense disambiguation repositories, such as thesauri, ontologies, and corpora, is also vital for web search query obfuscation during the query formation and reformation process. The challenge is to formulate queries so that user intent is protected while the query remains unambiguous enough to retrieve relevant search results and thus preserve usability.</p></sec><sec id="s2_11"><title>2.11. The WordNet Database</title><p>WordNet is a commonly used external repository of knowledge, employed in the process of query refinement and keyword reformation to get the appropriate word meanings. The following is a description of the possible semantic relations that could be derived when using WordNet, with applications in query disambiguation and, more importantly insofar as this study is concerned, query obfuscation [<xref ref-type="bibr" rid="scirp.73522-ref48">48</xref>]:</p><p> Synonym―these are words with the same meaning. For example, search can be replaced with investigate.</p><p> Hyponym―the first word is a specific instance of the second word. For example, crimson and red.</p><p> Hypernym―the second word is a specific instance of the first word. 
For example, Tablet and iPad.</p><p> Meronym―the first word is a component part of the second word. For example, foot and leg.</p><p> Holonym―opposite to meronyms, the second word is a component part of the first word. For example, keyboard and keys.</p></sec><sec id="s2_12"><title>2.12. Web Search Query Disambiguation</title><p>Web search query disambiguation involves the reformation and refinement of query search terms to remove ambiguity, better predict user intent, and return the most relevant search results to the user. Web search query disambiguation involves the following techniques [<xref ref-type="bibr" rid="scirp.73522-ref43">43</xref>]:</p><p> Manual query modification―the user modifies the query by adding or removing search terms.</p><p> Query expansion modification―search terms are added to the original query with the use of semantics.</p><p> Query trimming modification―some query search terms are removed to improve query results.</p><p> Conjunction and disjunction modification―the conjunction (AND) is used in the query to combine search terms and return narrower, less ambiguous results; the disjunction (OR) is used to return broader, more general results.</p><p> Substitution modification―query search terms are replaced with similar search terms using semantic techniques.</p><p> Graph-based modification―graph theory techniques are used so that documents are viewed as nodes in a graph, and query search terms are employed to return semantically relevant and related documents in the graph.</p></sec><sec id="s2_13"><title>2.13. Web Search Query Reformation</title><p>As mentioned earlier, one of the most essential search engine tasks is to understand user intent and yield search results that are most pertinent to the user. Therefore, understanding query reformation provides a window into how search engines “think” with regard to capturing user intent. 
To achieve this, search engines employ web search query improvement techniques, in which user queries are refined and modified to accurately capture user intent. However, search engines also take advantage of and store web search query reformations performed by the user, to correct errors, typos, and for modifications of the query for personalized results. While search engines might return better and more specific search results to the user, web search query reformation could be used as an attack against web search query obfuscation [<xref ref-type="bibr" rid="scirp.73522-ref49">49</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref50">50</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref51">51</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref52">52</xref>] . When a user modifies an obfuscated query, it might be possible for the search engine to deduce the real query from dummy queries in such instances. Therefore, it would be essential to understand and consider query reformation techniques, and design suitable web query obfuscation methods. A taxonomy of web search query reformation was compiled by Huang and Efthimiadis (2009), on how search engines could use such categorization to improve web search results [<xref ref-type="bibr" rid="scirp.73522-ref53">53</xref>] . Huang and Efthimiadis (2009), outlined 13 categorizations of query reformation that a search engine could use to detect, estimate user intent, and offer better search results [<xref ref-type="bibr" rid="scirp.73522-ref53">53</xref>] : Word rearranging―words in the initial query are reordered but unchanged in the reformulated query. Whitespace and punctuation―whitespaces and punctuations are changed or removed in the reformulated query. Deleted words―some words are deleted from the initial query but the same words are kept in the reformed query. Supplemented words―extra words are added to the initial query. URL removal―the URL is removed from the query. Stemming? word stems are altered in a query. 
For example, jumping over boxes becomes jump over a box. Acronym formation―the query words are transformed into an acronym. For example, the United States of America is altered to USA. Acronym expansion―the modified query contains an expansion of an acronym that appeared in the initial query. Substring―the subsequent query is a strict prefix or suffix of the initial query. For example, food restaurant is changed to food rest. Superstring―the subsequent query contains the initial query as a prefix or suffix. For example, food rest becomes food restaurants. Abbreviation―matching words in the initial query are a prefix of every word in the subsequent query. For example, higher sec is changed to high security. Word replacement―words in the initial query are replaced with semantically related words in the subsequent query. Spelling correction―the Levenshtein distance algorithm is used to predict the spelling correction a user would make in the subsequent query, by counting the minimum number of single-character insertions, deletions, or substitutions needed to turn one query into the other. If the Levenshtein distance is less than or equal to two, the change is treated as a spelling correction. For example, corection is replaced with correction. Insofar as search query obfuscation is concerned, Huang and Efthimiadis (2009) observed that there exist classes of query reformations that are difficult for classifiers to detect [<xref ref-type="bibr" rid="scirp.73522-ref53">53</xref>] . These include: Queries with semantic rephrasing―the more complex the rephrasing of a query, the harder it is to classify. Multi-reformations―several reformation techniques are combined to modify the query. Classifying multi-reformation queries is difficult because the reformations are not commutative. Subsequent queries can yield different results from the initial query [<xref ref-type="bibr" rid="scirp.73522-ref53">53</xref>] .</p></sec><sec id="s2_14"><title>2.14. 
User Intent Based Query Temporality</title><p>Generally, search engines track users by generating a behavioral user profile based on the user’s search history, using techniques such as cookie tracking, browser preferences, IP address geo-location, and URL clicks [<xref ref-type="bibr" rid="scirp.73522-ref22">22</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref54">54</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref55">55</xref>] . Another way to understand user intent in web search queries is to study time-related search words and search phrases in a submitted query. Since user intent can be deciphered by looking at the temporality of a web search query, temporality must be considered for better search query privacy. The intent of a query, as related to time, can be categorized as follows [<xref ref-type="bibr" rid="scirp.73522-ref56">56</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref57">57</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref58">58</xref>] :</p><p> Implicit temporal intent―in which the user specifies no time data in the query phrase. For example, “Olympics”.</p><p> Explicit temporal intent―whereby the user makes a specific time reference in a query. For example, “2016 Olympics”.</p><p>Queries with implicit temporal intent can be further broken down according to the following classifications [<xref ref-type="bibr" rid="scirp.73522-ref56">56</xref>] :</p><p> Atemporal―queries that lack time information, e.g., “Olympic games”;</p><p> Temporal unambiguous―queries with specific time-related information. For example, “2012 Olympics”;</p><p> Temporal ambiguous―queries with multiple unclear time references, e.g., the query “Isaac Newton” may return both his birth date and the dates of his discoveries.</p></sec><sec id="s2_15"><title>2.15. Search Query Classification and Paid Ads</title><p>Another important aspect of web search query obfuscation is query classification. 
Web search query classification is an ongoing research challenge that involves the grouping of user queries into specific categories so as to better predict user intent, retrieve the most relevant webpages for the user, and direct web advertisements to the appropriate audiences [<xref ref-type="bibr" rid="scirp.73522-ref23">23</xref>] [<xref ref-type="bibr" rid="scirp.73522-ref59">59</xref>] - [<xref ref-type="bibr" rid="scirp.73522-ref65">65</xref>] . It is estimated that the most popular queries involve only between 2.4 and 2.7 words; the small amount of information they contain makes it difficult to disambiguate them and pinpoint user intent [<xref ref-type="bibr" rid="scirp.73522-ref23">23</xref>] . The main goal of web search query classification is to disambiguate queries, pinpoint user intent, and accurately direct paid search results to users [<xref ref-type="bibr" rid="scirp.73522-ref23">23</xref>] . However, Gabrilovich et al. observed that the problem of query classification remains intractable because queries are so short. Moreover, Gabrilovich et al. pointed out that since search engines catalog huge quantities of information and, in so doing, become storehouses of knowledge, it makes sense to use web search results themselves to gain an understanding that can aid query interpretation [<xref ref-type="bibr" rid="scirp.73522-ref23">23</xref>] .</p></sec></sec><sec id="s3"><title>3. Areas for Further Investigative Study in Web Search Privacy</title><p>This section identifies areas for further study and investigative research. While some ongoing research covers portions of the proposed study areas listed below, we believe that the identified areas need further investigation, given the intractability and complexity involved in query ambiguity and disambiguation, and the trade-offs required to find the right balance between privacy and usability. 
The context here is that research could focus on non-cryptographic solutions that do not require the use of third party applications. Under the web search engines and web search query mechanism, we identified the following privacy challenges, as summarized in <xref ref-type="table" rid="table1">Table 1</xref>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> A summary of web search engine and query privacy challenges</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Web Search Engine and Query Functionalities</th><th align="center" valign="middle" >Privacy and Usability Challenges</th></tr></thead><tr><td align="center" valign="middle" >1. Web search engine navigation privacy</td><td align="center" valign="middle" >Holistic privacy and usability approach to web search navigation using query formation and reformation plus result selection and navigation.</td></tr><tr><td align="center" valign="middle" >2. Paid and organic search results</td><td align="center" valign="middle" >Employing paid and organic search results as a measure/indicator of web search query obfuscation.</td></tr><tr><td align="center" valign="middle" >3. Precision and recall</td><td align="center" valign="middle" >Using precision, recall, and Page Rank to measure web search query obfuscation and usability.</td></tr><tr><td align="center" valign="middle" >4. User profile generation</td><td align="center" valign="middle" >Generation of obfuscated user profile with an acceptable level of search result usability.</td></tr><tr><td align="center" valign="middle" >5. Web search queries</td><td align="center" valign="middle" >Privacy and usability approaches to informational, transactional, and navigational queries.</td></tr><tr><td align="center" valign="middle" >6. Types of search queries</td><td align="center" valign="middle" >Privacy and usability methods, including hybrids―e.g. 
Boolean, concept and keyword queries.</td></tr><tr><td align="center" valign="middle" >7. Semantic search and word sense disambiguation</td><td align="center" valign="middle" >Privacy and usability approaches to semantic search using statistical, natural language processing, graph theory, and machine learning models.</td></tr><tr><td align="center" valign="middle" >8. Web search query reformation</td><td align="center" valign="middle" >How to achieve privacy and usability during web search query reformation. How can a user correct their query without revealing their intent?</td></tr><tr><td align="center" valign="middle" >9. Temporal queries</td><td align="center" valign="middle" >How can an acceptable level of usability be achieved with obfuscated temporal queries?</td></tr><tr><td align="center" valign="middle" >10. Query classification</td><td align="center" valign="middle" >How to obfuscate user intent with query classification. Methods to obfuscate search queries using both semantics and different classifications.</td></tr></tbody></table></table-wrap><p> Web search engine navigation privacy: Further study is needed into holistic privacy and usability techniques that take the whole web search engine navigation process into account, including (i) query formulation, (ii) result selection, (iii) result navigation, and (iv) query reformulation.</p><p> Employing paid and organic search results to enhance privacy: Studies could examine how to use paid and organic search results as a measure of obfuscation effectiveness. For instance, if paid results accurately mirror the real query instead of the dummy query, then we can infer that the search engine has deciphered and separated dummy queries from real queries.</p><p> Precision and recall: Questions such as how to improve the precision and recall of noisy web search queries with respect to user expectations and results warrant further study. 
A fruitful approach could involve the application of new or innovative statistical models and empirical application results.</p><p> Obfuscated user profile generation: Investigations could be made into whether it is possible to generate pseudo-user profiles while offering more personalized search results. This line of research has the potential to offer privacy with improved usability for obfuscated user profiles.</p><p> Web search query obfuscation: This remains an open area for further investigation. One study area asks how to generate privatized informational, navigational, and transactional web search queries with better usability. Transactional queries would be of most interest, especially with applications to transactional databases, such as Amazon and social networks.</p><p> Types of search queries: Another open area ripe for study and investigation is the obfuscation of different types of queries. For example, studies could include obfuscating Boolean queries, privatizing keyword queries, using hybrids of concept and keyword queries for obfuscation, and, most challenging, obfuscating natural language queries when real questions are asked.</p><p> Semantic search and word sense disambiguation: While some significant work has been done, this area remains open to further investigation due to the challenges involved in query disambiguation. Questions include how to find a balance, with trade-offs, between ambiguity and disambiguation needs during the query obfuscation process. Other investigations could focus on using mathematical and statistical modeling, natural language processing, graph theory, and machine learning techniques for usability-aware web search query obfuscation.</p><p> Privacy in web search query reformation: There is little research on privacy in web search query reformation. Search engines spend considerable effort disambiguating queries and mining user intent with every query reformation. 
More studies are needed on how to achieve privacy during web search query reformation. Questions could include how web search query reformation methods could be used in conjunction with privacy methods to implement obfuscation techniques before or after reformation. These techniques could be broken down into off-line vs. real-time obfuscated query reformation.</p><p> Privacy of user intent in temporal queries: Temporal queries are vulnerable to information leakage due to date and time information in the query search terms. However, not much work has been done on implementing privacy and obfuscation techniques for temporal queries. This is another area for investigation and study, with consideration given to the usability of obfuscated temporal queries. If a user intended to search for “Honda 2015” in the original query, how would usability be attained in an obfuscated “Honda 2014” query?</p><p> Web search query classification: Another area of study that has captured the interest of researchers, and is worth further investigation, is the classification of web search queries with the goal of capturing user intent. However, the classification of web search queries remains a challenge when privacy is considered. For example, is there a way to obfuscate queries using both semantics and different query classifications?</p></sec><sec id="s4"><title>4. Conclusion</title><p>In this survey, we presented an overview of web search querying from the privacy perspective and identified areas that need further investigation insofar as web search query privacy is concerned. Covering all areas of web search querying is beyond this study’s scope, but an effort was made to highlight key essentials for formulating end-user search privacy techniques. An example of an area not covered in depth is the study of mathematical and statistical models for web search queries and related obfuscation techniques, a subject left for further study. 
A second goal for this study was to review the material in a manner accessible to those outside computer science. The intent was to introduce knowledge of web search queries and search engines to enable non-computer scientists to approach web search query privacy innovatively. While there has been considerable research interest in web search query obfuscation, web search privacy remains a challenge with no proposed generalizable solution. Future work will include the study of mathematical and statistical models for web search queries and related obfuscation techniques, which we identified as an area in need of innovation for web search query privacy. Solutions tailored to specific domains might be more appropriate. For example, a specific search query obfuscation method for healthcare systems might not work well when making queries into social media postings. Additionally, the challenges of web search query privacy and usability in terms of human-computer interaction will have to be considered in future work. For instance, one question could focus on how additional delay in processing web search query results after obfuscation could affect people’s willingness to make the tradeoff between privacy and usability. Finally, researchers need to consider the computation and storage power of intelligent search engines. Search engines store search query logs, and over time, with ever-improving disambiguation and semantic techniques, it is plausible that search engines can decipher user intent and separate dummy queries from real queries in many circumstances. Privacy researchers need techniques that can operate in the context of a smart search engine―one that combines artificial intelligence, natural language processing, high performance computing, etc. to decipher user intent. 
Research into obfuscation techniques that take such search engine dynamics into consideration is needed to further advance the web search query privacy domain.</p></sec><sec id="s5"><title>Acknowledgements</title><p>We would like to acknowledge the Department of Computer Science at Norfolk State University for making this work possible.</p></sec><sec id="s6"><title>Cite this paper</title><p>Mivule, K. (2017) Web Search Query Privacy, an End-User Perspective. Journal of Information Security, 8, 56-74. http://dx.doi.org/10.4236/jis.2017.81005</p></sec></body><back><ref-list><title>References</title><ref id="scirp.73522-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Zheleva, E.G. (2011) Privacy in Social Networks: A Survey. In: Social Network Data Analytics, 277-306.</mixed-citation></ref><ref id="scirp.73522-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Gotz, M., Machanavajjhala, A., Wang, G., Xiao, X. and Gehrke, J. (2012) Publishing Search Logs—A Comparative Study of Privacy Guarantees. IEEE Transactions on Knowledge and Data Engineering, 24, 520-532.  
https://doi.org/10.1109/TKDE.2011.26</mixed-citation></ref><ref id="scirp.73522-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Chen, T., Boreli, R., Kaafar, M.A. and Friedman, A. (2014) On the Effectiveness of Obfuscation Techniques in Online Social Networks. In: Privacy Enhancing Technologies, Vol. 8555 LNCS, 42-62.</mixed-citation></ref><ref id="scirp.73522-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Ruiz-Martínez, A. (2012) A Survey on Solutions and Main Free Tools for Privacy Enhancing Web Communications. Journal of Network and Computer Applications, 35, 1473-1492. https://doi.org/10.1016/j.jnca.2012.02.011</mixed-citation></ref><ref id="scirp.73522-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Toch, E., Wang, Y. and Cranor, L.F. (2012) Personalization and Privacy: A Survey of Privacy Risks and Remedies in Personalization-Based Systems. User Modeling and User-Adapted Interaction, 22, 203-220.  
https://doi.org/10.1007/s11257-011-9110-z</mixed-citation></ref><ref id="scirp.73522-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Pang, H., Xiao, X. and Shen, J. (2012) Obfuscating the Topical Intention in Enterprise Text Search. IEEE 28th International Conference on Data Engineering (ICDE), 1-5 April 2012, 1168-1179. https://doi.org/10.1109/icde.2012.43</mixed-citation></ref><ref id="scirp.73522-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Hillard, D., Schroedl, S., Manavoglu, E., Raghavan, H. and Leggetter, C. (2010) Improving ad Relevance in Sponsored Search. In: Proceedings of the third ACM International Conference on Web Search and Data Mining—WSDM’10, ACM, New York, 361-370. https://doi.org/10.1145/1718487.1718532</mixed-citation></ref><ref id="scirp.73522-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Cooper, A. (2008) A Survey of Query Log Privacy-Enhancing Techniques from a Policy Perspective. ACM Transactions on the Web, 2, 1-27.  
https://doi.org/10.1145/1409220.1409222</mixed-citation></ref><ref id="scirp.73522-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Arrington, M. (2006) AOL Proudly Releases Massive Amounts of Private Data. Techcrunch.com.</mixed-citation></ref><ref id="scirp.73522-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Barbaro, M. and Zeller Jr., T. (2006) A Face Is Exposed for AOL Searcher No. 4417749. The New York Times, p. C4.</mixed-citation></ref><ref id="scirp.73522-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Gao, X., Yang, Y., Fu, H., Lindqvist, J. and Wang, Y. (2014) Private Browsing: An Inquiry on Usability and Privacy Protection. Proceedings of the 13th Workshop on Privacy in the Electronic Society, Scottsdale, 3 November 2014, 97-106.  
https://doi.org/10.1145/2665943.2665953</mixed-citation></ref><ref id="scirp.73522-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Danezis, G. and Diaz, C. (2008) A Survey of Anonymous Communication Channels.</mixed-citation></ref><ref id="scirp.73522-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Reed, M.G., Syverson, P.F. and Goldschlag, D.M. (1998) Anonymous Connections and Onion Routing. IEEE Journal on Selected Areas in Communications, 16, 482-494. https://doi.org/10.1109/49.668972</mixed-citation></ref><ref id="scirp.73522-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Ren, J. and Wu, J. (2010) Survey on Anonymous Communications in Computer Networks. Computer Communications, 33, 420-431.  
https://doi.org/10.1016/j.comcom.2009.11.009</mixed-citation></ref><ref id="scirp.73522-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Gordon, M. and Pathak, P. (1999) Finding Information on the World Wide Web: The Retrieval Effectiveness of Search Engines. Information Processing and Management, 35, 141-180.</mixed-citation></ref><ref id="scirp.73522-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems, 30, 107-117.</mixed-citation></ref><ref id="scirp.73522-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Ozcan, R., Altingovde, I.S., Cambazoglu, B.B., Junqueira, F.P. and Ulusoy, Ö. (2011) A Five-Level Static Cache Architecture for Web Search Engines. Information Processing &amp; Management, 48, 828-840.</mixed-citation></ref><ref id="scirp.73522-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Chakrabarti, S., Berg, M. and Dom, B. (1999) Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. Computer Networks, 31, 1623-1640.  
https://doi.org/10.1016/S1389-1286(99)00052-3</mixed-citation></ref><ref id="scirp.73522-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Barroso, L.A., Dean, J. and Holzle, U. (2003) Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, 23, 22-28.</mixed-citation></ref><ref id="scirp.73522-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Levene, M. (2010) An Introduction to Search Engines and Web Navigation. Wiley, Hoboken. https://doi.org/10.1002/9780470874233</mixed-citation></ref><ref id="scirp.73522-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Jansen, B.J. and Molina, P.R. (2006) The Effectiveness of Web Search Engines for Retrieving Relevant Ecommerce Links. Information Processing &amp; Management, 42, 1075-1098. https://doi.org/10.1016/j.ipm.2005.09.003</mixed-citation></ref><ref id="scirp.73522-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Chen, Y., Pavlov, D., Canny, J.F. and Ave, H. (2009) Large-Scale Behavioral Targeting Categories and Subject Descriptors. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 28 June-1 July 2009, 209-218. https://doi.org/10.1145/1557019.1557048</mixed-citation></ref><ref id="scirp.73522-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Gabrilovich, E., Broder, A., Fontoura, M., Joshi, A., Josifovski, V., Riedel, L. and Zhang, T. (2009) Classifying Search Queries Using the Web as a Source of Knowledge. ACM Transactions on the Web, 3, Article No. 5.  
https://doi.org/10.1145/1513876.1513877</mixed-citation></ref><ref id="scirp.73522-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Nunney, M. (2012) SEO Made Simple, Wordtracker’s Free SEO Guide.</mixed-citation></ref><ref id="scirp.73522-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Speretta, M. and Gauch, S. (2005) Personalized Search Based on User Search Histories. Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Compiegne, 19-22 September 2005, 622-628.  
https://doi.org/10.1109/WI.2005.114</mixed-citation></ref><ref id="scirp.73522-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Sugiyama, K., Hatano, K. and Yoshikawa, M. (2004) Adaptive Web Search Based on User Profile Constructed without Any Effort from Users. Proceedings of the 13th Conference on World Wide Web, New York, 17-20 May 2004, 675-684.  
https://doi.org/10.1145/988672.988764</mixed-citation></ref><ref id="scirp.73522-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Baeza-Yates, R. and Ribeiro-Neto, B. (1999) Modern Information Retrieval. ACM Press, New York.</mixed-citation></ref><ref id="scirp.73522-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Lee, W.M. and Sanderson, M., (2010) Analyzing URL Queries. Journal of the Association for Information Science and Technology, 61, 2300-2310.  
https://doi.org/10.1002/asi.21407</mixed-citation></ref><ref id="scirp.73522-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Broder, A. (2002) A Taxonomy of Web Search. ACM SIGIR Forum, 36, 3-10.  
https://doi.org/10.1145/792550.792552</mixed-citation></ref><ref id="scirp.73522-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Ullah, M.Z. and Aono, M. (2014) Query Subtopic Mining for Search Result Diversification. IEEE International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA), Vol. 1, Bandung, 20-21 August 2014, 309-314.</mixed-citation></ref><ref id="scirp.73522-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Zamora, J., Mendoza, M. and Allende, H. (2014) Query Intent Detection Based on Query Log Mining. Journal of Web Engineering, 13, 24-52.</mixed-citation></ref><ref id="scirp.73522-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Egozi, O., Markovitch, S. and Gabrilovich, E. (2008) Concept-Based Information Retrieval using Explicit Semantic Analysis. ACM Transactions on Information Systems, 29, Article No. 8.</mixed-citation></ref><ref id="scirp.73522-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">De Luca, E.W. and Scheel, C. (2013) Disambiguate Yourself. Translation: Computation, Corpora, Cognition, 3, 75-86.</mixed-citation></ref><ref id="scirp.73522-ref34"><label>34</label><mixed-citation publication-type="other" xlink:type="simple">Demidova, E., Zhou, X., Oelze, I. and Nejdl, W. (2010) Evaluating Evidences for Keyword Query Disambiguation in Entity Centric Database Search. 21th International Conference on Database and Expert Systems Applications, Bilbao, 30 August-3 September 2010, 240-247. https://doi.org/10.1007/978-3-642-15251-1_19</mixed-citation></ref><ref id="scirp.73522-ref35"><label>35</label><mixed-citation publication-type="other" xlink:type="simple">Pound, J. and Hudek, A.K. (2012) Interpreting Keyword Queries over Web Knowledge Bases. 
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, 29 October-2 November 2012, 305-314.</mixed-citation></ref><ref id="scirp.73522-ref36"><label>36</label><mixed-citation publication-type="other" xlink:type="simple">Liu, B. (2007) Web Data Mining. Springer-Verlag, Berlin Heidelberg.</mixed-citation></ref><ref id="scirp.73522-ref37"><label>37</label><mixed-citation publication-type="other" xlink:type="simple">Croft, W.B. and Wei, X. (2005) Context-Based Topic Models for Query Modification.</mixed-citation></ref><ref id="scirp.73522-ref38"><label>38</label><mixed-citation publication-type="other" xlink:type="simple">Hristidis, V. (2009) Natural Language Queries Information Discovery on Electronic Health Records. CRC Press, Boca Raton.</mixed-citation></ref><ref id="scirp.73522-ref39"><label>39</label><mixed-citation publication-type="other" xlink:type="simple">Liu, Y., Song, R., Zhang, M., Dou, Z., Yamamoto, T., Kato, M., Ohshima, H. and Zhou, K. (2014) Overview of the NTCIR-11 IMine Task. Proceedings of the 11th NTCIR Conference, Vol. 14, Tokyo, 9-12 December 2014, 8-23.</mixed-citation></ref><ref id="scirp.73522-ref40"><label>40</label><mixed-citation publication-type="book" xlink:type="simple">Luo, C., Liu, Y., Zhang, M. and Ma, S. (2014) Query Ambiguity Identification Based on User Behavior Information. In: Jaafar, A., et al., Eds., Information Retrieval Technology, Springer International Publishing, Basel, 36-47.</mixed-citation></ref><ref id="scirp.73522-ref41"><label>41</label><mixed-citation publication-type="other" xlink:type="simple">Song, R., Luo, Z., Wen, J.-R., Yu, Y. and Hon, H.-W. (2007) Identifying Ambiguous Queries in Web Search. Proceedings of the 16th ACM International Conference on World Wide Web, Banff, 08-12 May 2007, 1169-1170.  
https://doi.org/10.1145/1242572.1242749</mixed-citation></ref><ref id="scirp.73522-ref42"><label>42</label><mixed-citation publication-type="other" xlink:type="simple">Wu, H., Wu, W., Zhou, M., Chen, E., Duan, L. and Shum, H.-Y. (2014) Improving Search Relevance for Short Queries in Community Question Answering. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, 24-28 February 2014, 43-52. https://doi.org/10.1145/2556195.2556239</mixed-citation></ref><ref id="scirp.73522-ref43"><label>43</label><mixed-citation publication-type="other" xlink:type="simple">Mangold, C. (2007) A Survey and Classification of Semantic Search Approaches. International Journal of Metadata, Semantics and Ontologies, 2, 23.  
https://doi.org/10.1504/IJMSO.2007.015073</mixed-citation></ref><ref id="scirp.73522-ref44"><label>44</label><mixed-citation publication-type="other" xlink:type="simple">Manning, C.D. and Schutze, H. (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge.</mixed-citation></ref><ref id="scirp.73522-ref45"><label>45</label><mixed-citation publication-type="other" xlink:type="simple">Guha, R., McCool, R. and Miller, E. (2003) Semantic Search. Proceedings of the 12th International Conference on World Wide Web, Budapest, 20-24 May 2003, 700-709. https://doi.org/10.1145/775152.775250</mixed-citation></ref><ref id="scirp.73522-ref46"><label>46</label><mixed-citation publication-type="other" xlink:type="simple">Bonino, D., Corno, F., Farinetti, L., Bosca, A., Torino, P. and Duca, C. (2004) Ontology Driven Semantic Search. Transactions on Information Science and Applications, 1, 1597-1605.</mixed-citation></ref><ref id="scirp.73522-ref47"><label>47</label><mixed-citation publication-type="other" xlink:type="simple">Navigli, R. (2009) Word Sense Disambiguation. ACM Computing Surveys, 41, Article No. 10. https://doi.org/10.1145/1459352.1459355</mixed-citation></ref><ref id="scirp.73522-ref48"><label>48</label><mixed-citation publication-type="other" xlink:type="simple">Miller, G.A. (1995) WordNet: A Lexical Database for English. Communications of the ACM, 38, 39-41. https://doi.org/10.1145/219717.219748</mixed-citation></ref><ref id="scirp.73522-ref49"><label>49</label><mixed-citation publication-type="other" xlink:type="simple">Dang, V., Croft, W.B. and Croft, B. (2010) Query Reformulation Using Anchor Text. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, New York, 4-6 February 2010, 41-50.  
https://doi.org/10.1145/1718487.1718493</mixed-citation></ref><ref id="scirp.73522-ref50"><label>50</label><mixed-citation publication-type="other" xlink:type="simple">Song, Y., Zhou, D. and He, L. (2012) Query Suggestion by Constructing Term-Transition Graphs. Proceedings of the 5th ACM International Conference on Web Search and Data Mining, Seattle, 8-12 February 2012, 353-362.  
https://doi.org/10.1145/2124295.2124339</mixed-citation></ref><ref id="scirp.73522-ref51"><label>51</label><mixed-citation publication-type="other" xlink:type="simple">Gupta, M. and Bendersky, M. (2015) Information Retrieval with Verbose Queries. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, 9-13 August 2015, 1121-1124.  
https://doi.org/10.1145/2766462.2767877</mixed-citation></ref><ref id="scirp.73522-ref52"><label>52</label><mixed-citation publication-type="other" xlink:type="simple">Bing, L., Lam, W., Wong, T.-L. and Jameel, S. (2015) Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context. ACM Transactions on Information Systems, 33, Article No. 6. https://doi.org/10.1145/2699666</mixed-citation></ref><ref id="scirp.73522-ref53"><label>53</label><mixed-citation publication-type="other" xlink:type="simple">Huang, J. and Efthimiadis, E.N. (2009) Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs. Proceeding of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, 2-6 November 2009, 77-86.  
https://doi.org/10.1145/1645953.1645966</mixed-citation></ref><ref id="scirp.73522-ref54"><label>54</label><mixed-citation publication-type="other" xlink:type="simple">Hannak, A., Sapiezynski, P., Kakhki, A.M., Krishnamurthy, B., Lazer, D., Mislove, A. and Wilson, C. (2013) Measuring Personalization of Web Search. Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, 13-17 May 2013, 527-537. https://doi.org/10.1145/2488388.2488435</mixed-citation></ref><ref id="scirp.73522-ref55"><label>55</label><mixed-citation publication-type="other" xlink:type="simple">Liu, F., Yu, C. and Meng, W. (2004) Personalized Web Search for Improving Retrieval Effectiveness. IEEE Transactions on Knowledge and Data Engineering, 16, 28-40. https://doi.org/10.1109/TKDE.2004.1264820</mixed-citation></ref><ref id="scirp.73522-ref56"><label>56</label><mixed-citation publication-type="other" xlink:type="simple">Campos, R., Al, J. and Jorge, A.M. (2011) Using Web Snippets and Query-Logs to Measure Implicit Temporal Intents in Queries. ACM SIGIR 2011 Workshop on Query Representation and Understanding, New York.</mixed-citation></ref><ref id="scirp.73522-ref57"><label>57</label><mixed-citation publication-type="other" xlink:type="simple">Lin, S., Jin, P., Zhao, X. and Yue, L. (2014) Exploiting Temporal Information in Web Search. Expert Systems with Applications, 41, 331-341.
https://doi.org/10.1016/j.eswa.2013.07.048</mixed-citation></ref><ref id="scirp.73522-ref58"><label>58</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Z., Liu, Y., Mei, L., Hu, C. and Chen, L. (2014) Generating Temporal Semantic Context of Concepts Using Web Search Engines. Journal of Network and Computer Applications, 43, 42-55. https://doi.org/10.1016/j.jnca.2014.04.002</mixed-citation></ref><ref id="scirp.73522-ref59"><label>59</label><mixed-citation publication-type="other" xlink:type="simple">Jansen, B.J., Booth, D.L. and Spink, A. (2007) Determining the User Intent of Web Search Engine Queries. Proceedings of the 16th ACM International Conference on World Wide Web, Banff, 8-12 May 2007, 1149-1150.  
https://doi.org/10.1145/1242572.1242739</mixed-citation></ref><ref id="scirp.73522-ref60"><label>60</label><mixed-citation publication-type="other" xlink:type="simple">Ortiz-Cordova, A. and Jansen, B.J. (2012) Classifying Web Search Queries to Identify High Revenue Generating Customers. Journal of the American Society for Information Science and Technology, 63, 1426-1441.  
https://doi.org/10.1002/asi.22640</mixed-citation></ref><ref id="scirp.73522-ref61"><label>61</label><mixed-citation publication-type="other" xlink:type="simple">Rose, D.E. and Levinson, D. (2004) Understanding User Goals in Web Search. Proceedings of the 13th ACM International Conference on World Wide Web, New York, 17-20 May 2004, 13-19. https://doi.org/10.1145/988672.988675</mixed-citation></ref><ref id="scirp.73522-ref62"><label>62</label><mixed-citation publication-type="other" xlink:type="simple">Beitzel, S.M., Jensen, E.C., Frieder, O., Grossman, D., Lewis, D.D., Chowdhury, A. and Kolcz, A. (2005) Automatic Web Query Classification Using Labeled and Unlabeled Training Data. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, 15-19 August 2005, 581-582. https://doi.org/10.1145/1076034.1076138</mixed-citation></ref><ref id="scirp.73522-ref63"><label>63</label><mixed-citation publication-type="other" xlink:type="simple">Agrawal, R., Yu, X., King, I. and Zajac, R. (2011) Enrichment and Reductionism: Two Approaches for Web Query Classification. Proceedings of the 18th International Conference on Neural Information Processing, Vol. 7064, Shanghai, 13-17 November 2011, 148-157.</mixed-citation></ref><ref id="scirp.73522-ref64"><label>64</label><mixed-citation publication-type="other" xlink:type="simple">Broder, A.Z., Fontoura, M., Gabrilovich, E., Joshi, A., Josifovski, V. and Zhang, O. (2007) Robust Classification of Rare Queries Using Web Knowledge. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, 23-27 July 2007, 231-238.
https://doi.org/10.1145/1277741.1277783</mixed-citation></ref><ref id="scirp.73522-ref65"><label>65</label><mixed-citation publication-type="other" xlink:type="simple">Cao, H., Hu, D.H., Shen, D., Jiang, D., Sun, J.-T., Chen, E. and Yang, Q. (2009) Context-Aware Query Classification. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, 19-23 July 2009, 3-10.</mixed-citation></ref></ref-list></back></article>