<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">IJCNS</journal-id><journal-title-group><journal-title>International Journal of Communications, Network and System Sciences</journal-title></journal-title-group><issn pub-type="epub">1913-3715</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ijcns.2017.105B022</article-id><article-id pub-id-type="publisher-id">IJCNS-76611</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Error Searching System with Keyword Extraction and Keyword Fuzzy Matching
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yang</surname><given-names>Fan</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Dong</surname><given-names>Zhenghong</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Liu</surname><given-names>Lihao</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Academy of Equipment, Beijing, China</addr-line></aff><pub-date pub-type="epub"><day>26</day><month>05</month><year>2017</year></pub-date><volume>10</volume><issue>05</issue><fpage>219</fpage><lpage>226</lpage><history><date date-type="received"><day>18</day><month>04</month><year>2017</year></date><date date-type="accepted"><day>23</day><month>05</month><year>2017</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   
   This paper proposes an error searching method for finding solutions to errors that occur in the unified commanding platform mix-deployed software (UCPMD). These errors arise at different stages and in different services, applications, IP ports, system software, and software versions, and they fall into different types, so it is necessary both to locate the exact cause of an error and to find its solution. The proposed error searching system applies Chinese keyword extraction and Chinese fuzzy matching between keywords, and uses the processed keywords as the index for finding the solutions of errors. In addition, the system links errors, reasons, and solutions to one another and groups them into categories according to their characteristics, which makes them easy to manage, search, and use. We have also added a specialized thesaurus as a keyword index, which enriches and completes the search results. Because the proposed error searching system involves keyword extraction and keyword fuzzy matching technologies, it finds the solutions that interest the user more accurately.
  
 
</p></abstract><kwd-group><kwd>Database Design</kwd><kwd>Search Engine</kwd><kwd>Extraction</kwd><kwd>Fuzzy Matching</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The unified commanding platform mix-deployed software (UCPMD) integrates 22 sub-systems from 4 different institutes, comprising 92 different pieces of software. Because the underlying protocols and standards differ, many errors occur during setup, configuration, and operation, and they seriously affect usage. Moreover, because these errors are varied and may occur in different operation phases, stages, TCP/IP protocol layers, and sub-system software, it is necessary to design a database system that can manage them. The proposed method provides a design for an error searching database that can search the errors occurring during setup, configuration, and operation, and that returns the reason for each error together with the corresponding solution. The proposed method effectively finds the solutions to errors occurring on the UCPMD platform.</p><p>Current error searching systems are varied. They include On-Board Diagnostics designed for vehicles [<xref ref-type="bibr" rid="scirp.76611-ref1">1</xref>], which looks up vehicle errors by OBD error code, and computer error searching systems [<xref ref-type="bibr" rid="scirp.76611-ref2">2</xref>], which cover hardware problems, network problems, software problems, and so on. These systems each focus on errors in a specific area and use purpose-built databases to store the errors and their characteristics. 
Because UCPMD is used for issuing commands and orders between superiors and subordinates in a specialized field, the database tables and logical structures have to be built around the specific characteristics of the UCPMD platform, and the design and definition of keywords require customization. Existing error searching software therefore cannot be applied to this platform, and a specialized database system is needed to resolve the various errors that occur on the UCPMD platform effectively.</p><p>This paper proposes an error searching method for finding solutions to errors that occur in UCPMD. The method involves Chinese keyword extraction [<xref ref-type="bibr" rid="scirp.76611-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.76611-ref4">4</xref>] and Chinese keyword fuzzy matching [<xref ref-type="bibr" rid="scirp.76611-ref5">5</xref>] technologies, and uses the processed keywords as the index for searching errors, the reasons that cause them, and the corresponding solutions. In addition, the error searching system links errors, reasons, and solutions to one another and groups them into categories according to their characteristics, which makes them easy to manage, search, and use. Because the system is specialized, we have also added a specialized thesaurus as a keyword index, which enriches and completes the search results.</p></sec><sec id="s2"><title>2. Related Work</title><sec id="s2_1"><title>2.1. Keyword Extraction</title><p>We applied IK Analyzer [<xref ref-type="bibr" rid="scirp.76611-ref6">6</xref>] to extract keywords. IK Analyzer is a lightweight, open-source Chinese word segmentation toolkit written in Java, which combines dictionary-based segmentation with semantic segmentation. 
It adopts a “forward-iteration finest-grained segmentation algorithm” [<xref ref-type="bibr" rid="scirp.76611-ref7">7</xref>] and supports two segmentation modes, fine-grained and intelligent. The intelligent mode supports simple ambiguity exclusion [<xref ref-type="bibr" rid="scirp.76611-ref8">8</xref>] and combined output of quantifiers. In addition, IK Analyzer adopts a multi-processor analysis mode [<xref ref-type="bibr" rid="scirp.76611-ref7">7</xref>], which handles English letters, digits, Chinese characters, and so on.</p><p>However, this method only separates a text into words, including unnecessary words such as “a”, “as”, and “of”; it cannot pick the meaningful words out of the separated words. Fortunately, it allows the user to configure a self-defined “extension stop dictionary”, which lets the segmentation filter out such words.</p></sec><sec id="s2_2"><title>2.2. Keyword Fuzzy Search</title><p>Lucene is a toolkit for developing full-text search engines [<xref ref-type="bibr" rid="scirp.76611-ref7">7</xref>] and supports Java development. It provides a fuzzy search function, FuzzyQuery [<xref ref-type="bibr" rid="scirp.76611-ref9">9</xref>]. This paper uses FuzzyQuery for fuzzy searching because it performs similarity matching and can therefore recognize two similar words. FuzzyQuery uses an approximate string matching technique based on the Damerau-Levenshtein distance algorithm [<xref ref-type="bibr" rid="scirp.76611-ref10">10</xref>] to compute the number of edit steps needed to transform one word into another, and this count is the basis of the similarity score. If the similarity is above a set threshold (normally 0.5), the two words are considered similar.</p></sec></sec><sec id="s3"><title>3. Proposed Method</title><sec id="s3_1"><title>3.1. System Function Design</title><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows the system function design. 
The error database system contains two modules, the search engine and the database. The database has three sub-function modules, explained as follows.</p><p>1) Data import/entry</p><p>This function supports two ways of loading data: importing directly from an Excel file, and manual entry by the administrator.</p><p>2) Keyword fuzzy search</p><p>This function supports fuzzy matching between the extracted keywords and the keywords in the keyword table. The proposed method involves keyword extraction and keyword fuzzy matching technologies, which yield more accurate related results.</p><p>This paper uses two ways of fuzzy matching. The first is literal similarity: two keywords are similar if they share characters. The second is semantic similarity: two keywords that share no characters are still similar if they have similar meanings. Using both ways makes the search results more accurate and complete; otherwise some results, possibly including the solution to the searched error, would be missed.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> System function diagram</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x2.png"/></fig><p>For example, in keyword extraction, if the operator types “人民群众的基础” (“the basis of the people”), the extracted keywords are “人民”, “群众”, and “基础” (“people”, “masses”, “basis”).</p><p>Next, we carry out fuzzy matching on the extracted keyword “人民” (“people”). 
With the first way of fuzzy matching, the results are “人们”, “人群”, “公民”, and “民众” (“people”, “crowd”, “citizen”, “populace”).</p><p>With the second way of fuzzy matching, the results are “百姓” and “基层” (“common people”, “grass roots”).</p><p>We then use the keywords obtained by both ways of fuzzy matching to find the corresponding errors, reasons, and solutions.</p><p>3) Finding the corresponding error, reason, and solution according to the extracted keywords.</p><p>The search engine includes 4 sub-function modules.</p><p>1) Keyword extraction</p><p>This module extracts useful keywords from the input content. Keyword extraction technology automatically segments the search content into words and then extracts keywords to use as the search index. Because the input content and the keywords in the database may describe the same thing differently, the extracted keywords must also be fuzzy matched in order to find the related errors.</p><p>The search order is as follows. The extracted keywords are first fuzzy matched against the Chinese thesaurus of Lucene to find similar keywords, and the keywords found are then fuzzy matched against the keywords in the database to find the corresponding keywords. If the Chinese thesaurus contains no appropriate keyword, the extracted keywords are fuzzy matched directly against the keywords in the database.</p><p>2) Sorting of the search results</p><p>This module sorts the search results by matching degree.</p><p>3) Second search</p><p>The system also supports a second search within the results already found. As before, keyword extraction is applied first, followed by fuzzy matching. The corresponding solutions are then looked up in the database, and the search results are re-sorted to refine them.</p><p>4) Visualization of search results</p><p>This module displays the search results visually.</p></sec><sec id="s3_2"><title>3.2. 
Flowchart</title><p><xref ref-type="fig" rid="fig2">Figure 2</xref> gives the complete flowchart of the error searching system, which contains 5 stages: keyword extraction, fuzzy matching, finding the error ID, sorting the search results, and visualization of the search results. The search process is the focus of this paper and involves only stages 1, 2, and 3, so we explain these 3 stages in more detail in <xref ref-type="table" rid="table1">Table 1</xref>.</p><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> System flowchart</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x3.png"/></fig></sec></sec><sec id="s4"><title>4. Experiments and Prototype</title><p>We have implemented a prototype of the searching system to verify that the search is valid and useful.</p><p>The prototype is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> to <xref ref-type="fig" rid="fig6">Figure 6</xref>. <xref ref-type="fig" rid="fig3">Figure 3</xref> shows the search input interface, where the user can type an error description in the edit box in the middle of the page and click “故障诊断” (“error diagnosis”). 
Alternatively, the user can use advanced search to refine the search by selecting advanced options, including error stage, error type, error layer, the system in which the error occurred, the software in which it occurred, etc. <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the search results sorting page: after a search for the keyword “系统” (“system”), the results are listed as in <xref ref-type="fig" rid="fig4">Figure 4</xref>. Clicking any item in the list, such as the 3<sup>rd</sup> one, displays the reason that causes that error, as shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>. Clicking the reason shown in <xref ref-type="fig" rid="fig5">Figure 5</xref> displays the corresponding solution, as shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>. In this way the user can find the results of interest.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Explanation of the database searching flowchart</title></caption><table><thead><tr><th align="center" valign="middle" >Searching stages</th><th align="center" valign="middle"  colspan="2"  >Contents</th></tr></thead><tbody><tr><td align="center" valign="middle"  rowspan="3"  >Stage 1 Keyword extraction</td><td align="center" valign="middle" >Case 1</td><td align="center" valign="middle" >Only the edit box is filled in: keywords are extracted from the typed content (several keywords, including error characteristics as well as ordinary keywords).</td></tr><tr><td align="center" valign="middle" >Case 2</td><td align="center" valign="middle" >Only the advanced search options are checked: each checked option is treated as a keyword. These options help to locate accurately the stage and location at which an error occurred.</td></tr><tr><td align="center" valign="middle" >Case 3</td><td align="center" valign="middle" >The edit box is filled in and advanced options are checked: keywords are extracted from the content of the edit box, and the checked options are also treated as keywords.</td></tr><tr><td align="center" valign="middle" >Stage 2 Fuzzy matching</td><td align="center" valign="middle"  colspan="2"  >Carry out fuzzy matching between the extracted keywords and the keywords in the keyword table, and find the corresponding keyword ID and priority in the keyword table. The extracted keywords are first fuzzy matched against the word database provided by Lucene to find similar words, and those similar words are then fuzzy matched against the keywords in the keyword table to find the corresponding keywords. If the Lucene database contains no similar words, the extracted keywords are fuzzy matched directly against the keywords in the keyword table.</td></tr><tr><td align="center" valign="middle" >Stage 3 Searching for error ID</td><td align="center" valign="middle"  colspan="2"  >1) Find the Error ID from the relation between keyword ID and the error table; 2) Find the Reason ID from the relation between keyword ID and the reason table; 3) Find the corresponding Error ID from the relation between Reason ID and the Error ID table; 4) Find the Solution ID from the relation between keyword ID and the solution table; 5) Find the corresponding Error ID from the relation between Solution ID and the Error ID table.</td></tr></tbody></table></table-wrap><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Search input interface</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x4.png"/></fig><fig id="fig4"  position="float"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> Search results sorting page</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x5.png"/></fig></sec><sec id="s5"><title>5. 
Conclusion</title><p>This paper has proposed an error searching method for finding solutions to errors that occur in the UCPMD. The method applies Chinese keyword extraction and Chinese keyword fuzzy matching technologies to find the search results that interest the user. The search results are drawn from errors, reasons, and solutions: as long as an indexed keyword appears in any description of an error, reason, or solution, the corresponding set of error, reason, and solution is listed in the search results. We also provide a prototype of the method to show its effectiveness and correctness.</p><fig id="fig5"  position="float"><label><xref ref-type="fig" rid="fig5">Figure 5</xref></label><caption><title> A selected error and the reason that causes the error</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x6.png"/></fig><fig id="fig6"  position="float"><label><xref ref-type="fig" rid="fig6">Figure 6</xref></label><caption><title> A selected error, the reason that causes the error, and the corresponding solution</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/76611x7.png"/></fig></sec><sec id="s6"><title>Cite this paper</title><p>Yang, F., Dong, Z.H. and Liu, L.H. (2017) Error Searching System with Keyword Extraction and Keyword Fuzzy Matching. Int. J. Communications, Network and System Sciences, 10, 219-226. https://doi.org/10.4236/ijcns.2017.105B022</p></sec></body><back><ref-list><title>References</title><ref id="scirp.76611-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Wang, J.H., Fang, M.D., Gao, J.D., Lu, H.Y. and Dai, C.B. (2006) Basic Principle and Application of On-Board Diagnostics for Gasoline Fuelled Vehicles. 
Automotive Engineering, 28, 491-494.</mixed-citation></ref><ref id="scirp.76611-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Lu, C., Yang, Y.-H. and Xu, G.-M. (2008) Exploitation of Computer Problem Repair and Require on Web System.</mixed-citation></ref><ref id="scirp.76611-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Wang, L.-X. and Huai, X.Y. (2012) Semantic-Based Keyword Extraction Algorithm for Chinese Text. Computer Engineering, 38.</mixed-citation></ref><ref id="scirp.76611-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Fang, J., Guo, L. and Wang, X.D. (2008) Semantically Improved Automatic Keyphrase Extraction. Computer Science, 35.</mixed-citation></ref><ref id="scirp.76611-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Wang, J.-F., Wu, X.-J., Xia, Y.Q. and Zheng, F. (2007) An Approximate String Matching Algorithm for Chinese Information Retrieval Systems. Journal of Chinese Information Processing, 21.</mixed-citation></ref><ref id="scirp.76611-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Bai, Y.-C., Fu, W. and Xin, Y. (2014) Research and Simulation of Distributed Search Engine Based on Hadoop and Nutch. The 19th National Young People Communication Academic Annual Symposium.</mixed-citation></ref><ref id="scirp.76611-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Gao, C.J. (2013) Research on Lucene Search Engine Based on PSP-BP Neural Network. China University of Petroleum (East China), Master Degree Thesis.</mixed-citation></ref><ref id="scirp.76611-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Liu, Y.Z. (2005) Research on Chinese Auto Participle Exclude Ambiguity Algorithm. 
Chongqing University, Master Degree Thesis.</mixed-citation></ref><ref id="scirp.76611-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Hu, H.B. (2015) The Implementation of a Variety of Sorting Methods Based on Lucene. Computer Knowledge and Technology, 11, 57-59.</mixed-citation></ref><ref id="scirp.76611-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Damerau-Levenshtein Distance. http://en.wikipedia.org/wiki/Damerau-levenshtein_distance</mixed-citation></ref></ref-list></back></article>