<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2015.37098</article-id><article-id pub-id-type="publisher-id">JAMP-57652</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  A New Integrated Fuzzifier Evaluation and Selection (NIFEs) Algorithm for Fuzzy Clustering
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Chanpaul Jin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>Hua</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kim</surname><given-names>Sun</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Moormann</surname><given-names>Ann</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Honggang</given-names></name><xref ref-type="aff" rid="aff4"><sup>4</sup></xref></contrib></contrib-group><aff id="aff3"><addr-line>Program of Molecular Medicine, UMass Medical School, Worcester, MA, USA</addr-line></aff><aff id="aff4"><addr-line>Department of Electrical &amp; Computer Engineering, University of Massachusetts Dartmouth, Dartmouth, MA, USA</addr-line></aff><aff id="aff1"><addr-line>Division of Biostatistics and Health Services Research, Department of Quantitative Health Science, UMass Medical School, Worcester, MA, USA</addr-line></aff><aff id="aff2"><addr-line>Department of Nursing, College of Nursing and Health Sciences, University of Massachusetts, Boston, MA, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>hua.fang@umassmed.edu</email> (HF)</corresp></author-notes><pub-date 
pub-type="epub"><day>30</day><month>06</month><year>2015</year></pub-date><volume>03</volume><issue>07</issue><fpage>802</fpage><lpage>807</lpage><history><date date-type="received"><day>1</day><month>April</month><year>2015</year></date><date date-type="accepted"><day>23</day><month>June</month><year>2015</year></date><date date-type="pub"><day>30</day><month>June</month><year>2015</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   Fuzzy C-means (FCM) is simple and widely used for complex data pattern recognition and image analysis. Selecting an appropriate fuzzifier (m) is crucial for identifying the optimal number of patterns and achieving high clustering accuracy, yet few studies have investigated this choice. Building on two existing fuzzifier selection methods, we developed an integrated fuzzifier evaluation and selection algorithm and tested it on real datasets. Our findings indicate that a consistent optimal number of clusters can be learnt by testing different fuzzifiers on each dataset, and that the smallest fuzzifier yielding this consistency should be selected for clustering. Our evaluation also shows that the fuzzifier affects clustering accuracy. For longitudinal data with missing values, m = 2 could serve as an empirical starting point for fuzzy clustering; it achieved the best clustering accuracy on the tested data, especially with our multiple-imputation based fuzzy clustering. 
 
</p></abstract><kwd-group><kwd>Fuzzifier</kwd><kwd>Fuzzy C-Means</kwd><kwd>Multiple Imputation-Based Fuzzy Clustering (MIFuzzy)</kwd><kwd>Missing Data</kwd><kwd>Longitudinal Data</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Fuzzy C-means (FCM) is an efficient clustering method for analyzing complex data patterns. FCM introduces the concept of membership into data partitioning and uses membership levels to indicate the degree to which an object belongs to each cluster. In various applications and for complex data, FCM has proven more robust and has partitioned data better than crisp clustering, for example in MRI image studies [<xref ref-type="bibr" rid="scirp.57652-ref1">1</xref>]-[<xref ref-type="bibr" rid="scirp.57652-ref4">4</xref>]. Recently, one major FCM variant, Multiple Imputation-based Fuzzy clustering (MIFuzzy), has been developed to detect patterns and support causal inference in health and biomedical studies [<xref ref-type="bibr" rid="scirp.57652-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.57652-ref6">6</xref>].</p><p>The fuzzifier m, also called the weighting exponent, can take any value greater than 1. When m is close to 1, FCM approaches the hard c-means algorithm; as m approaches infinity, all cluster centers tend to the mass center of the data. A properly selected fuzzifier can suppress noise and smooth the FCM membership function. A smaller fuzzifier usually achieves better computational performance. Existing FCM algorithms typically set the fuzzifier to 2, an empirical rule with little supporting evidence. 
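To illustrate how m controls the crispness of the partition, the following minimal Python sketch (added for exposition; it assumes the standard FCM membership update u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), and the function name and example distances are hypothetical) computes the memberships of a single point for several fuzzifiers:

<preformat preformat-type="code">
```python
def fcm_memberships(distances, m):
    """Standard FCM membership update for one data point.

    distances: positive distances from the point to each cluster center.
    m: fuzzifier, must be greater than 1.
    """
    if not m > 1:
        raise ValueError("the fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical point at distance 1 from one center and 2 from the other.
example_distances = [1.0, 2.0]
for m in (1.1, 2.0, 4.0, 50.0):
    u = fcm_memberships(example_distances, m)
    print(m, [round(x, 3) for x in u])
```
</preformat>

With m close to 1 the point is assigned almost entirely to its nearest center, whereas a large m drives both memberships toward 0.5. 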
There are also some FCM-centric methods [<xref ref-type="bibr" rid="scirp.57652-ref7">7</xref>]-[<xref ref-type="bibr" rid="scirp.57652-ref11">11</xref>] that select the fuzzifier based on FCM optimization, e.g., <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x4.png" xlink:type="simple"/></inline-formula> where n is the sample size [<xref ref-type="bibr" rid="scirp.57652-ref7">7</xref>]. Recently, two data-centric methods [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>] were proposed to relate the fuzzifier to the characteristics of the dataset. Specifically, these studies examined how dominant data features (e.g., dimension and sample size) influence the choice of fuzzifier.</p><p>To select appropriate fuzzifiers and achieve better clustering accuracy, this paper proposes a new integrated framework for fuzzifier selection. Our computational results show that a consistent optimal number of clusters can be learnt by testing different fuzzifiers on each dataset, and that the smallest fuzzifier yielding this consistency should be selected for clustering. Furthermore, we evaluated the impact of the fuzzifier on clustering accuracy. Specifically, we tested FCM on 3 real datasets with different fuzzifier values (MIFuzzy was used for the datasets with missing values), and used 2 typical validation indices for fuzzy clustering (i.e., VSC and XB) to evaluate the consistency of the optimal number of clusters across different m.</p><p>The remainder of this paper is organized as follows. Section 2 introduces two existing fuzzifier computing methods. Section 3 presents and demonstrates our integrated fuzzifier evaluation and selection algorithm. Section 4 concludes our work.</p></sec><sec id="s2"><title>2. 
Two Fuzzifier Computing Methods</title><p>References [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>] used different methods to obtain the fuzzifier directly from the dataset. Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] derived the fuzzifier theoretically within the FCM clustering process by searching for a global optimal solution. Denoting the fuzzifier by m, the number of data points by n, and the data dimension by s, they designed two rules to compute the fuzzifier as follows:</p><disp-formula id="scirp.57652-formula454"><graphic  xlink:href="http://html.scirp.org/file/57652x5.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x6.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x7.png" xlink:type="simple"/></inline-formula>, with k and r indexing the data points. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x8.png" xlink:type="simple"/></inline-formula> denotes the maximum eigenvalue of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x9.png" xlink:type="simple"/></inline-formula>. Rule <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x10.png" xlink:type="simple"/></inline-formula> is an approximation of Rule <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x11.png" xlink:type="simple"/></inline-formula>, indicating that the fuzzifier is related to the data dimension. 
According to Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>], if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x12.png" xlink:type="simple"/></inline-formula>, the fuzzifier can be computed directly with Rule <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x13.png" xlink:type="simple"/></inline-formula>; otherwise, Rule <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x14.png" xlink:type="simple"/></inline-formula> and Rule <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x15.png" xlink:type="simple"/></inline-formula> are invalid.</p><p>Similarly, Reference [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>] agrees that the fuzzifier m is related to the dimension and size of the dataset. Unlike Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>], however, they first used probability theory to analyze the probability of a well-defined cluster. They found that this probability decreases exponentially with the dimension of the dataset, and decreases somewhat more slowly with increasing sample size. They argued that the fuzzifier m should follow this tendency at least qualitatively. By studying the correlation among m, s, and n in a comprehensive simulation, they learnt a general functional relation between the fuzzifier and the dataset properties (data dimension and sample size), shown in Equation (1).</p><disp-formula id="scirp.57652-formula455"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/57652x16.png"  xlink:type="simple"/></disp-formula><p>where s again denotes the dimension of the dataset and n the sample size.</p></sec><sec id="s3"><title>3. A New Integrated Fuzzifier Evaluation and Selection (NIFEs) Algorithm</title><p>This section describes and demonstrates our new integrated fuzzifier evaluation and selection (NIFEs) algorithm.</p><sec id="s3_1"><title>3.1. 
Conceptual Framework for NIFEs Algorithm</title><p>Our conceptual framework for the NIFEs algorithm is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. Specifically, we use typical fuzzy clustering validation indices to evaluate the consistency in choosing the optimal number of clusters over a range of fuzzifiers, and then analyze the impact of the fuzzifier on clustering accuracy. We used two major validation indices for fuzzy clustering: the widely used XB [<xref ref-type="bibr" rid="scirp.57652-ref14">14</xref>], and the recently developed VSC [<xref ref-type="bibr" rid="scirp.57652-ref15">15</xref>] for datasets with overlapping clusters. XB depends directly on the fuzzifier, while VSC does not.</p><p>Moreover, we used 3 real datasets to evaluate our algorithm, as shown in <xref ref-type="table" rid="table1">Table 1</xref>: IRIS [<xref ref-type="bibr" rid="scirp.57652-ref16">16</xref>], Infectious Disease (ID) and TDTA [<xref ref-type="bibr" rid="scirp.57652-ref17">17</xref>]. Briefly, IRIS consists of 150 samples from three species: Setosa, Virginica and Versicolor. The length and width of the sepals and petals (i.e., four attributes) were measured for each sample. ID includes a pediatric cohort of 162 infants with 7 anti-measles antibody measures each from 2 to 8 months before vaccination. TDTA data were collected from a culturally-adapted smoking cessation intervention for Asian Americans with 9 intervention attributes. We used the classical FCM for IRIS; because ID and TDTA are longitudinal data with missing values, we used MIFuzzy [<xref ref-type="bibr" rid="scirp.57652-ref5">5</xref>], as mentioned in Section 1.</p></sec><sec id="s3_2"><title>3.2. Demonstrating New Integrated Fuzzifier Evaluation and Selection (NIFEs) Algorithm</title><p>The main idea of our new integrated fuzzifier evaluation and selection (NIFEs) algorithm is to select a proper fuzzifier that ensures optimal cluster identification and accuracy. 
Specifically, given the initial fuzzifier range M: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x17.png" xlink:type="simple"/></inline-formula>, and the validation index set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x18.png" xlink:type="simple"/></inline-formula>, we run fuzzy clustering algorithms (e.g., FCM, MIFuzzy) over the given M and compute the validation index set V to evaluate the clustering results. For each validation index <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x19.png" xlink:type="simple"/></inline-formula>, we use <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula> to denote the set of available fuzzifiers that can identify the optimal number of clusters. By default, we set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula>. Then, we select <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x22.png" xlink:type="simple"/></inline-formula> as the final fuzzifier. The NIFEs pseudocode is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. Here, we set v<sub>1</sub> = XB and v<sub>2</sub> = VSC as examples to demonstrate our NIFEs algorithm. 
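The selection step can be sketched in Python as follows. This is an illustrative reading rather than the pseudocode of Figure 2: it assumes that, for each index, the available set contains the fuzzifiers whose optimal cluster number equals the most common one over the tested range, and that the final fuzzifier is the smallest value shared by all sets (which matches the entries of Table 2). The function names and the input numbers are hypothetical.

<preformat preformat-type="code">
```python
from collections import Counter


def consistent_fuzzifiers(optimal_k_by_m):
    """Fuzzifiers whose optimal cluster number equals the most common one.

    optimal_k_by_m: dict mapping a fuzzifier m to the cluster number at which
    one validation index attains its optimum for that m.
    """
    most_common_k, _ = Counter(optimal_k_by_m.values()).most_common(1)[0]
    return {m for m, k in optimal_k_by_m.items() if k == most_common_k}


def select_fuzzifier(results_by_index):
    """Smallest fuzzifier that is consistent under every validation index."""
    sets = [consistent_fuzzifiers(r) for r in results_by_index.values()]
    common = set.intersection(*sets)
    if not common:
        raise ValueError("no fuzzifier is consistent under all indices")
    return min(common)


# Hypothetical optimal cluster numbers, invented purely for illustration.
hypothetical_results = {
    "XB": {2.0: 2, 2.5: 2, 3.0: 2, 3.5: 4, 4.0: 5},
    "VSC": {2.0: 3, 2.5: 3, 3.0: 3, 3.5: 3, 4.0: 3},
}
print(select_fuzzifier(hypothetical_results))
```
</preformat>

In this toy input the two indices favour different cluster numbers, yet each is self-consistent over part of the range, and the smallest fuzzifier common to both consistent sets is 2.0. 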
Define the XB peak as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x22.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x23.png" xlink:type="simple"/></inline-formula> that satisfies <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x22.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x23.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x24.png" xlink:type="simple"/></inline-formula> and<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x22.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x23.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x24.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x25.png" xlink:type="simple"/></inline-formula>, where j denotes the cluster number. 
Then, the optimal number of clusters is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x20.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x21.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x22.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x23.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x24.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x25.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x26.png" xlink:type="simple"/></inline-formula>.</p><p>FCM was performed on IRIS, and MIFuzzy on ID and TDTA, over a range of m (m<sub>low</sub> = 2; m<sub>max</sub> = 4). The resulting variation of the validation indices (v<sub>1</sub> = XB and v<sub>2</sub> = VSC) is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>. 
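For reference, a single XB value for one clustering result can be computed as in the Python sketch below. It assumes the commonly used generalized Xie-Beni form, XB = sum_i sum_k u_ik^m ||x_k - v_i||^2 / (n * min over i ≠ j of ||v_i - v_j||^2); the exact variant used in this study is not restated here, and the function name and toy data are hypothetical.

<preformat preformat-type="code">
```python
def xie_beni(points, centers, memberships, m):
    """Generalized Xie-Beni index: compactness divided by separation.

    points: list of n coordinate tuples.
    centers: list of c coordinate tuples (c of at least 2).
    memberships: memberships[i][k] is the membership of point k in cluster i.
    m: fuzzifier used as the exponent on the memberships.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Membership-weighted within-cluster scatter.
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(len(points))
    )
    # Smallest squared distance between two distinct centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Hypothetical one-dimensional toy data with two crisp, well-separated clusters.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_centers = [(0.5,), (10.5,)]
toy_memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(xie_beni(toy_points, toy_centers, toy_memberships, m=2.0))
```
</preformat>

Lower values indicate more compact and better separated clusters; repeating the computation for each candidate cluster number and each m gives one curve per fuzzifier. 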
Then we examined whether these validation indices consistently point to an optimal number of clusters across the range of fuzzifiers.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> Fuzzifier evaluation for fuzzy clustering</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x27.png"/></fig><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Dataset description</title></caption><table><thead><tr><th align="center" valign="middle" >Dataset</th><th align="center" valign="middle" >IRIS</th><th align="center" valign="middle" >TDTA</th><th align="center" valign="middle" >ID</th></tr></thead><tbody><tr><td align="center" valign="middle" >Number of Clusters</td><td align="center" valign="middle" >3</td><td align="center" valign="middle" >3</td><td align="center" valign="middle" >3</td></tr><tr><td align="center" valign="middle" >Number of Data Points</td><td align="center" valign="middle" >150</td><td align="center" valign="middle" >97</td><td align="center" valign="middle" >162</td></tr><tr><td align="center" valign="middle" >Number of Attributes</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >9</td><td align="center" valign="middle" >7</td></tr></tbody></table></table-wrap><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> Pseudocode of NIFEs</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x28.png"/></fig><fig-group id="fig3"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Variation of validation index with different m. 
XB: (a) IRIS; (b) TDTA; (c) Infectious Disease (ID). VSC: (d) IRIS; (e) TDTA; (f) Infectious Disease (ID).</title></caption><fig id ="fig3_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x29.png"/></fig><fig id ="fig3_2"><label> (c)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x30.png"/></fig></fig-group><p>The optimal cluster number corresponds to the smallest value of v<sub>1</sub> = XB or v<sub>2</sub> = VSC. As mentioned above, XB is fuzzifier-related and incorporates measures of cluster compactness and separation. As demonstrated on the three datasets in <xref ref-type="fig" rid="fig3">Figure 3</xref>(a) IRIS, <xref ref-type="fig" rid="fig3">Figure 3</xref>(b) TDTA, and <xref ref-type="fig" rid="fig3">Figure 3</xref>(c) ID, the overall trends of XB are consistent across different fuzzifiers m, which implies that as long as XB points to an optimal number of clusters, the smallest such m can be identified. For IRIS, XB has a consistent local minimum at 2 clusters with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x31.png" xlink:type="simple"/></inline-formula>. For the longitudinal datasets ID and TDTA using MIFuzzy, XB achieves a consistent local minimum with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x31.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x32.png" xlink:type="simple"/></inline-formula>.</p><p>Furthermore, we examined the variation of VSC, a non-fuzzifier-related index, over the same datasets, shown in Figures 3(d)-(f). The VSC curve for m = 2 is the lower red curve. 
VSC incorporates compactness and overlap measures to evaluate the quality of FCM. For all three datasets, VSC identifies the optimal number of clusters with a consistent minimum value across different fuzzifiers.</p><p>Since a consistent optimal number of clusters can be obtained by testing different fuzzifiers, the smallest fuzzifier that yields this consistency is regarded as the most appropriate for fuzzy clustering, for reasons of computational efficiency. Note that our aim is to detect this consistency in order to establish a generalized fuzzifier evaluation algorithm; determining the final optimal number of clusters is beyond the scope of this study but is a natural next step. <xref ref-type="table" rid="table2">Table 2</xref> shows the fuzzifiers obtained with NIFEs for these 3 datasets.</p><p>Using the two methods from References [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] and [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>], we computed the optimal fuzzifiers for all these datasets as our baseline. <xref ref-type="table" rid="table3">Table 3</xref> displays the m values obtained for each dataset with these two methods. In particular, inf in <xref ref-type="table" rid="table3">Table 3</xref> means that the method of Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] failed. Compared with <xref ref-type="table" rid="table3">Table 3</xref>, NIFEs agrees with the majority of the fuzzifiers identified by References [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] and [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>]. 
In general, NIFEs seems to be more reliable. For example, m = 2 is appropriate for IRIS according to the literature, but Reference [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>] suggested m = 4; for TDTA, both Reference [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>] and our NIFEs agree on m = 2, which is appropriate according to our previous investigation, while Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>] suggested 3.993.</p><p>Furthermore, given the real cluster number for each dataset shown in <xref ref-type="table" rid="table1">Table 1</xref>, we examined the clustering accuracy for different m, displayed in <xref ref-type="fig" rid="fig4">Figure 4</xref>. Given a sample size N, let G denote the number of cases correctly assigned to their known clusters; the clustering accuracy is then defined as G/N.</p><p>As shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>, the fuzzifier m = 2 leads to better or comparable clustering accuracy, given the identified optimal cluster number, across the three datasets. In particular, for the longitudinal data with missing values (TDTA and ID), m = 2 achieves the accuracy expected from our previously known results.</p><fig-group id="fig4"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> Clustering accuracy with different fuzzifier m across 3 datasets. 
(a) IRIS; (b) TDTA; (c) ID.</title></caption><fig id ="fig4_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x33.png"/></fig><fig id ="fig4_2"><label> (c)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x34.png"/></fig><fig id ="fig4_3"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/57652x35.png"/></fig></fig-group><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> m value learnt from the new integrated fuzzifier evaluation and selection (NIFEs) algorithm</title></caption><table><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >IRIS</th><th align="center" valign="middle" >TDTA</th><th align="center" valign="middle" >ID</th></tr></thead><tbody><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x36.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >[2, 3]</td><td align="center" valign="middle" >[2, 2.6]</td><td align="center" valign="middle" >[2, 2.8]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/57652x37.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >[2, 4]</td><td align="center" valign="middle" >[2, 4]</td><td align="center" valign="middle" >[2, 4]</td></tr><tr><td align="center" valign="middle" >Learnt m value</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >2</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> m value computed using the methods from References [<xref ref-type="bibr" 
rid="scirp.57652-ref12">12</xref>] and [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>]</title></caption><table><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >IRIS</th><th align="center" valign="middle" >TDTA</th><th align="center" valign="middle" >ID</th></tr></thead><tbody><tr><td align="center" valign="middle" >Reference [<xref ref-type="bibr" rid="scirp.57652-ref12">12</xref>]</td><td align="center" valign="middle" >inf</td><td align="center" valign="middle" >3.993</td><td align="center" valign="middle" >inf</td></tr><tr><td align="center" valign="middle" >Reference [<xref ref-type="bibr" rid="scirp.57652-ref13">13</xref>]</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >2</td></tr></tbody></table></table-wrap></sec></sec><sec id="s4"><title>4. Conclusions</title><p>This paper investigates the selection of the fuzzifier, an important element of FCM, using three real datasets: one well-known biological dataset, IRIS; and two longitudinal datasets with missing values, TDTA and ID. We designed a new integrated fuzzifier evaluation and selection (NIFEs) algorithm to assess and select a proper fuzzifier. The conceptual NIFEs framework is comprehensive, involving tests of fuzzifier-related and non-fuzzifier-related indices and of clustering accuracy across a range of fuzzifiers. Our results suggest that the NIFEs algorithm may be more reliable than the two existing methods on the tested datasets and could serve as a complementary reference for the fuzzy clustering field. Our findings indicate that a consistent optimal number of clusters can be learnt by testing different fuzzifiers on each dataset, and that the smallest fuzzifier yielding this consistency should be selected for clustering, for computational efficiency. Our evaluation also shows that the fuzzifier affects clustering accuracy. 
For longitudinal data with missing values, m = 2 could serve as an empirical starting point for fuzzy clustering; it achieved the best clustering accuracy on the tested data, especially with our multiple-imputation based fuzzy clustering.</p></sec><sec id="s5"><title>Acknowledgements</title><p>This research was supported by NIH grants R01 DA033323-01A1 and 1UL1RR031982-01 (Pilot Project) to Dr. Fang.</p></sec><sec id="s6"><title>Cite this paper</title><p>Chanpaul Jin Wang, Hua Fang, Sun Kim, Ann Moormann, Honggang Wang (2015) A New Integrated Fuzzifier Evaluation and Selection (NIFEs) Algorithm for Fuzzy Clustering. Journal of Applied Mathematics and Physics, 03, 802-807. doi: 10.4236/jamp.2015.37098</p></sec></body><back><ref-list><title>References</title><ref id="scirp.57652-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Clark, M.C., Hall, L.O., Goldgof, D.B., et al. (2002) MRI Segmentation Using Fuzzy Clustering Techniques. IEEE Engineering in Medicine and Biology Magazine, 13, 730-742. http://dx.doi.org/10.1109/51.334636</mixed-citation></ref><ref id="scirp.57652-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Wang, C.J., Fang, H. and Wang, H. (2014) DAG-Searched and Density-Based Initial Centroid Location Method for Fuzzy Clustering of Big Biomedical Data. BICT2014. http://dx.doi.org/10.4108/icst.bict.2014.257932</mixed-citation></ref><ref id="scirp.57652-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Tsai, D.-M. and Lin, C.-C. (2011) Fuzzy C-Means Based Clustering for Linearly and Nonlinearly Separable Data. Pattern Recognition, 44, 1750-1760. http://dx.doi.org/10.1016/j.patcog.2011.02.009</mixed-citation></ref><ref id="scirp.57652-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Mei, J.-P. and Chen, L.H. (2013) LinkFCM: Relation Integrated Fuzzy c-Means. Pattern Recognition, 46, 272-283.  
http://dx.doi.org/10.1016/j.patcog.2012.06.012</mixed-citation></ref><ref id="scirp.57652-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Fang, H., Johnson, C., et al. (2011) A New Look at Quantifying Tobacco Exposure during Pregnancy Using Fuzzy Clustering. Neurotoxicology and Teratology, 33, 155-165. http://dx.doi.org/10.1016/j.ntt.2010.08.003</mixed-citation></ref><ref id="scirp.57652-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Fang, H., Dukic, V., et al. (2012) Detecting Graded Exposure Effects: A Report on an East Boston Pregnancy Cohort. Nicotine &amp; Tobacco Research, 14, 1115-1120. http://dx.doi.org/10.1093/ntr/ntr272</mixed-citation></ref><ref id="scirp.57652-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bezdek, J.C. and Hathaway, R.J. (1987) Convergence and Theory for Fuzzy c-Means Clustering: Counterexamples and Repairs. IEEE Trans. Pattern Anal., 17, 873-877.</mixed-citation></ref><ref id="scirp.57652-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Chan, K.P. and Cheung, Y.S. (1992) Clustering of Clusters. Pattern Recognition Letters, 25, 211-217.  
http://dx.doi.org/10.1016/0031-3203(92)90102-O</mixed-citation></ref><ref id="scirp.57652-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Pal, N.R. and Bezdek, J.C. (1995) On Cluster Validity for the Fuzzy c-Means Model. IEEE Transactions on Fuzzy Systems, 3, 370-379. http://dx.doi.org/10.1109/91.413225</mixed-citation></ref><ref id="scirp.57652-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Ozkan, I. and Turksen, I.B. (2007) Upper and Lower Values for the Level of Fuzziness in FCM. Information Sciences, 177, 5143-5152. http://dx.doi.org/10.1016/j.ins.2007.06.028</mixed-citation></ref><ref id="scirp.57652-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Huang, M., Xia, Z.X., Wang, H.B., et al. (2012) The Range of the Value for the Fuzzifier of the Fuzzy c-Means Algorithm. Pattern Recognition Letters, 33, 2280-2284. http://dx.doi.org/10.1016/j.patrec.2012.08.014</mixed-citation></ref><ref id="scirp.57652-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Yu, J., Cheng, Q.S. and Huang, H.K. (2004) Analysis of the Weighting Exponent in the FCM. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34, 634-639. http://dx.doi.org/10.1109/TSMCB.2003.810951</mixed-citation></ref><ref id="scirp.57652-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Schwämmle, V. and Jensen, O.N. (2010) A Simple and Fast Method to Determine the Parameters for Fuzzy c-Means Cluster Analysis. Bioinformatics, 26, 2841-2848. http://dx.doi.org/10.1093/bioinformatics/btq534 </mixed-citation></ref><ref id="scirp.57652-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Xie, X.L. and Beni, G. (1991) A Validity Measure for Fuzzy Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 841-847. 
http://dx.doi.org/10.1109/34.85677</mixed-citation></ref><ref id="scirp.57652-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Rezaee, B. (2010) A Cluster Validity Index for Fuzzy Clustering. Fuzzy Sets and Systems, 161, 3014-3025.  
http://dx.doi.org/10.1016/j.fss.2010.07.005</mixed-citation></ref><ref id="scirp.57652-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">https://archive.ics.uci.edu/ml/datasets/Iris</mixed-citation></ref><ref id="scirp.57652-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Kim, S.S., Kim, S.H., Fang, H., et al. (2014) A Culturally Adapted Smoking Cessation Intervention for Korean Americans: A Mediating Effect of Perceived Family Norm toward Quitting. Journal of Immigrant and Minority Health, 31 May 2014. [Epub ahead of print]. http://dx.doi.org/10.1007/s10903-014-0045-4</mixed-citation></ref></ref-list></back></article>