<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJS</journal-id><journal-title-group><journal-title>Open Journal of Statistics</journal-title></journal-title-group><issn pub-type="epub">2161-718X</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojs.2014.410077</article-id><article-id pub-id-type="publisher-id">OJS-51470</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Regularization and Estimation in Regression with Cluster Variables
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yu</surname><given-names>Qingzhao</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>Bin</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, USA</addr-line></aff><aff id="aff2"><addr-line>Department of Experimental Statistics, Louisiana State University, Baton Rouge, LA, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>qyu@lsuhsc.edu(QY)</email>;<email>bli@lsu.edu(BL)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>18</day><month>11</month><year>2014</year></pub-date><volume>04</volume><issue>10</issue><fpage>814</fpage><lpage>825</lpage><history><date date-type="received"><day>14</day><month>October</month><year>2014</year></date><date date-type="rev-recd"><day>5</day><month>November</month><year>2014</year></date><date date-type="accepted"><day>15</day><month>November</month><year>2014</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Clustering Lasso, a new regularization method for linear regression, is proposed in this paper. Clustering Lasso can select variables while preserving the correlation structures among them. In addition, Clustering Lasso encourages the selection of clusters of variables, so that variables sharing the same mechanism for predicting the response variable are selected together in the regression model. A real microarray data example and simulation studies show that Clustering Lasso outperforms Lasso in terms of prediction performance, particularly when there is collinearity among variables and/or when the number of predictors is larger than the number of observations. The Clustering Lasso paths can be obtained using any established algorithm for the Lasso solution. An algorithm is proposed to construct variable correlation structures and to compute Clustering Lasso paths efficiently.
 
</p></abstract><kwd-group><kwd>Clustered Variables</kwd><kwd>Lasso</kwd><kwd>Principal Component Analysis</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>We are often interested in finding important variables that are significantly related to the response variable and can be used to predict quantities of interest in regression and classification problems. Important variables often appear in clusters, where variables in the same cluster are highly correlated and relate to the response variable in similar ways. For example, a major application of microarray technology is to discover important genes and pathways that are related to clinical outcomes such as the diagnosis of a certain cancer. Typically, only a small proportion of genes from a huge bank have significant influence on the clinical outcome of interest. In addition, expression data frequently have cluster structures: the genes within a cluster often share the same pathway and are therefore similarly related to the outcome. When regression is applied in this setting, we often face the challenge of multicollinearity among covariates. An ideal variable selection procedure should be able to find all genes of important clusters rather than just some representative genes from the clusters. As pointed out by [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>], the quality of a fitted model is typically evaluated by two characteristics: accuracy of prediction on new data and interpretability of the model. For the latter, a sparse model with fewer selected covariates is preferred because of its simplicity. 
However, when multiple variables share the same mechanism for explaining the response, all the involved variables should have an equal chance of being selected and should exhibit the same relationship to the outcome in the fitted model, so that the model remains scientifically interpretable.</p><p>It is well known that the ordinary least squares (OLS) estimate in linear regression often performs poorly when some of the predictors are highly correlated. OLS then generates unstable results in which the estimates have inflated variances. Regularization methods have been proposed to improve on OLS. For example, ridge regression [<xref ref-type="bibr" rid="scirp.51470-ref2">2</xref>] penalizes the model complexity by the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x5.png" xlink:type="simple"/></inline-formula> penalty of the coefficients. This method was proposed to solve the collinearity problem by adding a constant to the diagonal terms of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x6.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x7.png" xlink:type="simple"/></inline-formula> is the observation or design matrix. Ridge regression stabilizes the estimates through the bias-variance trade-off. It can often improve the predictions but cannot select variables. [<xref ref-type="bibr" rid="scirp.51470-ref3">3</xref>] proposed the Lasso method by imposing an <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x8.png" xlink:type="simple"/></inline-formula>-penalty on the regression coefficients. Lasso is a promising method, as it can improve prediction and produce sparse models simultaneously. However, when high correlations among predictors are present, the predictive performance of Lasso is dominated by that of ridge regression [<xref ref-type="bibr" rid="scirp.51470-ref3">3</xref>]. 
Moreover, when there is a cluster of variables, each of which is similarly associated with the response variable, Lasso tends to arbitrarily select one variable from the cluster instead of identifying the whole cluster [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>]; see also Section 2 for more discussion. Elastic Net, proposed by [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>], combines both <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x9.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x10.png" xlink:type="simple"/></inline-formula> penalties of the coefficients as the regularization criterion. The method is promising in that it encourages cluster effects and shows improved predictive performance over Lasso. Elastic Net can automatically choose cluster variables and estimate parameters at the same time. Many other methods can be used to choose clustered variables, such as principal component analysis (PCA). [<xref ref-type="bibr" rid="scirp.51470-ref4">4</xref>] defined “eigen-arrays” and “eigen-genes” in this way. However, PCA cannot choose sparse models. [<xref ref-type="bibr" rid="scirp.51470-ref5">5</xref>] proposed sparse principal component analysis (SPCR), which formulates PCA as a regression-type optimization problem and then obtains sparse loadings by imposing the Elastic Net constraint. SPCR can successfully yield exact zero loadings in principal components. However, a regularization parameter has to be selected for each principal component, which results in an overwhelming computational burden when the number of parameters is large. Other penalized regression methods have been proposed for the group effect [<xref ref-type="bibr" rid="scirp.51470-ref6">6</xref>]-[<xref ref-type="bibr" rid="scirp.51470-ref13">13</xref>]. 
However, these methods either presuppose a grouping structure or assume that each predictor in a group shares an identical regression coefficient.</p><p>In practice, we often have some prior knowledge about the structure of variables and would like to make use of this a priori information in the analysis. For example, in gene analysis, we know the pathways and the genes involved in these pathways. Therefore, we would like to group the variables involved in the same pathway together. Another example is spatial analysis, where we would like to keep a certain correlation structure among the spatial error terms. For instance, we may want to fit a different coefficient for a certain variable in different regions (e.g., if the variable has a different effect in different regions) but keep a correlation structure among the coefficients of neighboring regions. The conditional autoregressive model (CAR, [<xref ref-type="bibr" rid="scirp.51470-ref14">14</xref>]) is one of the methods that can be used to keep such a correlation structure.</p><p>In this paper, we propose a method that encourages cluster variables to be selected together and can incorporate available prior information on coefficient structures in variable selection. When there is no prior information on the coefficient structure, we propose a data augmentation algorithm to find the structure. Moreover, the method uses the Lasso regularization to choose sparse models. The proposed method can be solved by any efficient Lasso algorithm such as least angle regression (LARS, [<xref ref-type="bibr" rid="scirp.51470-ref15">15</xref>]) and the coordinate-wise descent algorithm (CDA, [<xref ref-type="bibr" rid="scirp.51470-ref16">16</xref>]). We call our method the Clustering Lasso (CL).</p><p>The rest of the paper is organized as follows. In Section 2, we review the Lasso method and discuss its limitation in identifying clustered variables. Then we propose the Clustering Lasso in a Bayesian setting. 
Its counterpart in the frequentist setting and the associated computational strategies are discussed in Section 3. Sections 4 and 5 demonstrate the predictive and explanatory performance of CL through real examples and simulations. Finally, conclusions and future work are discussed in Section 6.</p></sec><sec id="s2"><title>2. Clustering Lasso in Bayesian Setting</title><p>Consider linear regression settings with the response vector <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x11.png" xlink:type="simple"/></inline-formula> and the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x12.png" xlink:type="simple"/></inline-formula> dimensional input matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x13.png" xlink:type="simple"/></inline-formula>. The <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x14.png" xlink:type="simple"/></inline-formula> and the columns of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x15.png" xlink:type="simple"/></inline-formula> are centered and standardized to have the same <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x16.png" xlink:type="simple"/></inline-formula> norm. The Lasso estimates <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x17.png" xlink:type="simple"/></inline-formula> are calculated by minimizing</p><disp-formula id="scirp.51470-formula195"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x18.png" xlink:type="simple"/></disp-formula><p>The Lasso solution can be obtained through LARS or CDA. Compared with ordinary linear regression, Lasso shows superior predictive performance and more stable estimates. 
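</p><p>The coordinate-wise descent idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the algorithm of the cited references: the criterion is assumed to be the residual sum of squares plus a tuning parameter times the sum of absolute coefficients, the function names soft_threshold and lasso_cd are hypothetical, and the toy data are invented. With two identical centered predictors, the sketch assigns the whole effect to the predictor that is visited first and leaves the other at exactly zero, which mirrors the remark in the Introduction that Lasso tends to pick one variable from a cluster arbitrarily.</p><preformat><![CDATA[
```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    X is a list of rows; columns are assumed centred and non-constant.
    The form of the criterion is an assumption made for this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta, kept up to date
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # inner product of column j with the partial residual
            # (the residual with the contribution of beta_j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data (invented): two identical centred predictors.
X = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]
print(lasso_cd(X, y, lam=1.0))  # [1.75, 0.0]
```
]]></preformat><p>Any split of the total coefficient 1.75 between the two identical columns gives the same value of the assumed criterion, so the result depends only on the order in which the coordinates are visited.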
Moreover, Lasso can often select variables and estimate coefficients simultaneously.</p><p>The group effect has been defined by [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>] in the linear regression setting. Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x19.png" xlink:type="simple"/></inline-formula> be the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x20.png" xlink:type="simple"/></inline-formula>th predictor. The estimates of coefficients have the group effect if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x21.png" xlink:type="simple"/></inline-formula> would result in the estimated coefficients <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x22.png" xlink:type="simple"/></inline-formula>. [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>] further proved that if the estimates are obtained by minimizing an objective function of the form:</p><disp-formula id="scirp.51470-formula196"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x23.png" xlink:type="simple"/></disp-formula><p>and the penalty term, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x24.png" xlink:type="simple"/></inline-formula>, is strictly convex, then the estimates from Equation (2) enjoy the group effect property.
In Lasso, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x25.png" xlink:type="simple"/></inline-formula> is the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x26.png" xlink:type="simple"/></inline-formula> norm of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x27.png" xlink:type="simple"/></inline-formula>, which is not strictly convex. Zou and Hastie proved that in this case Lasso estimates do not have the group effect. This can also be understood through the Lasso solution path produced by LARS.
In LARS, suppose a variable <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x28.png" xlink:type="simple"/></inline-formula> is selected in the model. Its coefficient solution path will move in a direction that reduces the correlation between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x29.png" xlink:type="simple"/></inline-formula> and the current residual, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x30.png" xlink:type="simple"/></inline-formula>, until another variable, say <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x31.png" xlink:type="simple"/></inline-formula>, has the same correlation with the current residual as does <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x32.png" xlink:type="simple"/></inline-formula>. At this point, variable <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x33.png" xlink:type="simple"/></inline-formula> is added into the model.
If <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x34.png" xlink:type="simple"/></inline-formula> is highly correlated with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x35.png" xlink:type="simple"/></inline-formula>, then as the correlation between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x36.png" xlink:type="simple"/></inline-formula> and the residual decreases, so does that between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x37.png" xlink:type="simple"/></inline-formula> and the residual. Therefore, if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x38.png" xlink:type="simple"/></inline-formula> has been included in the model, Lasso is less likely to select the highly correlated variable <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x39.png" xlink:type="simple"/></inline-formula> in the model. Consequently, Lasso tends not to select clustered variables together.</p><p>In a Bayesian setting, if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x40.png" xlink:type="simple"/></inline-formula> is the ith row of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x41.png" xlink:type="simple"/></inline-formula>, [<xref ref-type="bibr" rid="scirp.51470-ref3">3</xref>] showed that the Lasso solution is identical to the posterior mode of the coefficients when the prior distributions of the coefficients are set as independent double exponential distributions, where</p><disp-formula id="scirp.51470-formula197"><graphic xlink:href="http://html.scirp.org/file/2-1240413x42.png" xlink:type="simple"/></disp-formula><p>In Lasso, the penalty term of model complexity is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x43.png" xlink:type="simple"/></inline-formula>. Because each coefficient is penalized equally, each one can be shrunk to zero independently. 
When the variables are clustered, an ideal solution path would select the clustered variables together. Therefore, we would like to penalize the coefficients under a restriction that preserves the correlation structure among the variables. Under such a penalty, if the coefficient of one variable is nonzero, the coefficients of the other variables in the same cluster are less likely to be zero. For this purpose, we assume a correlation structure, specified by the structural correlation matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x44.png" xlink:type="simple"/></inline-formula>, of the coefficients <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x45.png" xlink:type="simple"/></inline-formula>.</p><p>For simplicity, assume that the variance of the random error, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x47.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x46.png" xlink:type="simple"/></inline-formula>, and the structural correlation matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x48.png" xlink:type="simple"/></inline-formula> are known. Then the likelihood and prior distributions can be set as:</p><disp-formula id="scirp.51470-formula198"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x49.png"  xlink:type="simple"/></disp-formula><p>Therefore, the posterior distribution of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x50.png" xlink:type="simple"/></inline-formula> has the form</p><disp-formula id="scirp.51470-formula199"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x51.png"  xlink:type="simple"/></disp-formula><p>with a vector <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x52.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x53.png" xlink:type="simple"/></inline-formula>. The posterior mode of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x54.png" xlink:type="simple"/></inline-formula> in distribution (3) is the solution to</p><disp-formula id="scirp.51470-formula200"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x55.png"  xlink:type="simple"/></disp-formula><p>Relating Equation (4) to the Bayesian interpretation of the Lasso solution to (1), we naturally arrive at the Clustering Lasso in the frequentist setting.</p></sec><sec id="s3"><title>3. Clustering Lasso</title><sec id="s3_1"><title>3.1. 
Clustering Lasso and Its Grouping Effect</title><p>In the frequentist setting, we modify the Lasso penalty function to retain a presumed correlation structure among the coefficients. Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x56.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x57.png" xlink:type="simple"/></inline-formula> be the jth element of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x58.png" xlink:type="simple"/></inline-formula>. The Clustering Lasso estimate is defined as the solution to</p><disp-formula id="scirp.51470-formula201"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x59.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x60.png" xlink:type="simple"/></inline-formula> is the regularization parameter. 
Note that instead of restricting <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x61.png" xlink:type="simple"/></inline-formula>, we restrict <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x62.png" xlink:type="simple"/></inline-formula>. Therefore, the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x63.png" xlink:type="simple"/></inline-formula>’s are not penalized independently, and clustered variables can be selected together. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x64.png" xlink:type="simple"/></inline-formula>, with dimension <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x65.png" xlink:type="simple"/></inline-formula>, be the jth row of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x66.png" xlink:type="simple"/></inline-formula> and let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x67.png" xlink:type="simple"/></inline-formula>, a <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x68.png" xlink:type="simple"/></inline-formula> matrix. The penalty term used in Expression (5) can also be written as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x69.png" xlink:type="simple"/></inline-formula>, which is intermediate between the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x70.png" xlink:type="simple"/></inline-formula> penalty and the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x71.png" xlink:type="simple"/></inline-formula> penalty. When <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x72.png" xlink:type="simple"/></inline-formula> is an identity matrix, the Clustering Lasso is identical to the ordinary Lasso method. Otherwise, the penalty function is strictly convex. 
By Lemma 2 of [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>], the solution to Expression (5) has the grouping effect. Therefore, Clustering Lasso can select variables by clusters.</p><p><xref ref-type="fig" rid="fig1">Figure 1</xref> illustrates the Clustering Lasso penalty contours with two predictors. The left panel shows the penalty contour when the two predictors are independent, which is identical to that of the Lasso method, and the right panel shows the contour when the two predictors are correlated. The residual sum of squares has elliptical contours, centered and minimized at the full least squares estimate. The constraint region of Lasso is the diamond region <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x73.png" xlink:type="simple"/></inline-formula>, while that of the Clustering Lasso is the parallelogram region defined by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x74.png" xlink:type="simple"/></inline-formula>. The optimal estimates are attained where the elliptical contours first touch the constraint regions. The sides of the parallelogram are determined by the structural correlation matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x75.png" xlink:type="simple"/></inline-formula>.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title>Estimation picture for the Clustering Lasso when two predictors are independent (left, equivalent to Lasso) and when two predictors are clustered (right)</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-1240413x76.png"/></fig></sec><sec id="s3_2"><title>3.2. Computation</title><p>The Clustering Lasso is an extension of the Lasso method. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x77.png" xlink:type="simple"/></inline-formula>. Then the solution to Expression (5) is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x78.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x79.png" xlink:type="simple"/></inline-formula> is</p><disp-formula id="scirp.51470-formula202"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-1240413x80.png"  xlink:type="simple"/></disp-formula><p>Therefore, all the established algorithms for the Lasso solution, such as least angle regression (LARS, [<xref ref-type="bibr" rid="scirp.51470-ref15">15</xref>]), can be used for the Clustering Lasso.</p></sec><sec id="s3_3"><title>3.3. The Clustering Lasso Algorithm</title><p>We can incorporate prior knowledge of clustering into a structural correlation matrix. For example, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and many other biological databases can be consulted in gene analysis to construct the structural correlation matrix. The structural correlation matrix is required to be symmetric. When no prior information is readily available, a natural approach is to use a modified correlation matrix of the observed data, meaning that the coefficients are given a correlation structure similar to that of the covariates. 
There are several well-established choices, such as the partial correlation matrix [<xref ref-type="bibr" rid="scirp.51470-ref17">17</xref>]. In this paper, we propose to use a modified correlation matrix so that if two variables <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x81.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x82.png" xlink:type="simple"/></inline-formula> are not significantly correlated, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x83.png" xlink:type="simple"/></inline-formula>, the element in the ith row and jth column of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x84.png" xlink:type="simple"/></inline-formula>, is set to zero. 
As the solution for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x85.png" xlink:type="simple"/></inline-formula> is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x86.png" xlink:type="simple"/></inline-formula>, zero elements in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x87.png" xlink:type="simple"/></inline-formula> are desired so that when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x88.png" xlink:type="simple"/></inline-formula>s are shrunk to zero, which is possible by the Lasso property, some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x89.png" xlink:type="simple"/></inline-formula>s could also be shrunk to exactly zero.</p><p>In detail, we develop Algorithm 1, the Clustering Lasso algorithm. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x90.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x91.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x92.png" xlink:type="simple"/></inline-formula>, in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x93.png" xlink:type="simple"/></inline-formula>, be three prespecified numbers, and let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x94.png" xlink:type="simple"/></inline-formula> be a <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x95.png" xlink:type="simple"/></inline-formula> matrix.</p><p>Algorithm 1 Clustering Lasso</p><p>1. For <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x96.png" xlink:type="simple"/></inline-formula></p><p>for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x97.png" xlink:type="simple"/></inline-formula></p><p>perform a correlation test between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x98.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x99.png" xlink:type="simple"/></inline-formula>, and let</p><disp-formula id="scirp.51470-formula203"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x100.png"  xlink:type="simple"/></disp-formula><p>2. 
Perform an eigendecomposition of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x101.png" xlink:type="simple"/></inline-formula> so that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x102.png" xlink:type="simple"/></inline-formula>, and let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x103.png" xlink:type="simple"/></inline-formula> if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x104.png" xlink:type="simple"/></inline-formula> for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x105.png" xlink:type="simple"/></inline-formula>.</p><p>3. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x106.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x107.png" xlink:type="simple"/></inline-formula>.</p><p>4. Run Lasso on <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x108.png" xlink:type="simple"/></inline-formula> and obtain the coefficient solution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x109.png" xlink:type="simple"/></inline-formula>.</p><p>5. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x110.png" xlink:type="simple"/></inline-formula> is the solution to the Clustering Lasso.</p><p>Note that only when some elements of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x111.png" xlink:type="simple"/></inline-formula> are set to zero can the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x112.png" xlink:type="simple"/></inline-formula>s be shrunk to exactly zero when the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x113.png" xlink:type="simple"/></inline-formula>’s are shrunk to zero by Lasso. 
A special case is when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x114.png" xlink:type="simple"/></inline-formula> is a block diagonal matrix. To choose sparse models, we need to identify clusters of covariates, where variables in the same cluster are assumed to be correlated while those from different clusters are independent. For this purpose, there are two shrinkage steps in Algorithm 1. Step (1) shrinks a correlation coefficient to zero if the correlation between the pair of covariates is not significant at the significance level <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x115.png" xlink:type="simple"/></inline-formula> or if the magnitude of the correlation is smaller than a pre-set value <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x116.png" xlink:type="simple"/></inline-formula>. When two covariates are not correlated, there is little chance that the two variables relate to the response variable through the same underlying pathway. Therefore, the coefficients of the two variables can be estimated independently. Step (2) shrinks some eigenvalues of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x117.png" xlink:type="simple"/></inline-formula> to zero if the corresponding eigenvector explains less than <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x118.png" xlink:type="simple"/></inline-formula> times the total variance of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x119.png" xlink:type="simple"/></inline-formula>. 
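As a rough illustration only, the two shrinkage steps described here could be sketched in Python as follows. The function names, the Fisher z-test used for the significance decision, and the default thresholds min_abs_r and min_share are assumptions made for this sketch; the text does not specify them, and the eigenvalues are taken as given rather than computed.
<preformat>
```python
import math
from statistics import NormalDist


def threshold_correlation(r, n, alpha=0.05, min_abs_r=0.3):
    """Step (1) sketch: return r, or 0.0 if the correlation is shrunk.

    r is a sample correlation from n observations (n greater than 3).
    min_abs_r is a hypothetical pre-set magnitude threshold.
    """
    if abs(r) >= 1.0:
        return r  # perfect correlation is always kept
    # Assumed test: Fisher z-transform with a normal approximation.
    z = math.atanh(r) * math.sqrt(n - 3)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    significant = alpha >= p_value
    large_enough = abs(r) >= min_abs_r
    return r if (significant and large_enough) else 0.0


def shrink_eigenvalues(eigenvalues, min_share=0.05):
    """Step (2) sketch: zero eigenvalues explaining too little variance.

    min_share is a hypothetical fraction of the total variance.
    """
    total = sum(eigenvalues)
    return [ev if ev >= min_share * total else 0.0 for ev in eigenvalues]
```
</preformat>
In this sketch a weak correlation from a very large sample is still set to zero by the magnitude threshold, even though it passes the significance test. 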
The two shrinkage steps cannot guarantee that some elements of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x120.png" xlink:type="simple"/></inline-formula> are zero. Subjective intervention can help for this purpose. 
One resolution is to cluster the covariates first and then calculate the correlation matrix for each cluster; these matrices are in turn used to build the diagonal blocks of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x121.png" xlink:type="simple"/></inline-formula>. 
In addition to building a block diagonal matrix, another resolution is to adopt shrinkage methods in the eigendecomposition of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x122.png" xlink:type="simple"/></inline-formula>, so that some loadings of the eigenvectors might degenerate to 0. SCoTLASS [<xref ref-type="bibr" rid="scirp.51470-ref18">18</xref>] and sparse principal component analysis (Zou et al., 2006) can serve this purpose. 
However, these methods require extra computation for each principal component, which incurs a high computational cost. The nonzero elements of the jth row of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x123.png" xlink:type="simple"/></inline-formula> indicate that the corresponding covariates belong to the jth cluster. 
Ideally, their values should be proportional to the contributions of each covariate to the cluster in explaining the outcome. As pointed out by a referee of the paper, clusters in the proposed method are identified by rows of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x124.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x125.png" xlink:type="simple"/></inline-formula> is defined as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x126.png" xlink:type="simple"/></inline-formula> with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x127.png" xlink:type="simple"/></inline-formula> being the diagonal matrix of eigenvalues and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x128.png" xlink:type="simple"/></inline-formula> columns of eigenvectors of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x129.png" xlink:type="simple"/></inline-formula>. 
As in principal component analysis, the nonzero elements of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x130.png" xlink:type="simple"/></inline-formula> are difficult to interpret in practice. The referee recommends setting the elements of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x131.png" xlink:type="simple"/></inline-formula> to be 0 or 1 based on the absence or presence of non-zero elements, respectively. 
In the paper, we set the estimate of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x132.png" xlink:type="simple"/></inline-formula> to zero if its estimated value is very close to zero, i.e., if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x133.png" xlink:type="simple"/></inline-formula>.</p></sec><sec id="s3_4"><title>3.4. 
Choice of Tuning Parameters</title><p>Four parameters, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x134.png" xlink:type="simple"/></inline-formula>, must be chosen for Algorithm 1. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x135.png" xlink:type="simple"/></inline-formula> is the significance level used to decide whether the correlation between a pair of covariates should be used to restrict the estimation of their coefficients. We usually set it to 0.05, the conventional significance level; when the data set is large, a smaller level can be used. Because even a weak correlation tends to be statistically significant when the number of observations is large, we impose a second restriction: a threshold <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x136.png" xlink:type="simple"/></inline-formula> on the magnitude of the correlation, above which the correlation is used as a restriction on the coefficient parameters. 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x137.png" xlink:type="simple"/></inline-formula> is chosen subjectively by the researcher. Step (2) of Algorithm 1 is similar to principal component analysis, except that the eigendecomposition is based on the correlation matrix as modified in Step (1). <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x138.png" xlink:type="simple"/></inline-formula> specifies the minimum proportion of variance that an eigenvector must explain; eigenvectors that explain less are not used in further analysis. 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x139.png" xlink:type="simple"/></inline-formula> is set to a small value, typically <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x140.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x141.png" xlink:type="simple"/></inline-formula> is the total number of covariates.</p><p>The last parameter to be tuned is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x142.png" xlink:type="simple"/></inline-formula>. In Lasso, the conventional tuning parameter is the fraction <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x143.png" xlink:type="simple"/></inline-formula> of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x144.png" xlink:type="simple"/></inline-formula>-norm. 
There are well-established methods for choosing <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x145.png" xlink:type="simple"/></inline-formula>. In this paper we used tenfold cross-validation (CV) on the training data. The training data set is randomly divided into ten folds. Each fold in turn serves as validation data, on which the prediction error is calculated for the model fitted to the other nine folds. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x146.png" xlink:type="simple"/></inline-formula> is evaluated on a fine grid over <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x147.png" xlink:type="simple"/></inline-formula>, and the value that minimizes the average CV prediction error is selected. Tenfold CV can also be used to tune <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x148.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x149.png" xlink:type="simple"/></inline-formula>. We found that only a few representative values of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x150.png" xlink:type="simple"/></inline-formula> need to be cross-validated to obtain good results, namely <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x151.png" xlink:type="simple"/></inline-formula>.</p></sec></sec><sec id="s4"><title>4. Microarray Data Example</title><p>We applied the proposed method to an Affymetrix gene expression data set. The data were collected by Singh et al. [<xref ref-type="bibr" rid="scirp.51470-ref19">19</xref>] and consist of 12,600 genes measured on 52 prostate cancer tumor samples and 50 normal prostate tissue samples. The goal is to construct a diagnostic rule based on the 12,600 gene expressions to predict the occurrence of prostate cancer. Support vector machine (SVM, [<xref ref-type="bibr" rid="scirp.51470-ref20">20</xref>]), Ridge, Lasso, Elastic Net, Weighted Fusion (w.fusion) and Clustering Lasso were all applied to this data set. We tried four variants of Clustering Lasso:</p><p>1. 
CL1: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x152.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x153.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x154.png" xlink:type="simple"/></inline-formula>;</p><p>2. CL2: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x155.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x156.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x157.png" xlink:type="simple"/></inline-formula>;</p><p>3. 
CL3: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x158.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x159.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x160.png" xlink:type="simple"/></inline-formula>;</p><p>4. CL4: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x161.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x162.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x163.png" xlink:type="simple"/></inline-formula>.</p><p>To apply these methods, we first coded the presence of prostate cancer as a 0-1 (no and yes) response <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x164.png" xlink:type="simple"/></inline-formula>. 
The classification function is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x165.png" xlink:type="simple"/></inline-formula> (fitted value <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x166.png" xlink:type="simple"/></inline-formula>), where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x167.png" xlink:type="simple"/></inline-formula> is the indicator function. For comparison, we randomly selected 52 samples as training data, on which the diagnostic rules were constructed; the rules were then tested on the remaining 50 samples.</p><p>The data set was split 20 times. For each repetition, a set of 1000 genes was preselected based on the training data to keep the computation manageable. The preselected genes are those most significantly related to the response according to individual t-statistics. <xref ref-type="fig" rid="fig2">Figure 2</xref> shows boxplots of the misclassification rates on the test data for the different classifiers. The misclassification rates are summarized in <xref ref-type="table" rid="table1">Table 1</xref>. 
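</p><p>The evaluation protocol described above (random 52/50 splits, gene preselection on the training samples only, and an indicator rule applied to the fitted values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the paper: all function names are hypothetical, the 0.5 cut-off in the indicator rule and the use of Welch t-statistics are assumed, and the least-squares learner is only a stand-in for any of the compared methods.</p><preformat>
```python
import numpy as np


def t_statistics(X, y):
    """Welch two-sample t-statistic of each column of X between y == 1 and y == 0."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def preselect(X, y, k):
    """Indices of the k columns with the largest absolute t-statistic."""
    return np.argsort(-np.abs(t_statistics(X, y)))[:k]


def classify(fitted, threshold=0.5):
    """Indicator rule I(fitted value > threshold); the 0.5 default is an assumption."""
    return (fitted > threshold).astype(int)


def least_squares_fit_predict(X_train, y_train, X_test):
    """Placeholder learner (minimum-norm least squares), NOT Clustering Lasso."""
    mu = X_train.mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_train - mu, y_train - y_train.mean(), rcond=None)
    return y_train.mean() + (X_test - mu) @ coef


def repeated_split_errors(X, y, fit_predict, n_train=52, n_rep=20, k=1000, seed=0):
    """Number of misclassified test samples in each of n_rep random splits."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_rep):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        # Preselect genes on the training samples only, so the test set stays untouched.
        genes = preselect(X[train], y[train], min(k, X.shape[1]))
        fitted = fit_predict(X[train][:, genes], y[train], X[test][:, genes])
        errors.append(int(np.sum(classify(fitted) != y[test])))
    return np.array(errors)
```
</preformat><p>Any learner that maps training data and test covariates to fitted values can be passed as the fit_predict argument, so the same loop could be reused for each method under comparison.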
Overall, the misclassification rate of Clustering Lasso is competitive with those of Elastic Net and Ridge, and is better than those of Lasso, Weighted Fusion, and SVM. In computational time, Clustering Lasso is comparable to Lasso and much more efficient than Elastic Net and Weighted Fusion. Among the four Clustering Lasso variants, those with more restrictions on the eigenvalues and on the magnitudes of correlations perform slightly worse.</p><p><xref ref-type="table" rid="table2">Table 2</xref> shows the average number of genes selected by each method over the 20 repetitions. The analyses were based on 1000 genes and 52 observations. Lasso selected fewer than 52 genes. Elastic Net eliminated few genes: the average number of selected genes was close to 1000. Clustering Lasso identified about 25% of the genes as important. However, we do not know whether the chosen genes are in fact important. The efficiency of variable selection is further assessed by simulation studies.</p><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title>Misclassification rates on the Singh data. ELAS stands for Elastic Net.</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-1240413x168.png"/></fig><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Summary of misclassification rates on the Singh data</title></caption><table><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >SVM</th><th align="center" valign="middle" >Ridge</th><th align="center" valign="middle" >Elastic Net</th><th align="center" valign="middle" >Lasso</th><th align="center" valign="middle" >CL1</th><th align="center" valign="middle" >CL2</th><th align="center" valign="middle" >CL3</th><th align="center" valign="middle" >CL4</th><th align="center" valign="middle" >W. fusion</th></tr></thead><tbody><tr><td align="center" valign="middle" >Mean</td><td align="center" valign="middle" >5.75</td><td align="center" valign="middle" >4.25</td><td align="center" valign="middle" >4.3</td><td align="center" valign="middle" >6.05</td><td align="center" valign="middle" >4.45</td><td align="center" valign="middle" >4.2</td><td align="center" valign="middle" >4.2</td><td align="center" valign="middle" >4.55</td><td align="center" valign="middle" >8.2</td></tr><tr><td align="center" valign="middle" >Median</td><td align="center" valign="middle" >6</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >6</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >3.5</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >7.5</td></tr><tr><td align="center" valign="middle" >SD</td><td align="center" valign="middle" >1.48</td><td align="center" valign="middle" >1.33</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1.79</td><td align="center" valign="middle" >1.64</td><td align="center" valign="middle" >1.74</td><td align="center" valign="middle" >1.64</td><td align="center" valign="middle" >2.86</td><td align="center" valign="middle" >4.03</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Average number of genes selected by each method</title></caption><table><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Elastic Net</th><th align="center" valign="middle" >Lasso</th><th align="center" valign="middle" >CL1</th><th align="center" valign="middle" >CL2</th><th align="center" valign="middle" >CL3</th><th align="center" valign="middle" >CL4</th><th align="center" valign="middle" >W. fusion</th></tr></thead><tbody><tr><td align="center" valign="middle" >Number of genes</td><td align="center" valign="middle" >999.25</td><td align="center" valign="middle" >42.25</td><td align="center" valign="middle" >278.35</td><td align="center" valign="middle" >221.50</td><td align="center" valign="middle" >287.05</td><td align="center" valign="middle" >160.40</td><td align="center" valign="middle" >856.65</td></tr></tbody></table></table-wrap></sec><sec id="s5"><title>5. Simulation Studies</title><p>We applied Clustering Lasso to several simulated examples to compare its prediction accuracy in regression with that of Ridge, Lasso, Elastic Net, and Weighted Fusion. The first three simulations are adapted from the Elastic Net paper [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>]. To begin, data sets are simulated from the true model:</p><disp-formula id="scirp.51470-formula204"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x169.png"  xlink:type="simple"/></disp-formula><p>For each scenario, we simulated 100 data sets, each consisting of a training data set and an independent test data set. The details of the four scenarios are as follows.</p><p>1. In Example 1, we simulated <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x170.png" xlink:type="simple"/></inline-formula> observations as training data and 200 observations as test data. 
We let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x171.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x172.png" xlink:type="simple"/></inline-formula>. The pairwise correlation between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x173.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x174.png" xlink:type="simple"/></inline-formula> was set to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x175.png" xlink:type="simple"/></inline-formula>.</p><p>2. In Example 2, we simulated 200 training observations and 400 test observations. There are 40 predictors such that</p><disp-formula id="scirp.51470-formula205"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x176.png"  xlink:type="simple"/></disp-formula><p>3. Example 3 has a grouped setting with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x177.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x178.png" xlink:type="simple"/></inline-formula>, where the predictors are generated as</p><disp-formula id="scirp.51470-formula206"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x179.png"  xlink:type="simple"/></disp-formula><p>As explained in [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>], there are three equally important groups, each containing five covariates. We created <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x180.png" xlink:type="simple"/></inline-formula> observations as training data and 400 as test data.</p><p>The fourth simulation is a modification of the third example that emphasizes the group effects. The true model has the form <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x181.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x182.png" xlink:type="simple"/></inline-formula>. 
The observed predictors are</p><disp-formula id="scirp.51470-formula207"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x183.png"  xlink:type="simple"/></disp-formula><p>The latent variables <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x184.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x185.png" xlink:type="simple"/></inline-formula> relate directly to the response variable, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x186.png" xlink:type="simple"/></inline-formula> is more important than <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x187.png" xlink:type="simple"/></inline-formula>. 
A nuisance variable, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x188.png" xlink:type="simple"/></inline-formula>, is not related to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x189.png" xlink:type="simple"/></inline-formula>. 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x190.png" xlink:type="simple"/></inline-formula> relate to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x191.png" xlink:type="simple"/></inline-formula> at different levels. 
In terms of gene analysis, we can think of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x192.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x193.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x194.png" xlink:type="simple"/></inline-formula> as underlying pathways, some of which are related to the disease measured by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x195.png" xlink:type="simple"/></inline-formula>. 
We observed the gene expression levels, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x196.png" xlink:type="simple"/></inline-formula>, and would like to identify the related pathways.</p><p>We used all four Clustering Lasso methods. In all examples, the results from the four Clustering Lasso methods are close to each other. 
The prediction results from Lasso, CL2, CL4, Elastic Net, Ridge, and Weighted Fusion are summarized in <xref ref-type="table" rid="table3">Table 3</xref> and <xref ref-type="fig" rid="fig3">Figure 3</xref>. In <xref ref-type="fig" rid="fig3">Figure 3</xref>, the relative MSE is defined as the MSE of the corresponding method divided by the minimum MSE among all the methods. We see that, in all four examples, Clustering Lasso has a lower mean MSE than the Lasso method, and it is close to or better than Ridge, Weighted Fusion and Elastic Net, even under collinearity and group effect situations.</p><p><xref ref-type="table" rid="table4">Table 4</xref> shows the results of variable selection. The two numbers in each cell are the proportion of times an important factor is chosen and the proportion of times a false factor is chosen, respectively. We see that, compared with Elastic Net, Weighted Fusion and Lasso, Clustering Lasso is superior at selecting important factors. However, like Weighted Fusion, it is more likely to over-select variables than Elastic Net. In Example 2,</p><fig-group id="fig3"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Comparing the simulation results from the four examples. 
(a)-(d): Example 1-4.</title></caption><fig id ="fig3_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-1240413x197.png"/></fig><fig id ="fig3_2"><label>(c)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-1240413x198.png"/></fig></fig-group><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Mean (standard deviation) of MSE for the simulated examples based on the 100 iterations</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Example 1</th><th align="center" valign="middle" >Example 2</th><th align="center" valign="middle" >Example 3</th><th align="center" valign="middle" >Example 4</th></tr></thead><tr><td align="center" valign="middle" >Lasso</td><td align="center" valign="middle" >11.50 (1.81)</td><td align="center" valign="middle" >256.40 (19.05)</td><td align="center" valign="middle" >279.75 (30.08)</td><td align="center" valign="middle" >1.151 (0.09)</td></tr><tr><td align="center" valign="middle" >Elastic Net</td><td align="center" valign="middle" >11.23 (1.60)</td><td align="center" valign="middle" >251.01 (18.85)</td><td align="center" valign="middle" >248.42 (24.45)</td><td align="center" valign="middle" >1.138 (0.10)</td></tr><tr><td align="center" valign="middle" >Ridge regression</td><td align="center" valign="middle" >10.55 (1.46)</td><td align="center" valign="middle" >243.47 (15.91)</td><td align="center" valign="middle" >278.09 (25.81)</td><td align="center" valign="middle" >1.125 (0.10)</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 2</td><td align="center" valign="middle" >10.68 (1.46)</td><td align="center" valign="middle" >253.75 (19.11)</td><td align="center" valign="middle" >265.33 (28.82)</td><td align="center" valign="middle" >1.097 
(0.08)</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 4</td><td align="center" valign="middle" >10.70 (1.35)</td><td align="center" valign="middle" >250.82 (17.78)</td><td align="center" valign="middle" >257.33 (23.46)</td><td align="center" valign="middle" >1.094 (0.08)</td></tr><tr><td align="center" valign="middle" >Weighted Fusion</td><td align="center" valign="middle" >10.68 (1.69)</td><td align="center" valign="middle" >257.07 (24.22)</td><td align="center" valign="middle" >287.85 (61.14)</td><td align="center" valign="middle" >1.141 (0.09)</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Variable selection results for the simulated examples based on the 100 iterations. In each cell, the first number is the proportion of times a true factor is chosen and the second number is the proportion of times a false factor is chosen</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Example 1</th><th align="center" valign="middle" >Example 2</th><th align="center" valign="middle" >Example 3</th><th align="center" valign="middle" >Example 4</th></tr></thead><tr><td align="center" valign="middle" >Lasso</td><td align="center" valign="middle" >0.840, -</td><td align="center" valign="middle" >0.811, 0.389</td><td align="center" valign="middle" >0.235, 0.736</td><td align="center" valign="middle" >0.544, 0.186</td></tr><tr><td align="center" valign="middle" >Elastic Net</td><td align="center" valign="middle" >0.870, -</td><td align="center" valign="middle" >0.838, 0.488</td><td align="center" valign="middle" >0.958, 0.134</td><td align="center" valign="middle" >0.585, 0.124</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 2</td><td align="center" valign="middle" >0.995, -</td><td align="center" valign="middle" >1.00, 0.998</td><td align="center" valign="middle" >1.00, 
0.873</td><td align="center" valign="middle" >0.991, 0.630</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 4</td><td align="center" valign="middle" >0.985, -</td><td align="center" valign="middle" >1.00, 0.997</td><td align="center" valign="middle" >1.00, 0.493</td><td align="center" valign="middle" >0.995, 0.460</td></tr><tr><td align="center" valign="middle" >Weighted Fusion</td><td align="center" valign="middle" >0.990, -</td><td align="center" valign="middle" >0.992, 0.975</td><td align="center" valign="middle" >1.00, 0.997</td><td align="center" valign="middle" >0.792, 0.574</td></tr></tbody></table></table-wrap><p>since all variables are highly correlated, Clustering Lasso cannot identify the most important variables. In comparison, Clustering Lasso performs very well in Examples 3 and 4, where clusters of variables play an important role in the true model.</p><p>Finally, to show how Clustering Lasso chooses covariates in groups and how the coefficients of the selected variables behave, we illustrate the differences between Lasso and Clustering Lasso with a modified example from [<xref ref-type="bibr" rid="scirp.51470-ref1">1</xref>]. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x199.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x200.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x201.png" xlink:type="simple"/></inline-formula> be three independent variables with the uniform <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x202.png" xlink:type="simple"/></inline-formula> distribution. The response variable is generated as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x203.png" xlink:type="simple"/></inline-formula>. 
With the random error terms <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x204.png" xlink:type="simple"/></inline-formula>, the nine observed predictors are</p><disp-formula id="scirp.51470-formula208"><graphic  xlink:href="http://html.scirp.org/file/2-1240413x205.png"  xlink:type="simple"/></disp-formula><p>The variables <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x206.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x207.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x208.png" xlink:type="simple"/></inline-formula> are from group 1, with the direct effect <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x209.png" xlink:type="simple"/></inline-formula>. The variables <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x210.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x211.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x212.png" xlink:type="simple"/></inline-formula> are from group 2, with the direct effect <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x213.png" xlink:type="simple"/></inline-formula>. 
The effect of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x214.png" xlink:type="simple"/></inline-formula> on <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x215.png" xlink:type="simple"/></inline-formula> is much smaller than that of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x216.png" xlink:type="simple"/></inline-formula>: the coefficient for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x217.png" xlink:type="simple"/></inline-formula> is 1, compared with 0.2 for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x218.png" xlink:type="simple"/></inline-formula>. 
Variables<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x206.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x207.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x208.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x209.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x210.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x211.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x212.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x213.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x214.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x215.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x216.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x217.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x218.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x219.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x206.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-1240413x207.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x208.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x209.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x210.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x211.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x212.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x213.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x214.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x215.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x216.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x217.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x218.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x219.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x220.png" xlink:type="simple"/></inline-formula>and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x206.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x207.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x208.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x209.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x210.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x211.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x212.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x213.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x214.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x215.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x216.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x217.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x218.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x219.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x220.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x221.png" xlink:type="simple"/></inline-formula> are from<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x206.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-1240413x207.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x208.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x209.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x210.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x211.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x212.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x213.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x214.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x215.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x216.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x217.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x218.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x219.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x220.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x221.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x222.png" 
xlink:type="simple"/></inline-formula>, which does not relate to the response variable. The within-group correlations are almost 1, while the between-group correlations are almost 0. <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the solution paths for Lasso, Elastic Net and CL2.</p><p>We also use this simulation to compare the sensitivity and specificity of the listed methods in identifying significant covariates. The simulation is repeated 100 times. <xref ref-type="table" rid="table5">Table 5</xref> summarizes the number of times that the coefficients of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x223.png" xlink:type="simple"/></inline-formula> are nonzero. We find that every version of the proposed Clustering Lasso identifies the important covariates in all 100 repetitions, while being less likely to select non-significant covariates than Lasso and Weighted Fusion. Clustering Lasso 2, 3 and 4 also select the non-significant covariates less often than Elastic Net.</p><fig id="fig4"  position="float"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> Comparing the solution paths from Lasso, Elastic Net and Clustering Lasso</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-1240413x224.png"/></fig><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Number of times the coefficients are not zero based on the 100 repetitions</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x225.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x226.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x227.png" 
xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x228.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x229.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x230.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x231.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x232.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-1240413x233.png" xlink:type="simple"/></inline-formula></th></tr></thead><tr><td align="center" valign="middle" >Lasso</td><td align="center" valign="middle" >86</td><td align="center" valign="middle" >84</td><td align="center" valign="middle" >89</td><td align="center" valign="middle" >69</td><td align="center" valign="middle" >75</td><td align="center" valign="middle" >64</td><td align="center" valign="middle" >73</td><td align="center" valign="middle" >61</td><td align="center" valign="middle" >66</td></tr><tr><td align="center" valign="middle" >Elastic Net</td><td align="center" valign="middle" >93</td><td align="center" valign="middle" >93</td><td align="center" valign="middle" >94</td><td align="center" valign="middle" >88</td><td align="center" valign="middle" >91</td><td align="center" valign="middle" >85</td><td align="center" valign="middle" >40</td><td align="center" valign="middle" >42</td><td align="center" valign="middle" >37</td></tr><tr><td align="center" valign="middle" 
>Clustering Lasso 1</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >58</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >61</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 2</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >32</td><td align="center" valign="middle" >31</td><td align="center" valign="middle" >30</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 3</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >33</td><td align="center" valign="middle" >35</td><td align="center" valign="middle" >32</td></tr><tr><td align="center" valign="middle" >Clustering Lasso 4</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >33</td><td align="center" valign="middle" >33</td><td align="center" valign="middle" >33</td></tr><tr><td align="center" valign="middle" >Weighted Fusion</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >99</td><td align="center" valign="middle" >100</td><td 
align="center" valign="middle" >100</td><td align="center" valign="middle" >95</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >85</td><td align="center" valign="middle" >83</td><td align="center" valign="middle" >85</td></tr></tbody></table></table-wrap></sec><sec id="s6"><title>6. Conclusions and Future Work</title><p>We find that Clustering Lasso is a novel predictive model that produces sparse models with good predictive performance while encouraging group effects. The empirical results from a two-class microarray data classification problem and several simulation studies on regression problems show that Clustering Lasso has very good predictive performance and is superior to the Lasso method.</p><p>The method was proposed to encourage group effects so that clustered variables are selected together in a model. Clustering Lasso can automatically select groups of variables. If the structural correlation matrix used for regularization is a block diagonal matrix, Clustering Lasso is equivalent to the group Lasso proposed by [<xref ref-type="bibr" rid="scirp.51470-ref7">7</xref>]. However, if the relationships among the variables are complicated, we have to simplify the structural correlation matrix to obtain sparse models. We proposed some shrinkage steps to build the desired structural correlation matrix. Rotating the eigenvectors or adapting techniques such as sparse component analysis can also help for this purpose. As a next step, we will apply the Clustering Lasso method to spatial analysis, so that we can maintain the important spatial correlations while selecting sparse models.</p></sec><sec id="s7"><title>Acknowledgements</title><p>We thank Mrs. Patricia Andrews for editing the paper.</p></sec></body><back><ref-list><title>References</title><ref id="scirp.51470-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Hoerl, A.E. and Kennard, R.W. 
(1970) Ridge Regression: Applications to Nonorthogonal Problems. Technometrics, 12, 69-82. http://dx.doi.org/10.1080/00401706.1970.10488635</mixed-citation></ref><ref id="scirp.51470-ref2"><label>2</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Tibshirani</surname><given-names>R.</given-names></name> (<year>1996</year>) <article-title>Regression Shrinkage and Selection via the Lasso</article-title>. <source>Journal of the Royal Statistical Society: Series B</source>, <volume>58</volume>, <fpage>267</fpage>-<lpage>288</lpage>.</mixed-citation></ref><ref id="scirp.51470-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Zou, H. and Hastie, T. (2005) Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67, 301-320. http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x</mixed-citation></ref><ref id="scirp.51470-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Alter, O., Brown, P. and Botstein, D. (2000) Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling. Proceedings of the National Academy of Sciences of the United States of America, 97, 10101-10106.</mixed-citation></ref><ref id="scirp.51470-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Zou, H., Hastie, T. and Tibshirani, R. (2006) Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 15, 265-286. http://dx.doi.org/10.1198/106186006X113430</mixed-citation></ref><ref id="scirp.51470-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005) Sparsity and Smoothness via the Fused Lasso. Journal of the Royal Statistical Society: Series B, 67, 91-108. 
http://dx.doi.org/10.1111/j.1467-9868.2005.00490.x</mixed-citation></ref><ref id="scirp.51470-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Yuan, M. and Lin, Y. (2006) Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B, 68, 49-67. http://dx.doi.org/10.1111/j.1467-9868.2005.00532.x</mixed-citation></ref><ref id="scirp.51470-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Bondell, H.D. and Reich, B.J. (2008) Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR. Biometrics, 64, 115-123. http://dx.doi.org/10.1111/j.1541-0420.2007.00843.x</mixed-citation></ref><ref id="scirp.51470-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Daye, Z.J. and Jeng, X.J. (2009) Shrinkage and Model Selection with Correlated Variables via Weighted Fusion. Computational Statistics &amp; Data Analysis, 53, 1284-1298. http://dx.doi.org/10.1016/j.csda.2008.11.007</mixed-citation></ref><ref id="scirp.51470-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Jenatton, R., Obozinski, G. and Bach, F. (2010) Structured Sparse Principal Component Analysis. International Conference on Artificial Intelligence and Statistics (AISTATS).</mixed-citation></ref><ref id="scirp.51470-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Jenatton, R., Audibert, J.Y. and Bach, F. (2011) Structured Variable Selection with Sparsity-Inducing Norms. Journal of Machine Learning Research, 12, 2777-2824.</mixed-citation></ref><ref id="scirp.51470-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Huang, J., Ma, S. and Zhang, C.H. (2011) The Sparse Laplacian Shrinkage Estimator for High-Dimensional Regression. Annals of Statistics, 39, 2021-2046. 
http://dx.doi.org/10.1214/11-AOS897</mixed-citation></ref><ref id="scirp.51470-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Buhlmann, P., Rutimann, P., van de Geer, S. and Zhang, C.H. (2013) Correlated Variables in Regression: Clustering and Sparse Estimation. Journal of Statistical Planning and Inference, 143, 1835-1858. 
http://dx.doi.org/10.1016/j.jspi.2013.05.019</mixed-citation></ref><ref id="scirp.51470-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Besag, J. (1974) Spatial Interaction and the Statistical Analysis of Lattice Systems (with Discussion). Journal of the Royal Statistical Society: Series B, 36, 192-236.</mixed-citation></ref><ref id="scirp.51470-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least Angle Regression. Annals of Statistics, 32, 407-499. 
http://dx.doi.org/10.1214/009053604000000067</mixed-citation></ref><ref id="scirp.51470-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007) Pathwise Coordinate Optimization. Annals of Applied Statistics, 1, 302-332. http://dx.doi.org/10.1214/07-AOAS131</mixed-citation></ref><ref id="scirp.51470-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Schafer, J. and Strimmer, K. (2005) A Shrinkage Approach to Large-Scale Covariance Estimation and Implications for Functional Genomics. Statistical Applications in Genetics and Molecular Biology, 4, 32.</mixed-citation></ref><ref id="scirp.51470-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Jolliffe, I.T., Trendafilov, N.T. and Uddin, M. (2003) A Modified Principal Component Technique Based on the Lasso. Journal of Computational and Graphical Statistics, 12, 531-547. http://dx.doi.org/10.1198/1061860032148</mixed-citation></ref><ref id="scirp.51470-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Singh, D., Febbo, P., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D’Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R. and Sellers, W.R. (2002) Gene Expression Correlates of Clinical Prostate Cancer Behavior. Cancer Cell, 1, 203-209. http://dx.doi.org/10.1016/S1535-6108(02)00030-2</mixed-citation></ref><ref id="scirp.51470-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002) Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning, 46, 389-422. http://dx.doi.org/10.1023/A:1012487302797</mixed-citation></ref></ref-list></back></article>