<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2020.810172</article-id><article-id pub-id-type="publisher-id">JAMP-103989</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Inconsistency of Classical Penalized Likelihood Approaches under Endogeneity
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yawei</surname><given-names>He</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>Department of Mathematics and Statistics, Chongqing Jiaotong University, Chongqing, China</addr-line></aff><pub-date pub-type="epub"><day>30</day><month>09</month><year>2020</year></pub-date><volume>08</volume><issue>10</issue><fpage>2335</fpage><lpage>2343</lpage><history><date date-type="received"><day>29,</day>	<month>September</month>	<year>2020</year></date><date date-type="rev-recd"><day>27,</day>	<month>October</month>	<year>2020</year>	</date><date date-type="accepted"><day>30,</day>	<month>October</month>	<year>2020</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  
    With the rapid development of information technology, contemporary data sets from a variety of fields have become extremely large. Data in which the number of features is well above the sample size are called high dimensional data. In statistics, variable selection approaches are required to extract useful information from high dimensional data. The most popular approach adds a penalty function, coupled with a tuning parameter, to the log-likelihood function; this is called the penalized likelihood method. However, almost all penalized likelihood approaches consider only noise accumulation and spurious correlation, while ignoring endogeneity, which also appears frequently in high dimensional space. In this paper, we explore the causes of endogeneity and its influence on penalized likelihood approaches. Simulations based on five classical penalized approaches are provided to demonstrate their inconsistency under endogeneity. The results show that, when endogenous variables exist, the positive selection rate of all five approaches increases gradually but the false selection rate does not consistently decrease; that is, the approaches do not satisfy selection consistency. 
  
 
</p></abstract><kwd-group><kwd>High Dimension</kwd><kwd> Endogeneity</kwd><kwd> Feature Selection</kwd><kwd> Penalized Likelihood</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Along with the rapid progress of information technology and the electronics industry, more and more data are being collected in biomedicine, econometrics and other fields. To extract valid information from such massive data, high-dimensional variable selection has become an active topic in statistics. Variable selection refers to selecting the important variables from the candidate feature space and eliminating the redundant ones. High dimension means that the number of variables (features) is much larger than the sample size, and may even grow exponentially with it. Compared with traditional data analysis, variable selection in high-dimensional space not only increases the computational burden, but also leads to noise accumulation, spurious correlation and endogeneity [<xref ref-type="bibr" rid="scirp.103989-ref1">1</xref>]. Noise accumulation arises because a large number of unknown parameters must be estimated simultaneously during feature selection, so their estimation errors accumulate. To avoid noise accumulation, variable selection usually imposes a reasonable sparsity assumption on the parameters to be estimated [<xref ref-type="bibr" rid="scirp.103989-ref2">2</xref>]. Spurious correlation arises mainly from the high sample correlation between high-dimensional variables. When important variables are highly correlated with some redundant variables, these redundant variables are easily selected as spurious variables. To guard against this, a penalty function is usually added. This method of adding a penalty function to the log-likelihood function, called the penalized likelihood method, is the most common method for high-dimensional variable selection. 
Unfortunately, most penalized likelihood methods account for noise accumulation and spurious correlation but ignore another important factor: endogeneity [<xref ref-type="bibr" rid="scirp.103989-ref3">3</xref>]. This paper studies the influence of endogeneity on the classical penalized likelihood methods and is divided into three parts. First, it introduces the origin and causes of endogeneity; second, it summarizes the classical penalized likelihood methods and their development; finally, a comparative analysis is carried out to show the inconsistency of various penalized likelihood approaches under endogeneity.</p></sec><sec id="s2"><title>2. The Origin and Cause of Endogeneity</title><p>The concept of endogeneity originated in economics. Under the linear regression model Y = β<sub>0</sub> + X<sub>1</sub>β<sub>1</sub> + X<sub>2</sub>β<sub>2</sub> + ... + X<sub>p</sub>β<sub>p</sub> + ε, it means that some explanatory variable is correlated with the error term, namely cov(X<sub>j</sub>, ε) ≠ 0. The causes of endogeneity in variable selection can be roughly divided into three categories: omitted variables, measurement errors and simultaneity bias. These are elaborated below under the most commonly used linear regression model. Omitted variables are important variables that affect the response variable Y but are left out of the set of explanatory variables. If these omitted variables are correlated with the included explanatory variables, endogeneity occurs. More specifically, suppose that the true regression model is Y = β<sub>0</sub> + X<sub>1</sub>β<sub>1</sub> + ... + X<sub>k</sub>β<sub>k</sub> + X<sub>*</sub>β<sub>*</sub> + ε, but the variable X<sub>*</sub> is omitted and the regression model is mistakenly specified as Y = β<sub>0</sub> + X<sub>1</sub>β<sub>1</sub> + ... + X<sub>k</sub>β<sub>k</sub> + u. The omitted variable then enters the error term u, that is, u = X<sub>*</sub>β<sub>*</sub> + ε. If X<sub>*</sub> is correlated with X<sub>j</sub>, then u is correlated with X<sub>j</sub>, which leads to endogeneity. 
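To make this last step explicit, the following is a short worked sketch under the added assumption (not stated above) that X<sub>j</sub> is uncorrelated with ε in the true model: <disp-formula><tex-math>\operatorname{cov}(X_j, u) = \operatorname{cov}(X_j, X_* \beta_* + \varepsilon) = \beta_* \operatorname{cov}(X_j, X_*) + \operatorname{cov}(X_j, \varepsilon) = \beta_* \operatorname{cov}(X_j, X_*)</tex-math></disp-formula> which is non-zero whenever β<sub>*</sub> ≠ 0 and cov(X<sub>j</sub>, X<sub>*</sub>) ≠ 0. 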
When a variable is measured imperfectly, the measurement error is absorbed into the error term of the regression equation. Measurement error comes not only from recording errors but also from the inevitable conceptual differences between a commonly used proxy variable and the real variable, and it can arise in both the explanatory variables and the response variable. For example, suppose the true regression model is Y = β<sub>0</sub> + X<sub>1</sub>β<sub>1</sub> + ... + X<sub>k</sub>β<sub>k</sub> + ε and the equation to be estimated is Y* = β<sub>0</sub> + X<sub>1</sub>β<sub>1</sub> + ... + X<sub>k</sub>β<sub>k</sub> + u, where t = Y* − Y is the measurement error. If the measurement error t is correlated with the explanatory variables, endogeneity will occur. Besides omitted variables and measurement errors, endogeneity also arises when the explanatory variables and the response variable affect each other, so that causality does not run in one direction only. Take resident income X and resident consumption Y as an example. In general, income and consumption interact, and the process of mutual influence cannot be observed, so the information about X and Y is essentially mixed up. More precisely, Y = β<sub>0</sub> + Xβ<sub>1</sub> + ε and X = γ<sub>0</sub> + Yγ<sub>1</sub> + u, so cov(X, ε) ≠ 0 and endogeneity occurs.</p><p>In the analysis of high-dimensional data, endogeneity is almost inevitable. This is mainly because, when the true model is unknown, researchers tend to collect as many potentially relevant explanatory variables as possible to avoid omitting important ones, and these high-dimensional variables are usually aggregated from multiple data sources. As a result, some explanatory variables may unintentionally be correlated with the error term, leading to endogeneity. In other words, the more variables there are and the higher the data dimension, the greater the probability of endogeneity.</p></sec><sec id="s3"><title>3. 
Penalized Likelihood Method and Its Development</title><p>High-dimensional variable selection is one of the most popular statistical techniques for extracting information from large volumes of complex data. Variable selection has two main goals: selection consistency, that is, selecting the important variables with probability tending to 1; and prediction accuracy, that is, estimating the coefficients as accurately as if the true model were known in advance. A method that satisfies both goals simultaneously is said to have the oracle property. However, because of over-fitting in high-dimensional space, it is difficult to achieve both goals, and selection consistency is usually considered more important. For example, in disease gene mapping, the main concern is which genes are pathogenic.</p><p>In the high-dimensional linear model, the penalized likelihood method, which adds a penalty function to the log-likelihood function to shrink the estimates and trade off variance against bias, is the most common method of variable selection. More specifically, we consider a linear regression model with main effects only and minimize the penalized likelihood function ‖Y − Xβ‖<sup>2</sup> + ∑<sub>j</sub> p<sub>λ</sub>(β<sub>j</sub>), which produces a number of non-zero coefficients; the corresponding variables are the candidate variables. A variety of penalty functions have been proposed for penalized likelihood approaches, including Lasso [<xref ref-type="bibr" rid="scirp.103989-ref4">4</xref>], SCAD [<xref ref-type="bibr" rid="scirp.103989-ref5">5</xref>], Adaptive Lasso (ALasso) [<xref ref-type="bibr" rid="scirp.103989-ref6">6</xref>], MCP [<xref ref-type="bibr" rid="scirp.103989-ref7">7</xref>], Sequential Lasso (SLasso) [<xref ref-type="bibr" rid="scirp.103989-ref8">8</xref>], etc.</p><sec id="s3_1"><title>3.1. 
Lasso and Improvements</title><p>Lasso was the first method to use the most basic penalty function p<sub>λ</sub>(β) = λ|β| and has been widely cited. It is easy to compute, since its entire regularization path can be computed at the cost of a single linear regression. In high-dimensional space the Lasso estimator is biased, but it is selection consistent under conditions such as the neighborhood stability condition [<xref ref-type="bibr" rid="scirp.103989-ref9">9</xref>], the irrepresentable condition [<xref ref-type="bibr" rid="scirp.103989-ref10">10</xref>], and the mutual incoherence condition [<xref ref-type="bibr" rid="scirp.103989-ref11">11</xref>]. However, all of these conditions require weak correlations between non-significant and significant variables, which is difficult to achieve in practice. That is, Lasso performs poorly when variables are highly correlated. In fact, for a set of variables with high pairwise correlation, Lasso tends to select only one variable from the set and does not care which one is selected.</p><p>Many classical feature selection methods have been proposed on the basis of Lasso. Elastic net [<xref ref-type="bibr" rid="scirp.103989-ref12">12</xref>] combines Lasso with ridge regression by defining p<sub>λ</sub>(β) = λ<sub>1</sub>|β| + λ<sub>2</sub>|β|<sup>2</sup>, and it outperforms Lasso in prediction accuracy and when variables are highly correlated. However, it tends to produce a grouping effect, that is, highly correlated variables are often selected into the model or excluded from it together. ALasso [<xref ref-type="bibr" rid="scirp.103989-ref6">6</xref>] uses the weighted penalty function p<sub>λ</sub>(β<sub>j</sub>) = λw<sub>j</sub>|β<sub>j</sub>| and has been proved to satisfy both selection consistency and prediction accuracy given a reasonable initial estimator. 
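As a hedged illustration of such weights (this particular form is the usual choice in the adaptive Lasso literature and is not spelled out above; the exponent symbol τ is notation introduced here only for this sketch), given an initial estimate of each β<sub>j</sub> and a constant τ &gt; 0 one may take <disp-formula><tex-math>w_j = 1 / |\tilde{\beta}_j|^{\tau}</tex-math></disp-formula> where the tilde denotes the initial estimate, so that coefficients with a large initial estimate receive a small penalty and those with an initial estimate near zero are penalized heavily. 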
Another significant improvement of Lasso, SLasso [<xref ref-type="bibr" rid="scirp.103989-ref8">8</xref>], takes a stepwise approach to variable selection and, at each step, applies the L<sub>1</sub> penalty only to the variables that were not selected in previous steps. This ensures that variables selected at an early stage are not dropped in the subsequent selection process. SLasso also has the oracle property and is computationally more attractive than approaches such as elastic net.</p></sec><sec id="s3_2"><title>3.2. SCAD and Related</title><p>Compared with Lasso, SCAD [<xref ref-type="bibr" rid="scirp.103989-ref5">5</xref>] takes a different approach, resulting in a successful nonconcave penalty function whose derivative, for β &gt; 0 and a constant a &gt; 2, is</p><p>p′<sub>λ</sub>(β) = λ I(β ≤ λ) + (aλ − β)<sub>+</sub> I(β &gt; λ) / (a − 1),</p><p>which has desirable properties on many occasions [<xref ref-type="bibr" rid="scirp.103989-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.103989-ref14">14</xref>] [<xref ref-type="bibr" rid="scirp.103989-ref15">15</xref>]. MCP [<xref ref-type="bibr" rid="scirp.103989-ref7">7</xref>] uses p′<sub>λ</sub>(β) = (aλ − β)<sub>+</sub> / a, which is similar to the SCAD penalty but translates the flat part of the SCAD derivative to the origin. However, because their penalty functions are nonconcave, both are computationally less attractive than the Lasso family.</p></sec><sec id="s3_3"><title>3.3. Tuning Parameter</title><p>In addition to the choice of penalty function, the determination of the tuning parameter λ is also a key point of penalized likelihood approaches. Setting λ to a range of values generates a series of candidate models. Therefore, the penalized likelihood method should be used in conjunction with a model selection criterion: the former generates candidate models, and the latter decides the optimal model. 
Classical model selection criteria include AIC [<xref ref-type="bibr" rid="scirp.103989-ref16">16</xref>] and BIC [<xref ref-type="bibr" rid="scirp.103989-ref17">17</xref>]. However, these traditional criteria are no longer suitable for high-dimensional space because they select too many irrelevant variables. To adapt to the high-dimensional situation, researchers have added additional penalty terms to AIC [<xref ref-type="bibr" rid="scirp.103989-ref18">18</xref>] or replaced the factor 2 with a constant C [<xref ref-type="bibr" rid="scirp.103989-ref19">19</xref>]. For BIC, more effort has been devoted to modifying the prior probability, as in the modified BIC (mBIC) [<xref ref-type="bibr" rid="scirp.103989-ref20">20</xref>] and the extended BIC (EBIC) [<xref ref-type="bibr" rid="scirp.103989-ref21">21</xref>]. Because its parameter γ can take different values, EBIC is essentially a family of criteria. BIC and mBIC can be regarded as special cases of EBIC with γ = 0 and γ = 1, respectively. The properties of EBIC under different high-dimensional models have been extensively studied. It is consistent for the linear model [<xref ref-type="bibr" rid="scirp.103989-ref21">21</xref>], the generalized linear model [<xref ref-type="bibr" rid="scirp.103989-ref22">22</xref>], the Cox model [<xref ref-type="bibr" rid="scirp.103989-ref23">23</xref>], etc.</p></sec></sec><sec id="s4"><title>4. Inconsistency under Endogeneity</title><p>When the penalized likelihood method is used for variable selection, some basic conditions must be met to achieve the desired properties. These include restrictions on the explanatory variables [<xref ref-type="bibr" rid="scirp.103989-ref8">8</xref>], on the explanatory variables together with the regression coefficients [<xref ref-type="bibr" rid="scirp.103989-ref24">24</xref>], or on the likelihood function [<xref ref-type="bibr" rid="scirp.103989-ref25">25</xref>]. 
However, when endogeneity exists, even if there is only one endogenous variable, the above necessary conditions are hard to meet. In this case, there will be a persistent gap between the estimated regression coefficients and their true values, which affects selection consistency. Next, we use a simulation to show the effect of endogeneity.</p><sec id="s4_1"><title>4.1. Specification of Model</title><p>Consider the model Y = Xβ + ε, where ε ~ N(0, I). Let the sample size be n = 50, 100 and 200 in turn. Define the number of variables as p = [n<sup>1.2</sup>] and let β<sub>j</sub> = (−1)<sup>u</sup>(0.8 + 0.05u), where u follows the two-point distribution with parameter 0.5, for j = 1, 2, ..., 6; β<sub>j</sub> = 0 for j = 7, 8, ..., p. Consider two different settings:</p><p>Setting 1: X<sub>j</sub> = Z<sub>j</sub>, j = 1, 2, ..., 6; X<sub>j</sub> = Z<sub>j</sub>(1 + 2ε), j = 7, 8, ..., p.</p><p>Setting 2: X<sub>j</sub> = Z<sub>j</sub>(1 + 2ε), j = 1, 2, ..., p − 6; X<sub>j</sub> = Z<sub>j</sub>, j = p − 5, ..., p.</p><p>The difference between the two settings is that in the former only unimportant variables are endogenous, whereas in the latter all important variables are endogenous. Each setting is compared with the exogenous case, in which X<sub>j</sub> = Z<sub>j</sub> for all j, to reflect the impact of endogeneity. Here Z ~ N(0, ∑) is independent of ε. Two common structures are considered for the covariance matrix ∑: S1, with ∑<sub>ij</sub> = 0.5 for i ≠ j and ∑<sub>ij</sub> = 1 for i = j; and S2, with ∑<sub>ij</sub> = 0.5<sup>|i−j|</sup>. The extended Bayesian information criterion (EBIC) with γ = 1 − log n / (4 log p) is used to select the tuning parameter and determine the optimal model.</p></sec><sec id="s4_2"><title>4.2. 
Results and Interpretation</title><p>Selection consistency is measured by PDR (number of correctly selected variables / total number of true variables), FDR (number of falsely selected variables / total number of selected variables) and Msize (total number of selected variables). Because the explanatory variables and the error term are random, the simulation is repeated 200 times and the measures are averaged; the results are shown in Tables 1-4.</p><p>Tables 1-4 show that, when there is no endogeneity, PDR rises toward 1, FDR falls toward 0, and the number of selected variables approaches the number of true variables as the sample size increases, although the initial performance of the feature selection methods differs. In other words, the behavior of these classical penalized likelihood approaches agrees with their asymptotic selection consistency.</p>
<table-wrap id="table1"><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Results under Setting 1 with S1</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="2">Endogeneity</th><th align="center" valign="middle" colspan="3">PDR, FDR (Msize)</th></tr><tr><th align="center" valign="middle">n = 50</th><th align="center" valign="middle">n = 100</th><th align="center" valign="middle">n = 200</th></tr></thead><tbody><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.29, 0.71 (4.6)</td><td align="center" valign="middle">0.38, 0.67 (5.8)</td><td align="center" valign="middle">0.51, 0.64 (8.2)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.29, 0.73 (6.4)</td><td align="center" valign="middle">0.45, 0.71 (9.0)</td><td align="center" valign="middle">0.79, 0.70 (16.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.96, 0.55 (14.0)</td><td align="center" valign="middle">0.99, 0.60 (17.5)</td><td align="center" valign="middle">1.00, 0.64 (22.6)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.36, 0.70 (6.2)</td><td align="center" valign="middle">0.47, 0.68 (8.8)</td><td align="center" valign="middle">0.82, 0.69 (16.2)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.97, 0.56 (14.5)</td><td align="center" valign="middle">0.99, 0.64 (18.1)</td><td align="center" valign="middle">1.00, 0.70 (23.6)</td></tr><tr><td align="center" valign="middle">Exogeneity</td><td align="center" valign="middle">n = 50</td><td align="center" valign="middle">n = 100</td><td align="center" valign="middle">n = 200</td></tr><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.41, 0.25 (3.5)</td><td align="center" valign="middle">0.81, 0.23 (6.8)</td><td align="center" valign="middle">0.97, 0.16 (7.4)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.64, 0.46 (10.3)</td><td align="center" valign="middle">0.97, 0.14 (7.0)</td><td align="center" valign="middle">1.00, 0.05 (6.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.74, 0.53 (14.6)</td><td align="center" valign="middle">0.99, 0.20 (11.7)</td><td align="center" valign="middle">1.00, 0.04 (6.3)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.62, 0.44 (10.2)</td><td align="center" valign="middle">0.95, 0.12 (6.9)</td><td align="center" valign="middle">1.00, 0.05 (6.4)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.75, 0.58 (15.2)</td><td align="center" valign="middle">0.99, 0.25 (14.4)</td><td align="center" valign="middle">1.00, 0.05 (6.4)</td></tr></tbody></table></table-wrap>
<table-wrap id="table2"><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Results under Setting 1 with S2</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="2">Endogeneity</th><th align="center" valign="middle" colspan="3">PDR, FDR (Msize)</th></tr><tr><th align="center" valign="middle">n = 50</th><th align="center" valign="middle">n = 100</th><th align="center" valign="middle">n = 200</th></tr></thead><tbody><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.23, 0.73 (3.9)</td><td align="center" valign="middle">0.33, 0.64 (5.4)</td><td align="center" valign="middle">0.44, 0.52 (6.3)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.20, 0.75 (4.8)</td><td align="center" valign="middle">0.29, 0.70 (6.3)</td><td align="center" valign="middle">0.44, 0.67 (9.9)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.93, 0.58 (15.0)</td><td align="center" valign="middle">0.99, 0.67 (20.3)</td><td align="center" valign="middle">0.99, 0.56 (18.1)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.31, 0.72 (4.6)</td><td align="center" valign="middle">0.39, 0.62 (6.6)</td><td align="center" valign="middle">0.46, 0.64 (8.6)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.94, 0.59 (15.2)</td><td align="center" valign="middle">1.00, 0.69 (20.7)</td><td align="center" valign="middle">1.00, 0.75 (27.3)</td></tr><tr><td align="center" valign="middle">Exogeneity</td><td align="center" valign="middle">n = 50</td><td align="center" valign="middle">n = 100</td><td align="center" valign="middle">n = 200</td></tr><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.49, 0.13 (3.6)</td><td align="center" valign="middle">0.78, 0.14 (5.9)</td><td align="center" valign="middle">0.96, 0.13 (7.0)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.58, 0.33 (7.7)</td><td align="center" valign="middle">0.90, 0.17 (6.9)</td><td align="center" valign="middle">1.00, 0.06 (6.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.71, 0.56 (11.3)</td><td align="center" valign="middle">0.88, 0.25 (10.1)</td><td align="center" valign="middle">1.00, 0.09 (6.7)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.56, 0.28 (6.8)</td><td align="center" valign="middle">0.89, 0.15 (6.8)</td><td align="center" valign="middle">1.00, 0.06 (6.4)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.73, 0.66 (18.1)</td><td align="center" valign="middle">0.96, 0.27 (12.9)</td><td align="center" valign="middle">1.00, 0.06 (6.7)</td></tr></tbody></table></table-wrap>
<table-wrap id="table3"><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>Results under Setting 2 with S1</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="2">Endogeneity</th><th align="center" valign="middle" colspan="3">PDR, FDR (Msize)</th></tr><tr><th align="center" valign="middle">n = 50</th><th align="center" valign="middle">n = 100</th><th align="center" valign="middle">n = 200</th></tr></thead><tbody><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.78, 0.58 (11.4)</td><td align="center" valign="middle">0.89, 0.54 (12.0)</td><td align="center" valign="middle">0.95, 0.42 (10.7)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.77, 0.70 (16.3)</td><td align="center" valign="middle">0.99, 0.62 (16.4)</td><td align="center" valign="middle">1.00, 0.64 (17.3)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.78, 0.47 (9.6)</td><td align="center" valign="middle">0.99, 0.37 (10.6)</td><td align="center" valign="middle">1.00, 0.38 (12.0)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.72, 0.65 (14.7)</td><td align="center" valign="middle">0.94, 0.56 (13.7)</td><td align="center" valign="middle">1.00, 0.52 (14.8)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.74, 0.50 (9.6)</td><td align="center" valign="middle">0.99, 0.40 (11.0)</td><td align="center" valign="middle">1.00, 0.41 (14.8)</td></tr><tr><td align="center" valign="middle">Exogeneity</td><td align="center" valign="middle">n = 50</td><td align="center" valign="middle">n = 100</td><td align="center" valign="middle">n = 200</td></tr><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.41, 0.25 (3.5)</td><td align="center" valign="middle">0.81, 0.23 (6.8)</td><td align="center" valign="middle">0.97, 0.16 (7.4)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.64, 0.46 (10.3)</td><td align="center" valign="middle">0.97, 0.14 (7.0)</td><td align="center" valign="middle">1.00, 0.05 (6.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.74, 0.53 (14.6)</td><td align="center" valign="middle">0.99, 0.20 (11.7)</td><td align="center" valign="middle">1.00, 0.04 (6.3)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.62, 0.41 (10.2)</td><td align="center" valign="middle">0.95, 0.12 (6.9)</td><td align="center" valign="middle">1.00, 0.06 (6.4)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.75, 0.60 (16.6)</td><td align="center" valign="middle">0.99, 0.25 (14.4)</td><td align="center" valign="middle">1.00, 0.05 (6.3)</td></tr></tbody></table></table-wrap>
<table-wrap id="table4"><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>Results under Setting 2 with S2</title></caption><table><thead><tr><th align="center" valign="middle" rowspan="2">Endogeneity</th><th align="center" valign="middle" colspan="3">PDR, FDR (Msize)</th></tr><tr><th align="center" valign="middle">n = 50</th><th align="center" valign="middle">n = 100</th><th align="center" valign="middle">n = 200</th></tr></thead><tbody><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.75, 0.54 (10.9)</td><td align="center" valign="middle">0.85, 0.51 (11.7)</td><td align="center" valign="middle">0.96, 0.42 (11.0)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.64, 0.71 (14.8)</td><td align="center" valign="middle">0.90, 0.65 (16.1)</td><td align="center" valign="middle">0.99, 0.62 (16.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.76, 0.52 (10.6)</td><td align="center" valign="middle">0.99, 0.40 (11.1)</td><td align="center" valign="middle">1.00, 0.40 (10.7)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.69, 0.67 (13.3)</td><td align="center" valign="middle">0.88, 0.62 (15.2)</td><td align="center" valign="middle">0.98, 0.58 (15.8)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.75, 0.55 (10.7)</td><td align="center" valign="middle">0.98, 0.50 (12.6)</td><td align="center" valign="middle">1.00, 0.54 (14.7)</td></tr><tr><td align="center" valign="middle">Exogeneity</td><td align="center" valign="middle">n = 50</td><td align="center" valign="middle">n = 100</td><td align="center" valign="middle">n = 200</td></tr><tr><td align="center" valign="middle">Lasso</td><td align="center" valign="middle">0.50, 0.13 (3.6)</td><td align="center" valign="middle">0.78, 0.14 (5.9)</td><td align="center" valign="middle">0.97, 0.13 (7.0)</td></tr><tr><td align="center" valign="middle">SLasso</td><td align="center" valign="middle">0.58, 0.33 (7.7)</td><td align="center" valign="middle">0.90, 0.17 (6.9)</td><td align="center" valign="middle">1.00, 0.06 (6.4)</td></tr><tr><td align="center" valign="middle">SCAD</td><td align="center" valign="middle">0.71, 0.56 (16.3)</td><td align="center" valign="middle">0.88, 0.25 (10.1)</td><td align="center" valign="middle">1.00, 0.08 (6.7)</td></tr><tr><td align="center" valign="middle">ALasso</td><td align="center" valign="middle">0.57, 0.29 (6.9)</td><td align="center" valign="middle">0.90, 0.15 (6.9)</td><td align="center" valign="middle">1.00, 0.06 (6.5)</td></tr><tr><td align="center" valign="middle">MCP</td><td align="center" valign="middle">0.74, 0.66 (18.1)</td><td align="center" valign="middle">0.96, 0.27 (13.0)</td><td align="center" valign="middle">1.00, 0.06 (6.7)</td></tr></tbody></table></table-wrap>
<p>However, when endogeneity exists, whether in the unimportant variables or in the important variables, PDR still tends to rise as the sample size increases, although not always markedly. FDR and the number of selected variables, in contrast, do not behave as their asymptotic properties predict: the methods keep selecting wrong variables, which means that they are no longer valid in the presence of endogeneity. In addition, the tables show differences in robustness among the above penalized likelihood methods. 
When switching from the exogenous to the endogenous setting, SCAD is the most robust and SLasso is the least robust, which suggests some implications for subsequent studies of feature selection under endogeneity.</p></sec></sec><sec id="s5"><title>Acknowledgements</title><p>This project is supported by the National Natural Science Foundation of China (Grant No: 11701058).</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The author declares no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>He, Y.W. (2020) Inconsistency of Classical Penalized Likelihood Approaches under Endogeneity. Journal of Applied Mathematics and Physics, 8, 2335-2343. https://doi.org/10.4236/jamp.2020.810172</p></sec></body><back><ref-list><title>References</title><ref id="scirp.103989-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J., Han, F. and Liu, H. (2014) Challenges of Big Data Analysis. National Science Review, 1, 293-314. 
https://doi.org/10.1093/nsr/nwt032</mixed-citation></ref><ref id="scirp.103989-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Donoho, D. (2000) High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. The American Mathematical Society Conference, Los Angeles, CA, United States, 7-12 August 2000.</mixed-citation></ref><ref id="scirp.103989-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Engle, R., Hendry, D. and Richard, J.-F. (1983) Exogeneity. Econometrica, 51, 277-304. https://doi.org/10.2307/1911990</mixed-citation></ref><ref id="scirp.103989-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society Series B (Methodological), 58, 267-288.  
https://doi.org/10.1111/j.2517-6161.1996.tb02080.x</mixed-citation></ref><ref id="scirp.103989-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J. and Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273</mixed-citation></ref><ref id="scirp.103989-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Zou, H. (2006) The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101, 1418-1429.  
https://doi.org/10.1198/016214506000000735</mixed-citation></ref><ref id="scirp.103989-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, C.H. (2010) Nearly Unbiased Variable Selection under Minimax Concave Penalty. The Annals of Statistics, 38, 894-942. https://doi.org/10.1214/09-AOS729</mixed-citation></ref><ref id="scirp.103989-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Luo, S. and Chen, Z. (2014) Sequential Lasso for Feature Selection with Ultra-High Dimensional Feature Space. Journal of the American Statistical Association, 109, 1229-1240. https://doi.org/10.1080/01621459.2013.877275</mixed-citation></ref><ref id="scirp.103989-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Meinshausen, N. and Bühlmann, P. (2006) High-Dimensional Graphs and Variable Selection with the Lasso. The Annals of Statistics, 34, 1436-1462.  
https://doi.org/10.1214/009053606000000281</mixed-citation></ref><ref id="scirp.103989-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, P. and Yu, B. (2006) On Model Selection Consistency of Lasso. The Journal of Machine Learning Research, 7, 2541-2563.</mixed-citation></ref><ref id="scirp.103989-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Wainwright, M. (2009) Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ<sub>1</sub>-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory, 55, 2183-2202. https://doi.org/10.1109/TIT.2009.2016018</mixed-citation></ref><ref id="scirp.103989-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Zou, H. and Hastie, T. (2005) Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67, 301-320.  
https://doi.org/10.1111/j.1467-9868.2005.00503.x</mixed-citation></ref><ref id="scirp.103989-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J. and Li, R. (2004) New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis. Journal of the American Statistical Association, 99, 710-723. https://doi.org/10.1198/016214504000001060</mixed-citation></ref><ref id="scirp.103989-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J. and Peng, H. (2004) Nonconcave Penalized Likelihood with a Diverging Number of Parameters. The Annals of Statistics, 32, 928-961.  
https://doi.org/10.1214/009053604000000256</mixed-citation></ref><ref id="scirp.103989-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Xie, H. and Huang, J. (2009) SCAD-Penalized Regression in High-Dimensional Partially Linear Models. The Annals of Statistics, 37, 673-696.  
https://doi.org/10.1214/07-AOS580</mixed-citation></ref><ref id="scirp.103989-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. Second International Symposium on Information Theory, 267-281.</mixed-citation></ref><ref id="scirp.103989-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Schwarz, G. (1978) Estimating the Dimension of a Model. The Annals of Statistics, 6, 461-464. https://doi.org/10.1214/aos/1176344136</mixed-citation></ref><ref id="scirp.103989-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Barron, A., Birgé, L. and Massart, P. (1999) Risk Bounds for Model Selection via Penalization. Probability Theory and Related Fields, 113, 301-413.  
https://doi.org/10.1007/s004400050210</mixed-citation></ref><ref id="scirp.103989-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Baraud, Y. (2000) Model Selection for Regression on a Fixed Design. Probability Theory and Related Fields, 117, 467-493. https://doi.org/10.1007/PL00008731</mixed-citation></ref><ref id="scirp.103989-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Bogdan, M., Ghosh, J.K. and Doerge, R. (2004) Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci. Genetics, 167, 989-999. https://doi.org/10.1534/genetics.103.021683</mixed-citation></ref><ref id="scirp.103989-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Chen, J. and Chen, Z. (2008) Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika, 95, 759-771.  
https://doi.org/10.1093/biomet/asn034</mixed-citation></ref><ref id="scirp.103989-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Chen, J. and Chen, Z. (2012) Extended BIC for Small-n-Large-P Sparse GLM. Statistica Sinica, 22, 555-574. https://doi.org/10.5705/ss.2010.216</mixed-citation></ref><ref id="scirp.103989-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Luo, S., Xu, J. and Chen, Z. (2015) Extended Bayesian Information Criterion in the Cox Model with a High Dimensional Feature Space. Annals of the Institute of Statistical Mathematics, 67, 287-311. https://doi.org/10.1007/s10463-014-0448-y</mixed-citation></ref><ref id="scirp.103989-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Lu, W., Goldberg, Y. and Fine, J.P. (2012) On the Robustness of the Adaptive Lasso to Model Misspecification. Biometrika, 99, 717-731.  
https://doi.org/10.1093/biomet/ass027</mixed-citation></ref><ref id="scirp.103989-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Fan, J. and Liao, Y. (2014) Endogeneity in High Dimensions. The Annals of Statistics, 42, 872-917. https://doi.org/10.1214/13-AOS1202</mixed-citation></ref></ref-list></back></article>