<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2019.77103</article-id><article-id pub-id-type="publisher-id">JAMP-93794</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Why Quantitative Variables Should Not Be Recoded as Categorical
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fernandes</surname><given-names>Antônio</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Malaquias</surname><given-names>Caio</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Figueiredo</surname><given-names>Dalson</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>da Rocha</surname><given-names>Enivaldo</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Lins</surname><given-names>Rodrigo</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Political Science, Federal University of Pernambuco (UFPE), Recife, Brazil</addr-line></aff><pub-date pub-type="epub"><day>10</day><month>07</month><year>2019</year></pub-date><volume>07</volume><issue>07</issue><fpage>1519</fpage><lpage>1530</lpage><history><date date-type="received"><day>13</day><month>May</month><year>2019</year></date><date date-type="rev-recd"><day>19</day><month>July</month><year>2019</year></date><date date-type="accepted"><day>22</day><month>July</month><year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). 
http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  The transformation of quantitative variables into categories is a common practice in both experimental and observational studies. The typical procedure is to create groups by splitting the original variable distribution at some cut point on the scale of measurement (e.g. mean, median, mode). Allegedly, dichotomization improves causal inference by simplifying statistical analyses. In this article, we address some of the adverse consequences of recoding quantitative variables into categories. In particular, we provide evidence that categorization usually leads to inefficient and biased estimates. We believe that considerable progress in our understanding of data analysis can occur if scholars follow the recommendations presented in this article. The recodification of quantitative variables as categorical is a poor methodological strategy, and scientists must stay away from it.
 
</p></abstract><kwd-group><kwd>Dichotomization</kwd><kwd> Inefficiency</kwd><kwd> Bias</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Imagine a political scientist who wants to estimate the effect of income, measured as continuous yearly revenue, on partisanship. Before analyzing the data, she decides to split income into three levels: low, medium, and high. Similarly, suppose a physician wants to examine the effect of age on the likelihood of developing coronary heart disease. Before running the model, she recodes age into four groups. In this article, we address some of the adverse consequences of categorizing quantitative variables in this way. Technically, categorization always implies a loss of information, and it usually leads to misleading results [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref4">4</xref>]. To make our case, we reproduce data from [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>] and [<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>]. In addition, we use basic simulation to show how dichotomization generates inefficiency and bias. To increase transparency [<xref ref-type="bibr" rid="scirp.93794-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref9">9</xref>], we report all computational scripts used to generate the statistical analyses.</p><p>Our target audience is graduate students in the early stages of training and scholars with a minimal mathematical background. For this reason, we keep algebra to a minimum to make the content easier to follow. In particular, the paper fills a gap in the political methodology literature. 
We reviewed 24 articles on dichotomization published in 20 journals from 1983 to 2017, and none of them appeared in a political science journal (see Appendix <xref ref-type="table" rid="table">Table </xref>A1). Because the categorization of quantitative variables is a common practice not only in the Social Sciences but also in the Health Sciences [<xref ref-type="bibr" rid="scirp.93794-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref11">11</xref>], we believe that considerable progress in our understanding of data analysis can occur if scholars follow the recommendations presented in this article.</p><p>The remainder of the paper is structured as follows. Section 2 reviews the literature on categorization. Section 3 replicates data from different studies to show how the transformation of quantitative variables into categories may lead to wrong conclusions. Section 4 uses basic simulation to highlight the shortcomings of dichotomization, focusing on both bias and efficiency. Section 5 concludes.</p></sec><sec id="s2"><title>2. What Is the Problem?</title><p>Information loss, inefficiency, and bias: these are, concisely, the main problems generated by the categorization of quantitative variables [<xref ref-type="bibr" rid="scirp.93794-ref12">12</xref>]. Despite the widespread use of dichotomization, the scholarly literature has accumulated systematic evidence on why scholars should avoid it. Discretization reduces measurement accuracy, underestimates the magnitude of the coefficients of bivariate relationships, and lowers statistical power [<xref ref-type="bibr" rid="scirp.93794-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref13">13</xref>]. 
Also, the artificial transformation of quantitative measures into groups may lead to biased coefficients and unreliable standard errors in multivariate models [<xref ref-type="bibr" rid="scirp.93794-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref14">14</xref>].</p><p>Methodological pleas against dichotomization are not new. For example, [<xref ref-type="bibr" rid="scirp.93794-ref15">15</xref>] showed that dichotomizing one of the variables at its mean reduces the population correlation coefficient by 20% on average. [<xref ref-type="bibr" rid="scirp.93794-ref16">16</xref>] estimated the effects of dichotomization in the context of analysis of variance (ANOVA). Similarly, [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>] argues that dichotomization leads to a loss of one-fifth to two-thirds of the variance that may be accounted for on the original variables. [<xref ref-type="bibr" rid="scirp.93794-ref17">17</xref>] showed that the transformation of quantitative measures into categories underestimates both effect sizes and statistical power. <xref ref-type="table" rid="table">Table </xref>1 summarizes scholarly work against dichotomization.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table">Table </xref>1</label><caption><title> Literature against dichotomization</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Author (year)</th><th align="center" valign="middle" >Warning</th></tr></thead><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref16">16</xref>]</td><td align="center" valign="middle" >“The use of the pseudo-orthogonal design biases the differences in means for the main effects relative to the differences in those means that would be obtained in a single-factor experiment” (p. 
464).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>]</td><td align="center" valign="middle" >“Dichotomizing one variable at the mean results in the reduction in variance accounted for to 0.647 r<sup>2</sup>; and dichotomizing both at the mean, to 0.405 r<sup>2</sup>” (p. 249).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref18">18</xref>]</td><td align="center" valign="middle" >“Analyses with categorized continuous variables required greater than 40% more patients for the same power as that achieved using continuous variables” (p. 138).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>]</td><td align="center" valign="middle" >“Dichotomizing a continuous predictor variable can be conceptualized as adding an error of measurement to the variable. As a result, the effects of dichotomization are similar to the effects of random error of measurement” (p. 
186).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref12">12</xref>]</td><td align="center" valign="middle" >“Dichotomization of continuous data is unnecessary for statistical analysis and in particular should not be applied to explanatory variables in regression models” (abstract).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref19">19</xref>]</td><td align="center" valign="middle" >“Dichotomizing a continuous variable is known to result in the loss of information, lower statistical power, and lower reliability” (abstract).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref11">11</xref>]</td><td align="center" valign="middle" >(Dichotomization) “(…) is harmful from the viewpoint of statistical estimation and hypothesis testing” (abstract).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref20">20</xref>]</td><td align="center" valign="middle" >“Modern regression models do not require categorization. In general, continuous variables should remain continuous in regression models designed to study the effects of the variable on the outcome of interest” (p. 3).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref4">4</xref>]</td><td align="center" valign="middle" >“Undesirable effects occur from dichotomization of both independent and dependent variables. The problem gets worse when multiple independent variables are split; for example, residual confounding is introduced, and spurious interaction effects may be seen” (p. 
225).</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>]</td><td align="center" valign="middle" >“Simply dichotomizing continuous variables without previously referring to the original distributions by plotting them and checking consequences of dichotomization is a bad idea and should be discouraged” (p. 78).</td></tr></tbody></table></table-wrap><p>Note: We reviewed 24 papers published in 20 journals from 1983 to 2017.</p><p>Another criticism of dichotomization comes from the measurement literature [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>]<sup>1</sup>. According to [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>], “dichotomizing adds errors of discreteness. That is, the amount of unmeasured true scores variance for the cases at each of the points of the dichotomy is necessarily greater than it would be for cases at each of the multiple points in the original scale” (p. 249). Similarly, [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>] argue that the categorization of quantitative variables into groups is equivalent to adding measurement error to the variable. Therefore, dichotomization increases the difference between true scores and measured values, which is likely to produce unreliable estimates. <xref ref-type="fig" rid="fig1">Figure 1</xref> shows the relationship between dichotomization and measurement error<sup>2</sup>.</p><p>B and C have similar scores when X is measured continuously. However, dichotomization leads to an inefficient aggregation of A and B vis-a-vis C and D. Comparatively, the least harmful procedure is to split a normal variable at its mean, which attenuates its correlation with other variables by about 20% on average. However, perfectly normal distributions are rarely found in practice. 
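</p><p>The 20% figure for a mean split, which refers to the attenuation of the correlation coefficient, can be checked numerically. The following Python sketch is an illustration only, not one of the scripts used for the analyses in this article; the sample size, the seed, and the helper names pearson and simulate are assumptions made for the example. It draws a large bivariate normal sample with correlation 0.6, splits X at its mean, and compares the ratio of the two correlations with the theoretical factor for a mean split of a normal variable, the square root of 2/π (about 0.798).</p><preformat>
```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, n, seed):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # Y has unit variance and correlation rho with X by construction.
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)
    return xs, ys


rho = 0.6                      # assumed population correlation
xs, ys = simulate(rho, n=100_000, seed=1)

# Dichotomize X at its population mean (0): 1 if above, 0 otherwise.
x_split = [1 if x > 0 else 0 for x in xs]

r_original = pearson(xs, ys)
r_split = pearson(x_split, ys)

# Theoretical attenuation factor for a mean split of a normal variable.
expected_factor = math.sqrt(2.0 / math.pi)

print("r with original X:      ", round(r_original, 3))
print("r with dichotomized X:  ", round(r_split, 3))
print("observed ratio:         ", round(r_split / r_original, 3))
print("theoretical ratio:      ", round(expected_factor, 3))
```
</preformat><p>With a sample this large, the observed ratio should lie close to the theoretical one, that is, a loss of about 20% of the original correlation. This benchmark holds only under bivariate normality and a split exactly at the mean.</p><p>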
Depending on the shape of the distribution, categorization will therefore lead to even greater information loss [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref19">19</xref>]. In short, the categorization of quantitative variables will always generate information loss, which in turn reduces the efficiency of the estimates. In some cases, in addition to inefficiency, dichotomization can lead to biased estimates, as we show in the next section.</p></sec><sec id="s3"><title>3. Replication</title><p>In this section, we replicate two secondary datasets to show some of the adverse consequences of dichotomizing quantitative variables. The first example comes from [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>]. They created a hypothetical example to represent the relationship between the number of errors made in a cognitive laboratory task (X<sub>1</sub>), the speed of response during the task (X<sub>2</sub>), and the score on a standardized ability test (Y). <xref ref-type="fig" rid="fig2">Figure 2</xref> shows the Pearson correlation coefficients among those variables.</p><p>To explore the impact of categorization, [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>] dichotomized both independent variables at their respective medians (13). They then estimated a 2 &#215; 2 ANOVA, which revealed effects of both X<sub>1</sub> and X<sub>2</sub> on the mean of Y. According to [<xref ref-type="bibr" rid="scirp.93794-ref5">5</xref>], “the bivariate dichotomization of X<sub>1</sub>, and X<sub>2</sub> has led to a situation in which the estimated effects of X<sub>1</sub> and X<sub>2</sub> on Y are biased” (p. 183). In a linear regression, however, the effect of X<sub>2</sub> on Y vanishes once we control for X<sub>1</sub>. In short, these results indicate that categorization may lead to misleading results.</p><p>The second example comes from [<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>]. 
He simulated five different scatterplots that yield an identical fourfold table when X and Y are dichotomized at cut point 0, misleadingly suggesting no association between the variables. <xref ref-type="fig" rid="fig3">Figure 3</xref> replicates data from [<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>].</p><p>Dichotomization leads us to overlook the true nature of the relationship between X and Y. According to [<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>], “simply dichotomizing continuous variables without previously referring to the original distributions by plotting them and checking consequences of dichotomization is a bad idea and should be discouraged” (p. 78). These two examples show how dichotomization can lead scholars to wrong inferences.</p></sec><sec id="s4"><title>4. Simulation</title><p>To stress our distrust of dichotomization, we employ basic simulation to show how the transformation of quantitative variables into categories produces inefficiency. First, we generate two normal variables (X and Y) correlated at 0.6 in a sample of 300 cases. Then, we recode X at its mean (0) into two groups, below the average and above the average, to produce a dummy variable (0 or 1). <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the distribution of X and its dichotomization cutpoint at 0.</p><p><xref ref-type="fig" rid="fig5">Figure 5</xref> shows the correlation between X and Y, and between dichotomized X and Y, for all cases (n = 300) and for a small sample of observations (n = 30).</p><p>The true correlation coefficient is 0.600. By dichotomizing X at its mean, we observe a linear association of 0.475, which represents a 20.83% reduction relative to the known parameter. For a small sample size (n = 30), the Pearson correlation using the original variables is 0.465, which is closer to the true parameter value than the estimate from the dichotomized model. In short, regardless of the
In short, regardless of the</p><p>sample size, dichotomization will lead to information loss, which decreases estimates efficiency. <xref ref-type="table" rid="table">Table </xref>2 shows the estimates of two linear regression models.</p><p>Considering all cases (n = 300), the standard error of the dichotomized model is twice as large compared to the model using the original variables. For a bivariate linear regression, the coefficient of determination is calculated by the square of Pearson correlation coefficient (0.6), which is 36%. In the dichotomized model, we observe an r<sup>2</sup> close to 23%, which underestimate the goodness of fit of the model. For n equals to 30, the categorization of the independent variable leads to the incorrect retention of the null hypothesis at 5% level (p-value = 0.052). Although our simulation deals with only two variables, the same reasoning applies to multiple linear regression, which is widely used in empirical research in both Human and Natural sciences [<xref ref-type="bibr" rid="scirp.93794-ref23">23</xref>] .</p><p>Now let’s consider a slightly more complicated case. We simulate the following model:</p><p>Y = 100 + 0.20 ∗ X 1 − 0.40 ∗ X 2 + ε (1)</p><p>where X<sub>1</sub> follows a normal distribution (0, 1), X<sub>2</sub> follows an exponential distribution (λ = 2) and ε has average value equals to zero and standard deviation equals to 1 for a population of 100 observations. 
<xref ref-type="table" rid="table">Table </xref>3 compares the results of a linear regression using original variables to a model when both independent variables are dichotomized at their means.</p><p>The dichotomized model displays a lower r<sup>2</sup> and F statistic, suggesting poor</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table">Table </xref>2</label><caption><title> How dichotomization leads to inefficiency</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  ></th><th align="center" valign="middle"  colspan="7"  >Sample size</th></tr></thead><tr><td align="center" valign="middle"  colspan="3"  >300</td><td align="center" valign="middle"  colspan="3"  >30</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Level of measurement of X</td><td align="center" valign="middle" >Βeta (Std. Error)</td><td align="center" valign="middle" >t</td><td align="center" valign="middle" >r<sup>2</sup></td><td align="center" valign="middle" >Βeta (Std. 
Error)</td><td align="center" valign="middle" >t</td><td align="center" valign="middle" >r<sup>2</sup></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Original</td><td align="center" valign="middle" >0.600 (0.046)</td><td align="center" valign="middle" >12.95</td><td align="center" valign="middle" >0.360</td><td align="center" valign="middle" >0.437 (0.157)</td><td align="center" valign="middle" >2.78</td><td align="center" valign="middle" >0.216</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Dichotomized</td><td align="center" valign="middle" >0.948 (0.102)</td><td align="center" valign="middle" >9.31</td><td align="center" valign="middle" >0.225</td><td align="center" valign="middle" >0.609 (0.300)</td><td align="center" valign="middle" >2.03</td><td align="center" valign="middle" >0.128</td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><p>Note: we estimated two linear regression models. The first one was estimated with both variables at their original level of measurement (continuous). The second model used X dichotomized at its mean (0).</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table">Table </xref>3</label><caption><title> Linear regression (original x dichotomized variables)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Measurement</th><th align="center" valign="middle" >Model</th><th align="center" valign="middle" >β</th><th align="center" valign="middle" >Std. 
Error</th><th align="center" valign="middle" >p-value</th><th align="center" valign="middle" >Lower</th><th align="center" valign="middle" >Upper</th></tr></thead><tr><td align="center" valign="middle"  rowspan="3"  >Original</td><td align="center" valign="middle" >α</td><td align="center" valign="middle" >100.12</td><td align="center" valign="middle" >0.148</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >99.83</td><td align="center" valign="middle" >100.41</td></tr><tr><td align="center" valign="middle" >X<sub>1</sub></td><td align="center" valign="middle" >0.400</td><td align="center" valign="middle" >0.100</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.202</td><td align="center" valign="middle" >0.598</td></tr><tr><td align="center" valign="middle" >X<sub>2</sub></td><td align="center" valign="middle" >−0.527</td><td align="center" valign="middle" >0.191</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >−0.907</td><td align="center" valign="middle" >−0.147</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle"  colspan="6"  >F = 11.465; r<sup>2</sup> = 0.191</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Dichotomized</td><td align="center" valign="middle" >α</td><td align="center" valign="middle" >99.71</td><td align="center" valign="middle" >0.182</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >99.352</td><td align="center" valign="middle" >100.07</td></tr><tr><td align="center" valign="middle" >X<sub>1</sub></td><td align="center" valign="middle" >0.543</td><td align="center" valign="middle" >0.224</td><td align="center" valign="middle" >0.017</td><td align="center" valign="middle" >0.098</td><td align="center" valign="middle" >0.988</td></tr><tr><td align="center" valign="middle" >X<sub>2</sub></td><td align="center" valign="middle" >−0.230</td><td 
align="center" valign="middle" >0.233</td><td align="center" valign="middle" >0.325</td><td align="center" valign="middle" >−0.693</td><td align="center" valign="middle" >0.232</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle"  colspan="6"  >F = 3.924; r<sup>2</sup> = 0.075</td></tr></tbody></table></table-wrap><p>Source: authors.</p><p>goodness of fit. When variables are used at their original level of measurement, regression coefficients are unbiased estimates of the population parameters. However, when both variables are dichotomized at their means, X<sub>2</sub> is no longer statistically significant, which would lead us to incorrectly retain the null hypothesis of no effect. For public policy, the conclusion would be to cut resources. In medical research, the inference would be that the treatment has no impact on health. <xref ref-type="fig" rid="fig6">Figure 6</xref> depicts the residual diagnostics from the dichotomized model.</p></sec><sec id="s5"><title>5. Conclusions</title><p>Despite criticisms from the scholarly community, dichotomization is still a common practice in empirical research. Unfortunately, many researchers categorize quantitative variables before running data analyses. This is true from Biology to Psychology, from Medical research to Sociology. Before the development of computers and statistical software, categorization played an essential role in science by simplifying mathematical modeling. This is no longer the case. Since we have more appropriate tools to deal with reality, there is no reason to transform quantitative measures into categories. More than 30 years ago, [<xref ref-type="bibr" rid="scirp.93794-ref24">24</xref>] argued that “scientific questions are better decided by empirical evidence than by methodological default” (p. 833).</p><p>Categorization usually leads to misleading results. It can deceive us by increasing inefficiency and affecting the probability of type I and type II errors. 
Dichotomization can also generate biased coefficients, since it can hide the correct functional form of the observed relationship. In some cases, when two or more independent variables are dichotomized, a truly null effect can spuriously reach statistical significance. The artificial transformation of quantitative variables into groups reduces the power of statistical tests and increases errors of discreteness. What happens if both independent and dependent variables are categorized? Double dichotomization using the mean as cutpoint is equivalent to losing almost 1/2 of the sample cases [<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>]. In short, dichotomization leads to a systematic loss of information, which has detrimental effects on the reliability of statistical estimates.</p><p>In sum, the recodification of quantitative variables as categorical is a poor methodological strategy, and scholars must stay away from it. Dichotomization undoubtedly simplifies data analysis, but the costs are too high to bear. Today, categorization is neither appropriate nor justifiable. Continuous variables are good as they are. Let’s be cool about it and leave quantitative variables alone.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Fernandes, A., Malaquias, C., Figueiredo, D., da Rocha, E. and Lins, R. (2019) Why Quantitative Variables Should Not Be Recoded as Categorical. Journal of Applied Mathematics and Physics, 7, 1519-1530. 
https://doi.org/10.4236/jamp.2019.77103</p></sec><sec id="s8"><title>Appendix</title><table-wrap id="table4" ><label><xref ref-type="table" rid="table">Table </xref>A1</label><caption><title> Literature review per area</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Author (year)</th><th align="center" valign="middle" >Journal</th></tr></thead><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref1">1</xref>]</td><td align="center" valign="middle" >Applied Psychological Measurement</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref26">26</xref>]</td><td align="center" valign="middle" >Journal of Applied Psychology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref2">2</xref>]</td><td align="center" valign="middle" >British Journal of Cancer</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref18">18</xref>]</td><td align="center" valign="middle" >American Journal of Epidemiology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref27">27</xref>]</td><td align="center" valign="middle" >Epidemiology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref13">13</xref>]</td><td align="center" valign="middle" >Psychological Bulletin</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref28">28</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref29">29</xref>]</td><td align="center" valign="middle" >Journal of Educational and Behavioral Statistics</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref25">25</xref>]</td><td align="center" valign="middle" >Development and Psychopathology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref14">14</xref>]</td><td align="center" valign="middle" >Journal of 
Multivariate Analysis</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref21">21</xref>]</td><td align="center" valign="middle" >Psychological Methods</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref30">30</xref>]</td><td align="center" valign="middle" >Journal of Marketing Research</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref31">31</xref>]</td><td align="center" valign="middle" >Journal of the American Statistical Association</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref32">32</xref>]</td><td align="center" valign="middle" >British Medical Journal</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref19">19</xref>]</td><td align="center" valign="middle" >Statistics in Medicine</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref3">3</xref>]</td><td align="center" valign="middle" >Neuroepidemiology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.93794-ref11">11</xref>]</td><td align="center" valign="middle" >Pharmaceutical Statistics</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref20">20</xref>]</td><td align="center" valign="middle" >American Journal of Neuroradiology</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref4">4</xref>]</td><td align="center" valign="middle" >Medical Decision Making</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref6">6</xref>]</td><td align="center" valign="middle" >Teaching Statistics</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref33">33</xref>]</td><td 
align="center" valign="middle" >Quality Progress</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.93794-ref34">34</xref>]</td><td align="center" valign="middle" >Communications in Statistics-Theory and Methods</td></tr></tbody></table></table-wrap><p>Source: authors (2018).</p></sec><sec id="s9"><title>NOTES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.93794-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Cohen, J. (1983) The Cost of Dichotomization. Applied Psychological Measurement, 7, 249-253. https://doi.org/10.1177/014662168300700301</mixed-citation></ref><ref id="scirp.93794-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Altman, D. (1991) Categorising Continuous Variables. British Journal of Cancer, 64, 975. https://doi.org/10.1136/bmj.332.7549.1080</mixed-citation></ref><ref id="scirp.93794-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Walraven, C. and Van and Hart, G. (2008) Leave Me Alone—Why Continuous Variables Should Be Analyzed as Such. Neuroepidemiology, 30, 138-139.  
https://doi.org/10.1159/000126908</mixed-citation></ref><ref id="scirp.93794-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Dawson, N.V. and Weiss, R. (2012) Dichotomizing Continuous Variables in Statistical Analysis. Medical Decision Making, 32, 225-226.  
https://doi.org/10.1177/0272989X12437605</mixed-citation></ref><ref id="scirp.93794-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Maxwell, S.E. and Delaney, H.D. (1993) Bivariate Median Splits and Spurious Statistical Significance. Psychological Bulletin, 113, 181-190.
https://doi.org/10.1037//0033-2909.113.1.181</mixed-citation></ref><ref id="scirp.93794-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Kuss, O. (2013) The Danger of Dichotomizing Continuous Variables: A Visualization. Teaching Statistics, 35, 78-79. https://doi.org/10.1111/test.12006</mixed-citation></ref><ref id="scirp.93794-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Paranhos, R., Figueiredo Filho, D.B., da Rocha, E.C. and do Carmo, E.F. (2013) A importância da replicabilidade na ciência política: O caso do SIGOBR. Revista Política Hoje, 22, 213-229.</mixed-citation></ref><ref id="scirp.93794-ref8"><label>8</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Janz</surname><given-names> N. </given-names></name>,<etal>et al</etal>. (<year>2016</year>)<article-title>Bringing the Gold Standard into the Classroom: Replication in University Teaching</article-title><source> International Studies Perspectives</source><volume> 17</volume>,<fpage> 392</fpage>-<lpage>407</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.93794-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Figueiredo, D., Lins, R., Domingos, A., Janz, N. and Silva, L. (2019) Seven Reasons Why: A User’s Guide to Reproducibility and Transparency. Brazilian Political Science Review, 13.</mixed-citation></ref><ref id="scirp.93794-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Lewis, J.A. (2004) In Defence of the Dichotomy. Pharmaceutical Statistics, 3, 77-79.
https://doi.org/10.1002/pst.107</mixed-citation></ref><ref id="scirp.93794-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Fedorov, V., Mannino, F. and Zhang, R. (2009) Consequences of Dichotomization. Pharmaceutical Statistics, 8, 50-61. https://doi.org/10.1002/pst.331</mixed-citation></ref><ref id="scirp.93794-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Royston, P., Altman, D.G. and Sauerbrei, W. (2006) Dichotomizing Continuous Predictors in Multiple Regression: A Bad Idea. Statistics in Medicine, 25, 127-141.  
https://doi.org/10.1002/sim.2331</mixed-citation></ref><ref id="scirp.93794-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Maxwell, S.E. and Delaney, H.D. (1993) Bivariate Median Splits and Spurious Statistical Significance. Psychological Bulletin, 113, 181-190.
https://doi.org/10.1037//0033-2909.113.1.181</mixed-citation></ref><ref id="scirp.93794-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Taylor, J.M.G. and Yu, M. (2002) Bias and Efficiency Loss Due to Categorizing an Explanatory Variable. Journal of Multivariate Analysis, 83, 248-263.  
https://doi.org/10.1006/jmva.2001.2045</mixed-citation></ref><ref id="scirp.93794-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Peters, C.C. and Van Voorhis, W.R. (1940) Statistical Procedures and Their Mathematical Bases. McGraw-Hill, New York.</mixed-citation></ref><ref id="scirp.93794-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Humphreys, L.G. and Fleishman, A. (1974) Pseudo-Orthogonal and Other Analysis of Variance Designs Involving Individual-Differences Variables. Journal of Educational Psychology, 66, 464-472. https://doi.org/10.1037/h0036539</mixed-citation></ref><ref id="scirp.93794-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Cohen, J. and Cohen, P. (1983) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Erlbaum, Hillsdale.</mixed-citation></ref><ref id="scirp.93794-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, L.P. and Kolonel, L.N. (1992) Efficiency Loss from Categorizing Quantitative Exposures into Qualitative Exposures in Case-Control Studies. American Journal of Epidemiology, 136, 464-474. https://doi.org/10.1093/oxfordjournals.aje.a116520</mixed-citation></ref><ref id="scirp.93794-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Chen, H., Cohen, P. and Chen, S. (2007) Biased Odds Ratios from Dichotomization of Age. Statistics in Medicine, 26, 3487-3497. https://doi.org/10.1002/sim.2737</mixed-citation></ref><ref id="scirp.93794-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Naggara, O., et al. (2011) Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable: An Example from the Natural History of Unruptured Aneurysms. American Journal of Neuroradiology, 32, 437-440.
https://doi.org/10.3174/ajnr.A2425</mixed-citation></ref><ref id="scirp.93794-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">MacCallum, R.C., et al. (2002) On the Practice of Dichotomization of Quantitative Variables. Psychological Methods, 7, 19-40.
https://doi.org/10.1037//1082-989X.7.1.19</mixed-citation></ref><ref id="scirp.93794-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Nunnally, J.C., Bernstein, I.H. and Berge, J.M.T. (1994) Psychometric Theory. Vol. 226, McGraw-Hill, New York.</mixed-citation></ref><ref id="scirp.93794-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Krueger, J. and Lewis-Beck, M. (2008) Is OLS Dead? The Political Methodologist, 15, 2-4.</mixed-citation></ref><ref id="scirp.93794-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Friedrich, R.J. (1982) In Defense of Multiplicative Terms in Multiple Regression Equations. American Journal of Political Science, 26, 797-833.  
https://doi.org/10.2307/2110973</mixed-citation></ref><ref id="scirp.93794-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Farrington, D.P. and Loeber, R. (2000) Some Benefits of Dichotomization in Psychiatric and Criminological Research. Criminal Behaviour and Mental Health, 10, 100-122. https://doi.org/10.1002/cbm.349</mixed-citation></ref><ref id="scirp.93794-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Schmidt, F. (2010) Detecting and Correcting the Lies That Data Tell. Perspectives on Psychological Science, 5, 233-242. https://doi.org/10.1177/1745691610369339</mixed-citation></ref><ref id="scirp.93794-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Ragland, D.R. (1992) Dichotomizing Continuous Outcome Variables: Dependence of the Magnitude of Association and Statistical Power on the Cutpoint. Epidemiology, 3, 434-440. https://doi.org/10.1097/00001648-199209000-00009</mixed-citation></ref><ref id="scirp.93794-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Vargha, A., Rudas, T., Delaney, H.D. and Maxwell, S.E. (1996) Dichotomization, Partial Correlation, and Conditional Independence. Journal of Educational and Behavioral Statistics, 21, 264-282. https://doi.org/10.3102/10769986021003264</mixed-citation></ref><ref id="scirp.93794-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Rousson, V. (2014) Measuring an Effect Size from Dichotomized Data: Contrasted Results Whether Using a Correlation or an Odds Ratio. Journal of Educational and Behavioral Statistics, 39, 144-163. https://doi.org/10.3102/1076998614524597</mixed-citation></ref><ref id="scirp.93794-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, J.R. and McClelland, G.H. (2003) Negative Consequences of Dichotomizing Continuous Predictor Variables.
Journal of Marketing Research, 40, 366-371.  
https://doi.org/10.1509/jmkr.40.3.366.19237</mixed-citation></ref><ref id="scirp.93794-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Farewell, V.T., Tom, B.D.M. and Royston, P. (2004) The Impact of Dichotomization on the Efficiency of Testing for an Interaction Effect in Exponential Family Models. Journal of the American Statistical Association, 99, 822-831.  
https://doi.org/10.1198/016214504000001169</mixed-citation></ref><ref id="scirp.93794-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Altman, D.G. and Royston, P. (2006) The Cost of Dichotomising Continuous Variables. BMJ, 332, 1080. https://doi.org/10.1136/bmj.332.7549.1080</mixed-citation></ref><ref id="scirp.93794-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Seaman, J.E. and Allen, I.E. (2014) Don’t Be Discrete. Quality Progress, 47, 41.</mixed-citation></ref><ref id="scirp.93794-ref34"><label>34</label><mixed-citation publication-type="other" xlink:type="simple">Nelson, S.P., Ramakrishnan, V., Nietert, P.J., Kamen, D.L., Ramos, P.S. and Wolf, B.J. (2017) An Evaluation of Common Methods for Dichotomization of Continuous Variables to Discriminate Disease Status. Communications in Statistics—Theory and Methods, 46, 10823-10834. https://doi.org/10.1080/03610926.2016.1248783</mixed-citation></ref></ref-list></back></article>