<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">Health</journal-id><journal-title-group><journal-title>Health</journal-title></journal-title-group><issn pub-type="epub">1949-4998</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/health.2014.621336</article-id><article-id pub-id-type="publisher-id">Health-52556</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Medicine&amp;Healthcare</subject></subj-group></article-categories><title-group><article-title>
 
 
  Simulation Program to Determine Sample Size and Power for a Multiple Logistic Regression Model with Unspecified Covariate Distributions
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kumagai</surname><given-names>Naoko</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Akazawa</surname><given-names>Kohei</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kataoka</surname><given-names>Hiromi</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hatakeyama</surname><given-names>Yutaka</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Okuhara</surname><given-names>Yoshiyasu</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>Department of Medical Informatics, Niigata University Medical and Dental Hospital, Niigata, Japan</addr-line></aff><aff id="aff1"><addr-line>Integrated Center for Advanced Medical Technologies, Kochi Medical School, Kochi University, Kochi, Japan</addr-line></aff><aff id="aff3"><addr-line>Center of Medical Information Science, Kochi Medical School, Kochi University, Kochi, Japan</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>poosant@gmail.com</email> (AK)</corresp></author-notes><pub-date pub-type="epub"><day>23</day><month>12</month><year>2014</year></pub-date><volume>06</volume><issue>21</issue><fpage>2973</fpage><lpage>2998</lpage><history><date date-type="received"><day>14</day><month>October</month><year>2014</year></date><date date-type="rev-recd"><day>30</day><month>November</month>
<year>2014</year>	</date><date date-type="accepted"><day>13</day>	<month>December</month>	<year>2014</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
Binary logistic regression models are commonly used to assess the association between outcomes and covariates. Many covariates are inherently continuous and have a variety of distributions, including some that are heavily skewed to the left or right. Existing theoretical formulas, criteria, and simulation programs cannot accurately estimate the sample size and power when covariates follow such non-standard distributions. We have therefore developed a simulation program that uses Monte Carlo methods to estimate the power of a binary logistic regression model directly by simulation. This power calculation can be used for covariate distributions of any shape and for covariates of any type (continuous, ordinal, and nominal), and it can account for nonlinear relationships between covariates and outcomes. For illustrative purposes, the program is applied to real data obtained from a study on the influence of smoking on 90-day outcomes after acute atherothrombotic stroke. Our program is applicable to any effect size and, with some modifications, can be extended to other statistical methods and related simulations, such as Bayesian inference.
 
</p></abstract><kwd-group><kwd>Logistic Regression Model</kwd><kwd>Monte Carlo Simulation</kwd><kwd>Non-Standard Distributions</kwd><kwd>Nonlinear</kwd><kwd>Power</kwd><kwd>Sample Size</kwd><kwd>Skewed Distribution</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Logistic regression models have been used to determine the association between risk factors and outcomes in various fields, including medical and epidemiological research [<xref ref-type="bibr" rid="scirp.52556-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.52556-ref2">2</xref>] . However, studies using these models sometimes reach contradictory conclusions about the same hypothesis. For example, some studies have indicated that cigarette smoking increases the risk of Barrett’s esophagus, whereas other studies, owing to a lack of power, have found no association between the two [<xref ref-type="bibr" rid="scirp.52556-ref3">3</xref>] . The robustness of such inferences depends on the relationship between sample size and power [<xref ref-type="bibr" rid="scirp.52556-ref4">4</xref>] . It is therefore important to calculate the sample size and estimate the power of observational studies, as well as of randomized controlled trials, while accounting for the effects of other covariates.</p><p>Theoretical formulas, criteria, and software applications have been developed to determine the sample size and statistical power of a binary logistic regression model [<xref ref-type="bibr" rid="scirp.52556-ref5">5</xref>] - [<xref ref-type="bibr" rid="scirp.52556-ref11">11</xref>] . However, these tend to consider only specific, well-known probability distributions, even though the power depends on the shape of the covariate distribution. In practice, many covariates are inherently continuous, and their distributions take a variety of shapes (e.g., heavily skewed to the left or right). 
Another problem is that the effect of a covariate on the outcome may not be constant across the range of the covariate. For example, J-shaped relationships are sometimes found in medical and epidemiological studies, and numerous studies have observed an inverse relationship between diastolic pressure and adverse cardiac ischemic events (i.e., the lower the diastolic pressure, the greater the risk of coronary heart disease and adverse outcomes) [<xref ref-type="bibr" rid="scirp.52556-ref12">12</xref>] . The distribution shape and effect size of covariates must therefore be considered carefully. To this end, we have developed a software program that uses Monte Carlo simulation to estimate the power of a logistic regression model under the actual data structure. The program can handle any distribution shape and effect size and, with some modifications, can be extended to other statistical methods and simulations, such as Bayesian inference. In this paper, we report the application of our simulation program to real data obtained from a study on the influence of smoking on 90-day outcomes after acute atherothrombotic stroke in 292 Japanese men [<xref ref-type="bibr" rid="scirp.52556-ref13">13</xref>] .</p></sec><sec id="s2"><title>2. Theoretical Background</title><sec id="s2_1"><title>2.1. Standard Binary Linear Logistic Regression Model</title><p>We consider a case-control study in which the binary response variable y denotes each patient’s disease status (y = 1 for cases and y = 0 for controls). For each subject, we have a set of p covariates X<sub>1</sub>, X<sub>2</sub>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x6.png" xlink:type="simple"/></inline-formula>, X<sub>p</sub>. Let the conditional probability that the outcome is present be denoted by π(X) = P(y = 1 | X). 
The logit of the multiple logistic regression model is</p><disp-formula id="scirp.52556-formula236"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x7.png"  xlink:type="simple"/></disp-formula><p>in which case the logistic regression model is</p><disp-formula id="scirp.52556-formula237"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x8.png"  xlink:type="simple"/></disp-formula><p>where β denotes the vector of unknown regression coefficients.</p></sec><sec id="s2_2"><title>2.2. Two-Segment Logistic Regression Model for Nonlinear Association between a Logit Outcome and a Covariate</title><p>We replace the linear term associated with the covariate X<sub>1</sub> in the standard binary logistic regression model with a two-segment function containing a change-point. The relationship between the logit outcome and X<sub>1</sub> differs on either side of this change-point. The two-segment logistic regression model shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> can be expressed as follows:</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title>Two-segment logistic regression model for non-linear association between logit outcomes and covariates</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/15-8203175x9.png"/></fig><disp-formula id="scirp.52556-formula238"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x10.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.52556-formula239"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x11.png"  xlink:type="simple"/></disp-formula><p>where the change-point is the value of X<sub>1</sub> at which the slope changes, and α<sub>1</sub> and α<sub>2</sub> are the unknown regression coefficients of X<sub>1</sub> below and above the change-point, respectively.</p></sec></sec><sec id="s3"><title>3. Methods</title><sec id="s3_1"><title>3.1. 
Outline of Simulation Program</title><p>Our Monte Carlo simulation program is written in SAS, using the SAS/STAT and SAS/IML components; the source code is given in the Appendix. The program consists of three parts: data generation, parameter estimation, and statistical power calculation. Users can modify and extend each part to suit their specific purposes. <xref ref-type="table" rid="table1">Table 1</xref> describes the input parameters required to run the program; users should assign values appropriate to the study being planned. <xref ref-type="table" rid="table2">Table 2</xref> describes the macro modules that can be modified to adapt the program.</p><p>Continuous distributions are generated either by specifying the mean, standard deviation, skewness, kurtosis, and correlation, or by assigning frequencies to designated intervals of a continuous variable (see <xref ref-type="fig" rid="fig2">Figure 2</xref>). A nonlinear relationship between a continuous covariate and the logit outcome can be specified by using different regression coefficients on either side of a change-point, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>Users assign values for the event proportion, sample size, Type I error, regression coefficients, distribution type (dichotomous, polytomous, or continuous), and distribution shape, as well as the number of groups used when a covariate is categorized. The program outputs the average and standard error of each estimated coefficient, together with the power. A flowchart of the program is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><p>We validated the program by comparing its results with those of Hsieh’s program; the two programs gave almost identical results. 
For example, when the event proportion, sample size, and regression coefficient were set to 0.01, 12,580, and −0.223, respectively, our program estimated a power of 0.81, compared with 0.8 from Hsieh’s program. When they were set to 0.5, 225, and 0.405, respectively, our program estimated a power of 0.89, which compares well with Hsieh’s result of 0.9 [<xref ref-type="bibr" rid="scirp.52556-ref14">14</xref>] .</p><table-wrap id="table1" ><label>Table 1</label><caption><title>Description of input parameters</title></caption><table><thead><tr><th align="center" valign="middle" >Input parameter</th><th align="center" valign="middle" >Explanation</th></tr></thead><tbody><tr><td align="center" valign="middle" >SEED</td><td align="center" valign="middle" >Random number seed (must be a positive integer)</td></tr><tr><td align="center" valign="middle" >ALEVEL</td><td align="center" valign="middle" >Significance level of the statistical test (Type I error)</td></tr><tr><td align="center" valign="middle" >P</td><td align="center" valign="middle" >Event proportion (response probability)</td></tr><tr><td align="center" valign="middle" >NITER</td><td align="center" valign="middle" >Number of iterations performed</td></tr><tr><td align="center" valign="middle" >N_REPEAT</td><td align="center" valign="middle" >Number of iterations performed</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ><sup>*</sup>NITER and N_REPEAT should be set to the same number</td></tr><tr><td align="center" valign="middle" >PATH</td><td align="center" valign="middle" >Directory in which results are saved</td></tr><tr><td align="center" valign="middle" >TABLE</td><td align="center" valign="middle" >Name of the table in which results are saved</td></tr><tr><td align="center" valign="middle" >R</td><td align="center" valign="middle" >Number of categorized groups</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Example: continuous = 1, median = 2, tertile = 3, quartile = 4</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ><sup>*</sup>If the model includes nominal variables, R should be &gt;1</td></tr><tr><td align="center" valign="middle" >CHANGE_POINT</td><td align="center" valign="middle" >Change point (see <xref ref-type="fig" rid="fig1">Figure 1</xref>)</td></tr><tr><td align="center" valign="middle"  colspan="2"  >Regression coefficients of the covariates in the full model (excluding the intercept) are specified as:</td></tr><tr><td align="center" valign="middle" >MODEL_1</td><td align="center" valign="middle" >%NRSTR(α<sub>1</sub>*X1 + <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x12.png" xlink:type="simple"/></inline-formula> + β<sub>i</sub>*X<sub>i</sub>)</td></tr><tr><td align="center" valign="middle" >MODEL_2</td><td align="center" valign="middle" >%NRSTR(α<sub>1</sub>*(value of change_point) + α<sub>2</sub>*(X1 − (value of change_point)) + <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x13.png" xlink:type="simple"/></inline-formula> + β<sub>i</sub>*X<sub>i</sub>)</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >α and β are the given regression coefficient values</td></tr><tr><td align="center" valign="middle"  colspan="2"  ><sup>*</sup>If the model is linear, α<sub>1</sub> and α<sub>2</sub> are set to the same value</td></tr><tr><td align="center" valign="middle"  colspan="2"  >Sample size, mean, standard deviation, and correlation are specified as follows (skewness and kurtosis are given in SKW_KRT):</td></tr><tr><td align="center" valign="middle"  colspan="2"  >Example:<break/>DATA a (type=CORR);<break/>LENGTH _TYPE_ $40;<break/>INPUT _NAME_ $ _TYPE_ $ X1 X2;<break/>IF TRIM(LEFT(_TYPE_))='N' THEN CALL SYMPUT('NSP', X1);<break/>CARDS;<break/>. MEAN 70 50<break/>. STD 4 5<break/>. N 300 300<break/>X1 CORR 1 0<break/>X2 CORR 0 1<break/>;<break/>RUN;</td></tr><tr><td align="center" valign="middle"  colspan="2"  ><sup>*</sup>If only one covariate is defined, the correlation should be set to 1. All covariates must have the same sample size.</td></tr><tr><td align="center" valign="middle" >SKW_KRT</td><td align="center" valign="middle" >%NRSTR ({skewness 1 kurtosis 1, skewness 2 kurtosis 2, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x14.png" xlink:type="simple"/></inline-formula>})</td></tr><tr><td align="center" valign="middle"  colspan="2"  ><sup>*</sup>If covariates are normally distributed, both skewness and kurtosis are set to 0.</td></tr><tr><td align="center" valign="middle" >LIST_VARNAME</td><td align="center" valign="middle" >%NRSTR (X1, X2, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x15.png" xlink:type="simple"/></inline-formula>, Xi); list of variable names in dataset A above</td></tr><tr><td align="center" valign="middle" >MIN</td><td align="center" valign="middle" >Minimum value of a continuous variable</td></tr><tr><td align="center" valign="middle" >MAX</td><td align="center" valign="middle" >Maximum value of a continuous variable</td></tr><tr><td align="center" valign="middle" >SUB_GROUP</td><td align="center" valign="middle" >Number of subgroups</td></tr><tr><td align="center" valign="middle" >CATEGORIZATION</td><td align="center" valign="middle" >%NRSTR (list of covariates to be categorized)</td></tr><tr><td align="center" valign="middle" >CATEGORIZATION_R</td><td align="center" valign="middle" >%NRSTR (list of new covariate names after categorization)</td></tr><tr><td align="center" valign="middle" >CONTI_MODEL</td><td align="center" valign="middle" >%NRSTR (list of covariates in a continuous logistic regression model)</td></tr><tr><td align="center" valign="middle"  colspan="2"  ><sup>*</sup>All parameters must be assigned, even those not needed for a given run, and the necessary variables must be specified in the logistic regression model.</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label>Table 2</label><caption><title>Description of the macro modules for modification</title></caption><table><thead><tr><th align="center" valign="middle" >To assign the desired number of observations to each subgroup, as shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>, part of the %RATIO module must be modified.</th></tr></thead><tbody><tr><td align="center" valign="middle" >For a sample size of n<sub>k</sub>, the observation values of the k<sup>th</sup> subgroup are drawn from a randomly generated U(a<sub>k</sub>, b<sub>k</sub>). U1 corresponds to the lowest interval subgroup, U2 to the next lowest, and so on. For example, if the minimum value, maximum value, and number of subgroups are set to 1, 21, and 5, respectively, the subgroups are (1, 5), (5, 9), (9, 13), (13, 17), and (17, 21). The subgroups are assigned frequencies of 0.55, 0.05, 0.2, 0.15, and 0.05, respectively. &amp;NSP denotes the total sample size; ID is the observation identifier.</td></tr><tr><td align="center" valign="middle" >%MACRO RATIO;</td></tr><tr><td align="center" valign="middle" >IF 1=&lt; ID &lt;&amp;NSP.*0.55 THEN _H1=U1;</td></tr><tr><td align="center" valign="middle" >ELSE IF &amp;NSP.*0.55 =&lt; ID &lt;&amp;NSP.*0.6 THEN _H1=U2;</td></tr><tr><td align="center" valign="middle" >ELSE IF &amp;NSP.*0.6 =&lt; ID &lt;&amp;NSP.*0.8 THEN _H1=U3;</td></tr><tr><td align="center" valign="middle" >ELSE IF &amp;NSP.*0.8 =&lt; ID &lt;&amp;NSP.*0.95 THEN _H1=U4;</td></tr><tr><td align="center" valign="middle" >ELSE _H1=U5;</td></tr><tr><td align="center" valign="middle" >%MEND RATIO;</td></tr><tr><td align="center" valign="middle" >If the model includes discrete variables, specify the model in the PROC LOGISTIC step of %MODEL_CATEGORICAL and set the input parameter R to a value greater than 1.</td></tr><tr><td align="center" valign="middle" >%MACRO MODEL_CATEGORICAL;</td></tr><tr><td align="center" valign="middle" >ODS OUTPUT PARAMETERESTIMATES=PARAM_&amp;R CONVERGENCESTATUS=STATUS_&amp;R TYPE3=TYPE3_&amp;R;</td></tr><tr><td align="center" valign="middle" >PROC LOGISTIC DATA=G&amp;R;</td></tr><tr><td align="center" valign="middle" >/*******modification**********************/</td></tr><tr><td align="center" valign="middle" >CLASS C1(PARAM=REF REF=&quot;0&quot;) D1(PARAM=REF REF=&quot;0&quot;) ;</td></tr><tr><td align="center" valign="middle" >MODEL Y(EVENT='1')= C1 X1 X2 X3 D1</td></tr><tr><td align="center" valign="middle" >/*********************************/</td></tr><tr><td align="center" valign="middle" >/TECH=NR MAXITER=8 XCONV=0.01;</td></tr><tr><td align="center" valign="middle" >BY STRATA; RUN;</td></tr><tr><td align="center" valign="middle" >%MEND;</td></tr></tbody></table></table-wrap><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title>Algorithm for generating a continuous covariate with a unique distribution</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/15-8203175x16.png"/></fig><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title>Flowchart of the simulation program</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/15-8203175x17.png"/></fig></sec><sec id="s3_2"><title>3.2. Construction of Raw Simulation Data</title><sec id="s3_2_1"><title>3.2.1. Continuous Covariates</title><p>Non-normal or normal multivariate continuous variables are generated by specifying the mean, standard deviation, skewness, kurtosis, and correlation through a procedure in the %COEFF and %CONTINOUS SAS modules. A detailed explanation is given in <italic>SAS<sup>&#174;</sup> for Monte Carlo Studies</italic> [<xref ref-type="bibr" rid="scirp.52556-ref15">15</xref>] .</p></sec><sec id="s3_2_2"><title>3.2.2. Continuous Covariates That Are Uniquely Distributed (<xref ref-type="fig" rid="fig2">Figure 2</xref>)</title><p>A continuous variable is divided into l subgroups of equal width, as shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. 
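</p><p>The subgroup method just outlined, together with the definitions of a<sub>k</sub>, b<sub>k</sub>, and n<sub>k</sub> that follow, can be illustrated with a short sketch. It is written in Python rather than in the SAS of the actual program, and the function name <monospace>piecewise_uniform</monospace> is hypothetical. The sketch assumes that the subgroup sizes are obtained by rounding the cumulative frequencies, which approximately reproduces the ID thresholds of the %RATIO module in <xref ref-type="table" rid="table2">Table 2</xref>.</p><preformat>
```python
import random


def piecewise_uniform(n, lo, hi, freqs, seed=None):
    """Draw n values on [lo, hi] whose histogram over len(freqs) equal-width
    subgroups follows the relative frequencies in freqs.

    Hypothetical helper sketching the subgroup method; not the SAS program.
    """
    rng = random.Random(seed)
    n_groups = len(freqs)                    # l in the text
    width = (hi - lo) / n_groups             # interval length of each subgroup
    # Cumulative subgroup boundaries, rounded to whole observations so that
    # the subgroup sizes n_k always add up to n.
    cum, running = [0], 0.0
    for f in freqs[:-1]:
        running += f
        cum.append(round(n * running))
    cum.append(n)                            # the last subgroup takes the rest
    values = []
    for k in range(n_groups):
        a_k = lo + k * width                 # lower limit of subgroup k
        b_k = lo + (k + 1) * width           # upper limit of subgroup k
        n_k = cum[k + 1] - cum[k]
        # Map U(0, 1) draws onto (a_k, b_k): a_k + (b_k - a_k) * u
        values.extend(a_k + (b_k - a_k) * rng.random() for _ in range(n_k))
    rng.shuffle(values)                      # do not leave values sorted by subgroup
    return values


if __name__ == "__main__":
    # Settings of the Table 2 example: Min = 1, Max = 21, five subgroups.
    freqs = [0.55, 0.05, 0.2, 0.15, 0.05]
    h1 = piecewise_uniform(300, 1, 21, freqs, seed=2014)
    counts = [0] * len(freqs)
    for v in h1:
        counts[min(int((v - 1) // 4), len(freqs) - 1)] += 1
    print(counts)  # [165, 15, 60, 45, 15]
```
</preformat><p>With the Table 2 settings and N = 300, the sketch places 165, 15, 60, 45, and 15 observations in the five subgroups; values are shuffled so that their order carries no information about the subgroup.</p><p>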
Let Min and Max denote the minimum and maximum values of the original covariate, respectively.</p><p>The length of the interval of each subgroup is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x18.png" xlink:type="simple"/></inline-formula>.</p><p>The k<sup>th</sup> subgroup ranges from a<sub>k</sub> to b<sub>k</sub> (k = 1, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x19.png" xlink:type="simple"/></inline-formula>, l), where k = 1 denotes the lowest subgroup and k = l the highest. a<sub>k</sub> and b<sub>k</sub> can be expressed as</p><disp-formula id="scirp.52556-formula240"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x20.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.52556-formula241"><graphic  xlink:href="http://html.scirp.org/file/15-8203175x21.png"  xlink:type="simple"/></disp-formula><p>Random numbers from a uniform distribution on the interval (0, 1) are transformed to a uniform distribution on the interval (a<sub>k</sub>, b<sub>k</sub>) by the equation a<sub>k</sub> + (b<sub>k</sub> − a<sub>k</sub>) &#215; (generated number). Drawing n<sub>k</sub> such observations for the k<sup>th</sup> subgroup and combining all l subgroups yields the covariate H<sub>1</sub>.</p></sec><sec id="s3_2_3"><title>3.2.3. Statistical Probability Distribution</title><p>If the covariate is assumed to follow a known probability distribution, a call to the SAS RAND function can be inserted into the PDF macro module. In the example given for this program, nominal variables are generated using the TABLE distribution of the RAND function.</p></sec><sec id="s3_2_4"><title>3.2.4. Determination of Binary Outcome</title><p>The individual probability of an event is calculated from the assigned parameters and the generated covariates using the logistic regression model. The intercept is initially set to zero, and the average is then calculated. 
The intercept is determined from <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x22.png" xlink:type="simple"/></inline-formula> and p by the following equation:</p><p>Intercept = <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x23.png" xlink:type="simple"/></inline-formula></p><p>After the intercept has been determined, the individual probability π(x<sub>i</sub>) (for i = 1, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/15-8203175x24.png" xlink:type="simple"/></inline-formula>, n observations) is calculated from the logistic regression model. The binary outcome Y is generated from the individual π(x<sub>i</sub>) and random numbers from a uniform distribution on the interval (0, 1): if the random number for observation i is less than π(x<sub>i</sub>), Y<sub>i</sub> = 1 (the event occurred); otherwise, Y<sub>i</sub> = 0. This gives P(Y<sub>i</sub> = 1) = π(x<sub>i</sub>), as required. The result is a dataset consisting of the covariates and the response variable Y. For skewed distributions, the event proportion of the generated dataset may differ from the input value; the program therefore reports the event proportion of the generated dataset, and any difference can be corrected by adjusting the input event proportion.</p></sec><sec id="s3_2_5"><title>3.2.5. Estimation of Regression Coefficients and Standard Errors</title><p>A logistic regression model including continuous and/or design variables is fitted to each simulated dataset to obtain the maximum likelihood estimates of the regression coefficients and the p-value for the null hypothesis that the population regression coefficient β is 0.</p><p>Non-convergence can occur if the data are completely or quasi-completely separated. In that case, the maximum likelihood estimate of one or more parameters is theoretically infinite, and reliable estimates cannot be obtained [<xref ref-type="bibr" rid="scirp.52556-ref16">16</xref>] . 
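</p><p>The outcome generation, estimation, and test steps of Sections 3.2.4 to 3.2.6 can be sketched for the simplest case of a single standard normal covariate. This Python sketch is not the authors' SAS implementation: the names <monospace>fit_logistic_nr</monospace> and <monospace>simulate_power</monospace> are hypothetical, the intercept rule logit(p) − β<sub>1</sub> × mean(x) is an assumed stand-in for the calibration formula above, and convergence is simplified to an absolute change in the estimates below XCONV, whereas PROC LOGISTIC uses a relative criterion. The defaults mirror TECH=NR MAXITER=8 XCONV=0.01 from <xref ref-type="table" rid="table2">Table 2</xref>, and non-convergent replicates are dropped, as described below.</p><preformat>
```python
import math
import random


def _score_and_information(x, y, b0, b1):
    """Score vector and 2x2 information matrix of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic_nr(x, y, maxiter=8, xconv=0.01):
    """Newton-Raphson fit of logit P(Y=1) = b0 + b1*x.

    Returns (b0, b1, se_b1, converged). Convergence is declared when the
    largest absolute parameter change falls below xconv (a simplification).
    """
    b0 = b1 = 0.0
    for _ in range(maxiter):
        g0, g1, h00, h01, h11 = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det      # solve I * d = score for d
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if xconv > max(abs(d0), abs(d1)):
            # Asymptotic SE of b1 from the inverse information at the estimate.
            _, _, h00, h01, h11 = _score_and_information(x, y, b0, b1)
            se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
            return b0, b1, se_b1, True
    return b0, b1, float("nan"), False


def simulate_power(n, p_event, beta1, alpha=0.05, niter=500, seed=1):
    """Monte Carlo power of the two-sided Wald test of beta1 = 0, x ~ N(0, 1)."""
    rng = random.Random(seed)
    rejected = used = 0
    for _ in range(niter):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed intercept rule (hypothetical): logit(p) - beta1 * mean(x).
        b0 = math.log(p_event / (1.0 - p_event)) - beta1 * sum(x) / n
        y = []
        for xi in x:
            pi = 1.0 / (1.0 + math.exp(-(b0 + beta1 * xi)))
            y.append(1 if pi > rng.random() else 0)  # event when U(0, 1) falls below pi
        _, b1_hat, se, converged = fit_logistic_nr(x, y)
        if not converged:
            continue                                 # discard non-convergent samples
        used += 1
        p_value = math.erfc(abs(b1_hat / se) / math.sqrt(2.0))
        if alpha > p_value:
            rejected += 1
    return rejected / used if used else float("nan")


if __name__ == "__main__":
    print(simulate_power(300, 0.2, 0.0, niter=200))  # should be near alpha = 0.05
    print(simulate_power(300, 0.2, 0.5, niter=200))
```
</preformat><p>Applied to perfectly separated data such as x = (−2, −1, 1, 2) and y = (0, 0, 1, 1), the Newton–Raphson steps for the slope stay close to 1 in size, so the fit never meets the convergence criterion and such a replicate would be discarded.</p><p>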
These instances of non-convergence must be handled appropriately; our simulation program does so by excluding samples that fail to converge.</p></sec><sec id="s3_2_6"><title>3.2.6. Test Module</title><p>The test module outputs the mean of the asymptotic standard errors and the statistical power, which is defined as the proportion of tests in which the p-value is less than the Type I error level.</p></sec></sec></sec><sec id="s4"><title>4. Sample Runs</title><sec id="s4_1"><title>4.1. Sample Run 1</title><p>Variable H<sub>1</sub> was set to be the National Institutes of Health Stroke Scale (NIHSS), a tool used by healthcare providers to objectively quantify the impairment caused by a stroke. The minimum, maximum, and number of subgroups were set to 1, 21, and 5, respectively, giving the subgroups (1, 5), (5, 9), (9, 13), (13, 17), and (17, 21). The frequencies of these subgroups were assumed to be 0.55, 0.05, 0.2, 0.15, and 0.05, respectively. The generated numbers were rounded off, and the event proportion was set to 0.2. The regression coefficients (α<sub>1</sub>, α<sub>2</sub>) were set to (0.00, 0.00), (0.06, 0.06), and (0.06, 0.15), and the change point was set to 4. H<sub>1</sub> was analyzed as a continuous variable, divided at the median or at the tertiles, or categorized into three groups: 1 - 4, 5 - 15, and ≥16. We fitted the logistic model for each of these forms of H<sub>1</sub>; the results are presented in <xref ref-type="table" rid="table3">Table 3</xref>. When α<sub>1</sub> and α<sub>2</sub> were both set to 0.06, the average estimated coefficient of continuous H<sub>1</sub> was 0.062, close to the true value. When they were set to 0.06 and 0.15, categorization using the change point produced higher coefficient values than categorization at the tertile points. Moreover, when α<sub>1</sub> and α<sub>2</sub> were both set to 0.00, the power was approximately 0.05, the same as the Type I error.</p></sec><sec id="s4_2"><title>4.2. 
Sample Run 2</title><p>We used age and systolic arterial pressure as the continuous variables X<sub>1</sub> and X<sub>2</sub>, respectively. The mean and standard deviation of X<sub>1</sub> were 70 and 8, and its skewness and kurtosis were set to the pairs (0, 0), (−0.5, 0.5), and (−1.0, 1.0). The regression coefficient of X<sub>1</sub> was 0.05 under a linear relationship. The mean and standard deviation of X<sub>2</sub> were 160 and 25, and its skewness and kurtosis were set to the pairs (0, 0), (0.4, 0.3), and (0.8, 0.6). The regression coefficient of X<sub>2</sub> was 0.02, also under a linear relationship. The correlation between X<sub>1</sub> and X<sub>2</sub> was set to 0, 0.3, and 0.6. The binary variable D<sub>1</sub> denotes smoking status; smokers and non-smokers each made up half of the sample, and the regression coefficient of D<sub>1</sub> was 0.83. Sample sizes of 300 and 500 were used. We fitted the logistic model including X<sub>1</sub>, X<sub>2</sub>, and D<sub>1</sub>; the results are shown in <xref ref-type="table" rid="table4">Table 4</xref>.</p><table-wrap id="table3" ><label>Table 3</label><caption><title>Sample run 1: estimated power of the Wald test in the two-segment logistic regression model with an event proportion of 0.2</title></caption><table><thead><tr><th align="center" valign="middle" >α<sub>1</sub></th><th align="center" valign="middle" >α<sub>2</sub></th><th align="center" valign="middle" >Categorization</th><th align="center" valign="middle" >Coefficient</th><th align="center" valign="middle" >SE</th><th align="center" valign="middle" >Power</th></tr></thead><tbody><tr><td align="center" valign="middle" >0.00</td><td align="center" valign="middle" >0.00</td><td align="center" valign="middle" >Continuous</td><td align="center" valign="middle" >0.001</td><td align="center" valign="middle" >0.027</td><td align="center" valign="middle" >0.061</td></tr><tr><td align="center" valign="middle" ></td><td 
align="center" valign="middle" ></td><td align="center" valign="middle" >Median</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.294</td><td align="center" valign="middle" >0.055</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Tertile</td><td align="center" valign="middle" >0.013</td><td align="center" valign="middle" >0.367</td><td align="center" valign="middle" >0.049</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1 - 4</td><td align="center" valign="middle" >0.013</td><td align="center" valign="middle" >0.368</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >5 - 15</td><td align="center" valign="middle" >−0.001</td><td align="center" valign="middle" >0.314</td><td align="center" valign="middle" >0.043</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >≥16</td><td align="center" valign="middle" >−0.033</td><td align="center" valign="middle" >0.546</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >0.06</td><td align="center" valign="middle" >0.06</td><td align="center" valign="middle" >Continuous</td><td align="center" valign="middle" >0.062</td><td align="center" valign="middle" >0.026</td><td align="center" valign="middle" >0.682</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Median</td><td align="center" valign="middle" >0.616</td><td align="center" valign="middle" >0.293</td><td align="center" valign="middle" >0.545</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" 
valign="middle" >Tertile</td><td align="center" valign="middle" >0.238</td><td align="center" valign="middle" >0.397</td><td align="center" valign="middle" >0.459</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1 - 4</td><td align="center" valign="middle" >0.768</td><td align="center" valign="middle" >0.374</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >5 - 15</td><td align="center" valign="middle" >0.530</td><td align="center" valign="middle" >0.313</td><td align="center" valign="middle" >0.516</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >≥16</td><td align="center" valign="middle" >0.904</td><td align="center" valign="middle" >0.477</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >0.06</td><td align="center" valign="middle" >0.15</td><td align="center" valign="middle" >Continuous</td><td align="center" valign="middle" >0.153</td><td align="center" valign="middle" >0.027</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Median</td><td align="center" valign="middle" >1.527</td><td align="center" valign="middle" >0.309</td><td align="center" valign="middle" >0.999</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Tertile</td><td align="center" valign="middle" >0.588</td><td align="center" valign="middle" >0.456</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1 - 4</td><td align="center" valign="middle" 
>1.896</td><td align="center" valign="middle" >0.417</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >5 - 15</td><td align="center" valign="middle" >1.323</td><td align="center" valign="middle" >0.326</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >≥16</td><td align="center" valign="middle" >2.287</td><td align="center" valign="middle" >0.468</td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table">Table </xref>4</label><caption><title> (a) Sample run 2: estimated power of the Wald test for left- and right-skewed distributions with an event proportion of 0.2 and N = 300; (b) Sample run 2: estimated power of the Wald test for left- and right-skewed distributions with an event proportion of 0.2 and N = 500</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >X<sub>1</sub> X<sub>2</sub></th><th align="center" valign="middle" ></th><th align="center" valign="middle" >X<sub>1</sub></th><th align="center" valign="middle" >Left-skewed covariate</th><th align="center" valign="middle" ></th><th align="center" valign="middle" >X<sub>2</sub></th><th align="center" valign="middle" >Right-skewed covariate</th><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle" >D<sub>1</sub> binary</th><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th></tr></thead><tr><td align="center" valign="middle" >(skewness, kurtosis)</td><td align="center" valign="middle" >Mean</td><td align="center" valign="middle" >SD</td><td align="center" valign="middle" >(S, K) 
Coefficient</td><td align="center" valign="middle" >SE</td><td align="center" valign="middle" >Power</td><td align="center" valign="middle" >Mean</td><td align="center" valign="middle" >SD</td><td align="center" valign="middle" >(S, K) Coefficient</td><td align="center" valign="middle" >SE</td><td align="center" valign="middle" >Power</td><td align="center" valign="middle" >Coefficient</td><td align="center" valign="middle" >SE</td><td align="center" valign="middle" >Power</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >70</td><td align="center" valign="middle" >8</td><td align="center" valign="middle" >β<sub>1</sub> = 0.05</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >160</td><td align="center" valign="middle" >25</td><td align="center" valign="middle" >β<sub>2</sub> = 0.02</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >β<sub>3</sub> = 0.83</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle"  colspan="2"  >Correlation of (X<sub>1</sub>, X<sub>2</sub>) = 0.0</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" 
>0.019</td><td align="center" valign="middle" >0.774</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.0, 0.0) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.931</td><td align="center" valign="middle" >0.860</td><td align="center" valign="middle" >0.304</td><td align="center" valign="middle" >0.801</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.020</td><td align="center" valign="middle" >0.754</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.0, 0.0) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.933</td><td align="center" valign="middle" >0.864</td><td align="center" valign="middle" >0.304</td><td align="center" valign="middle" >0.815</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.053</td><td align="center" valign="middle" >0.021</td><td align="center" valign="middle" >0.727</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.0, 0.0) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.933</td><td align="center" valign="middle" >0.864</td><td align="center" valign="middle" >0.304</td><td align="center" valign="middle" >0.818</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" 
valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.019</td><td align="center" valign="middle" >0.764</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.948</td><td align="center" valign="middle" >0.861</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.805</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.8, 0.6)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.019</td><td align="center" valign="middle" >0.766</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.957</td><td align="center" valign="middle" >0.861</td><td align="center" valign="middle" >0.306</td><td align="center" valign="middle" >0.805</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.020</td><td align="center" valign="middle" >0.753</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.948</td><td align="center" valign="middle" >0.863</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.812</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.8, 0.6)</td><td align="center" valign="middle" 
>70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.053</td><td align="center" valign="middle" >0.021</td><td align="center" valign="middle" >0.723</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >24.9</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.861</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.816</td></tr><tr><td align="center" valign="middle"  colspan="2"  >Correlation of (X<sub>1</sub>, X<sub>2</sub>) = 0.3</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.051</td><td align="center" valign="middle" >0.020</td><td align="center" valign="middle" >0.742</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.021</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.910</td><td align="center" valign="middle" >0.871</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.820</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td 
align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.021</td><td align="center" valign="middle" >0.704</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.021</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.910</td><td align="center" valign="middle" >0.871</td><td align="center" valign="middle" >0.304</td><td align="center" valign="middle" >0.830</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.052</td><td align="center" valign="middle" >0.022</td><td align="center" valign="middle" >0.665</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.021</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.906</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.306</td><td align="center" valign="middle" >0.826</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.020</td><td align="center" valign="middle" >0.747</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.922</td><td align="center" valign="middle" >0.873</td><td align="center" valign="middle" >0.303</td><td align="center" valign="middle" >0.827</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) 
(0.8, 0.6)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.020</td><td align="center" valign="middle" >0.755</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.874</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.823</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.021</td><td align="center" valign="middle" >0.704</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.926</td><td align="center" valign="middle" >0.874</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.829</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.8, 0.6)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.052</td><td align="center" valign="middle" >0.023</td><td align="center" valign="middle" >0.662</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.006</td><td align="center" valign="middle" >0.938</td><td align="center" valign="middle" >0.874</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" 
>0.829</td></tr><tr><td align="center" valign="middle"  colspan="2"  >Correlation of (X<sub>1</sub>, X<sub>2</sub>) = 0.6</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.024</td><td align="center" valign="middle" >0.587</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.021</td><td align="center" valign="middle" >0.008</td><td align="center" valign="middle" >0.779</td><td align="center" valign="middle" >0.873</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.817</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.025</td><td align="center" valign="middle" >0.542</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.020</td><td align="center" valign="middle" >0.008</td><td align="center" valign="middle" >0.780</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.304</td><td align="center" valign="middle" 
>0.823</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.0, 0.0)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.052</td><td align="center" valign="middle" >0.027</td><td align="center" valign="middle" >0.508</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.021</td><td align="center" valign="middle" >0.008</td><td align="center" valign="middle" >0.805</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.303</td><td align="center" valign="middle" >0.819</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.024</td><td align="center" valign="middle" >0.595</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.812</td><td align="center" valign="middle" >0.873</td><td align="center" valign="middle" >0.307</td><td align="center" valign="middle" >0.822</td></tr><tr><td align="center" valign="middle" >(0.0, 0.0) (0.8, 0.6)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(0.0, 0.0) 0.052</td><td align="center" valign="middle" >0.024</td><td align="center" valign="middle" >0.608</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.844</td><td align="center" valign="middle" >0.873</td><td 
align="center" valign="middle" >0.308</td><td align="center" valign="middle" >0.815</td></tr><tr><td align="center" valign="middle" >(−0.5, 0.5) (0.4, 0.3)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−0.5, 0.5) 0.052</td><td align="center" valign="middle" >0.025</td><td align="center" valign="middle" >0.557</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.4, 0.3) 0.020</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.815</td><td align="center" valign="middle" >0.872</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.818</td></tr><tr><td align="center" valign="middle" >(−1.0, 1.0) (0.8, 0.6)</td><td align="center" valign="middle" >70.0</td><td align="center" valign="middle" >8.0</td><td align="center" valign="middle" >(−1.0, 1.0) 0.052</td><td align="center" valign="middle" >0.026</td><td align="center" valign="middle" >0.530</td><td align="center" valign="middle" >160.0</td><td align="center" valign="middle" >25.0</td><td align="center" valign="middle" >(0.8, 0.6) 0.020</td><td align="center" valign="middle" >0.007</td><td align="center" valign="middle" >0.878</td><td align="center" valign="middle" >0.869</td><td align="center" valign="middle" >0.305</td><td align="center" valign="middle" >0.812</td></tr></tbody></table></table-wrap><p>S, skewness; K, kurtosis.</p><p>(b)</p><p>S, skewness; K, kurtosis.</p><p>The means, standard deviations, skewness, and kurtosis of the generated variables closely matched their input values. Increasing the sample size increased the power, whereas a higher correlation between X<sub>1</sub> and X<sub>2</sub> reduced the power for both continuous covariates. The results also show that the power depends on the shape of the covariate distribution. 
Negatively skewed covariates yielded lower power, whereas positively skewed covariates yielded higher power. There is an inverse relationship between the logit outcome and the covariates. For example, at a correlation of 0.0, changing the skewness of X<sub>1</sub> from −0.5 to −1.0 decreased its power from 0.75 to 0.73. At a correlation of 0.6, changing the skewness of X<sub>2</sub> from 0.4 to 0.8 increased its power from 0.81 to 0.84, or to 0.88 when the skewness of X<sub>1</sub> was simultaneously changed from −0.5 to −1.0. Both examples are for a sample size of 300.</p></sec><sec id="s4_3"><title>4.3. Sample Run 3: Epidemiological Studies</title><p>It is important to establish that the patterns observed in the above simulations also hold for real data. For this purpose, we used data from a study of the influence of smoking on 90-day outcomes after acute atherothrombotic stroke in 292 Japanese men [<xref ref-type="bibr" rid="scirp.52556-ref14">14</xref>]. In that study, body temperature, age, NIHSS score at admission, systolic blood pressure, and smoking status were included in the logistic model. The input parameters are given in <xref ref-type="table" rid="table">Table </xref>5(a), and the estimated results are listed in <xref ref-type="table" rid="table">Table </xref>5(b). The event proportion in the real study was 0.2; setting the input event proportion to 0.15 yielded an event proportion of 0.206 in the generated datasets. The estimated coefficients were similar to those of the epidemiological study. In the real data analysis, all factors, i.e.
body temperature, age, NIHSS score at admission, systolic blood pressure, and smoking status, were significantly associated with the outcome (p &lt; 0.05), and the simulation likewise yielded high power for every factor (range, 0.686 to 1.000).</p><table-wrap-group id="5"><label><xref ref-type="table" rid="table">Table </xref>5</label><caption><title> (a) Assigned input parameters for sample run 3; (b) Sample run 3: estimated power of the Wald test for an epidemiological study with a sample size of 292</title></caption><table-wrap id="5_1"><caption><title> (a)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Input parameter</th><th align="center" valign="middle" >Value</th></tr></thead><tr><td align="center" valign="middle" >SEED</td><td align="center" valign="middle" >9</td></tr><tr><td align="center" valign="middle" >ALEVEL</td><td align="center" valign="middle" >0.05</td></tr><tr><td align="center" valign="middle" >P</td><td align="center" valign="middle" >0.15</td></tr><tr><td align="center" valign="middle" >NITER</td><td align="center" valign="middle" >1000</td></tr><tr><td align="center" valign="middle" >PATH</td><td align="center" valign="middle" >C:\</td></tr><tr><td align="center" valign="middle" >TABLE</td><td align="center" valign="middle" >Table_samplerun_3</td></tr><tr><td align="center" valign="middle" >R</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >CHANGE_POINT</td><td align="center" valign="middle" >4</td></tr><tr><td align="center" valign="middle" >MODEL_1</td><td align="center" valign="middle" >%NRSTR(0.04*H1+0.8*D1+ 0.06*X1 + 0.02*X2 + 1.1*X3)</td></tr><tr><td align="center" valign="middle" >MODEL_2</td><td align="center" valign="middle" >%NRSTR(0.04*4+0.15*(H1-4)+0.8*D1+0.06*X1+0.02*X2 +1.1*X3)</td></tr><tr><td align="center" valign="middle"  colspan="2"  >DATA a (type=CORR); LENGTH _TYPE_ $40; INPUT _NAME_ $ _TYPE_ $ X1 X2 X3; IF TRIM(LEFT(_TYPE_))='N' THEN CALL SYMPUT('NSP', X1); CARDS; . MEAN 70 160 36.4 . STD 8 25 0.5 . N 292 292 292 X1 CORR 1 -0.1 -0.1 X2 CORR -0.1 1 0.1 X3 CORR -0.1 0.1 1 ; RUN;</td></tr><tr><td align="center" valign="middle" >SKW_KRT</td><td align="center" valign="middle" >%NRSTR({-0.5 0.5, 0.4 0.3, -0.08 0.7})</td></tr><tr><td align="center" valign="middle" >LIST_VARNAME</td><td align="center" valign="middle" >%NRSTR(X1 X2 X3)</td></tr><tr><td align="center" valign="middle" >CONTI_MODEL</td><td align="center" valign="middle" >%NRSTR(X1 X2 X3 C1)</td></tr><tr><td align="center" valign="middle" >MIN</td><td align="center" valign="middle" >1</td></tr><tr><td align="center" valign="middle" >MAX</td><td align="center" valign="middle" >21</td></tr><tr><td align="center" valign="middle" >SUB_GROUP</td><td align="center" valign="middle" >5</td></tr><tr><td align="center" valign="middle" >CATEGORIZATION</td><td align="center" valign="middle" >%NRSTR(H1)</td></tr><tr><td align="center" valign="middle" >CATEGORIZATION_R</td><td align="center" valign="middle" >%NRSTR(H1_R)</td></tr></tbody></table></table-wrap><table-wrap id="5_2"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Risk factor</th><th align="center" valign="middle"  colspan="2"  >Results of epidemiological study</th><th align="center" valign="middle"  colspan="2"  >Results of simulation</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Coefficient</td><td align="center" valign="middle" >p value</td><td align="center" valign="middle" >Coefficient</td><td align="center" valign="middle" >Power</td></tr><tr><td align="center" valign="middle" >Smoker</td><td align="center" valign="middle" >0.82</td><td align="center" valign="middle" >0.019</td><td align="center" valign="middle" >0.83</td><td align="center" valign="middle" >0.686</td></tr><tr><td align="center" valign="middle" >Age (years)</td><td align="center" valign="middle" >0.06</td><td align="center" 
valign="middle" >0.014</td><td align="center" valign="middle" >0.06</td><td align="center" valign="middle" >0.792</td></tr><tr><td align="center" valign="middle" >Systolic arterial pressure</td><td align="center" valign="middle" >0.02</td><td align="center" valign="middle" >0.0096</td><td align="center" valign="middle" >0.02</td><td align="center" valign="middle" >0.847</td></tr><tr><td align="center" valign="middle" >Body temperature</td><td align="center" valign="middle" >1.18</td><td align="center" valign="middle" >0.0013</td><td align="center" valign="middle" >1.14</td><td align="center" valign="middle" >0.893</td></tr><tr><td align="center" valign="middle" >NIHSS score at admission</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >5 - 15</td><td align="center" valign="middle" >1.40</td><td align="center" valign="middle" >0.001</td><td align="center" valign="middle" >1.33</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" >≥16</td><td align="center" valign="middle" >2.25</td><td align="center" valign="middle" >0.001</td><td align="center" valign="middle" >2.30</td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap></table-wrap-group><p>Event proportion = 0.206.</p></sec></sec><sec id="s5"><title>5. Conclusions and Discussion</title><p>Estimating the required sample size, or equivalently the statistical power, is a critical step in study design. If the sample size is too small, the study will lack the precision needed to answer the questions it investigates reliably. If the sample size is too large, time and resources will be wasted, often for minimal gain [<xref ref-type="bibr" rid="scirp.52556-ref17">17</xref>]. In this study, we developed a Monte Carlo simulation program that estimates the power for each covariate in a binary logistic regression model. 
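</p><p>The Monte Carlo procedure described above can be illustrated with a short, self-contained sketch. The Python code below is a hypothetical illustration, not the authors' SAS/IML program. It handles a single continuous covariate and obtains Fleishman's coefficients by Newton's method (the same equations solved by the COEFF macro in the Appendix) to generate a skewed covariate. It then draws a binary outcome from a logistic model, fits the model by Newton-Raphson, and estimates power as the proportion of replicates in which the two-sided Wald test rejects at level α. The function names, the intercept calibration (the logit of the target event proportion evaluated at the covariate mean), and the default number of replicates are assumptions made for illustration.</p>

```python
import math
import random
import statistics


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fleishman_coeffs(skew, kurt, tol=1e-6, max_iter=25):
    """Newton's method for Fleishman's b, c, d (with a = -c), starting from (1, 0, 0)."""
    b, c, d = 1.0, 0.0, 0.0
    for _ in range(max_iter):
        f = [b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,
             2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew,
             24*(b*d + c**2*(1 + b**2 + 28*b*d)
                 + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt]
        if tol >= max(abs(v) for v in f):
            break
        jac = [[2*b + 6*d, 4*c, 6*b + 30*d],
               [4*c*(b + 12*d), 2*(b**2 + 24*b*d + 105*d**2 + 2), 4*c*(12*b + 105*d)],
               [24*(d + c**2*(2*b + 28*d) + 48*d**3),
                48*c*(1 + b**2 + 28*b*d + 141*d**2),
                24*(b + 28*b*c**2 + 2*d*(12 + 48*b*d + 141*c**2 + 225*d**2)
                    + d**2*(48*b + 450*d))]]
        step = solve_linear(jac, [-v for v in f])
        b, c, d = b + step[0], c + step[1], d + step[2]
    return -c, b, c, d


def wald_z(x, y, max_iter=25, tol=1e-8):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return the Wald statistic b1 / SE(b1)."""
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]  # centring changes b0 only, not b1 or SE(b1)
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix entries
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if tol > abs(step0) + abs(step1):
            break
    # Var(b1) is the (2, 2) element of the inverse information, h00 / det, from the last iteration.
    return b1 / math.sqrt(h00 / det)


def simulate_power(n, mean, sd, skew, kurt, beta1, event_prop=0.2,
                   n_rep=200, alpha=0.05, seed=9):
    """Monte Carlo power of the two-sided Wald test for one Fleishman-distributed covariate."""
    rng = random.Random(seed)
    a, b, c, d = fleishman_coeffs(skew, kurt)
    # Hypothetical calibration: the linear predictor equals logit(event_prop) at the covariate mean.
    beta0 = math.log(event_prop / (1.0 - event_prop)) - beta1 * mean
    z_crit = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_rep):
        x, y = [], []
        for _ in range(n):
            z = rng.gauss(0.0, 1.0)
            xi = mean + sd * (a + b * z + c * z**2 + d * z**3)  # Fleishman transformation
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))
            x.append(xi)
            y.append(1 if p > rng.random() else 0)
        if abs(wald_z(x, y)) > z_crit:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    # Left-skewed covariate resembling X1 in Sample Run 2 (mean 70, SD 8, S = -0.5, K = 0.5).
    print(simulate_power(300, 70, 8, -0.5, 0.5, beta1=0.05))
```

<p>Because this sketch omits the other covariates and their correlations, its estimates should not be expected to reproduce <xref ref-type="table" rid="table">Table </xref>4 exactly; a multivariable version would need the full inverse information matrix for the Wald standard errors.</p><p>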
Users can evaluate the relationship between sample size and covariates in both observational and randomized studies. Our simulation results showed a clear relationship between statistical power and the shape of the covariate distribution (<xref ref-type="table" rid="table">Table </xref>4): right- and left-skewed distributions yielded different powers. This finding is consistent with reports that the shape of a distribution affects statistical power [<xref ref-type="bibr" rid="scirp.52556-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.52556-ref19">19</xref>]. The advantage of a theoretical power equation is that it is quick and easy to implement with existing software; for this reason, power equations are used to plan most studies. In practice, however, power must often be computed for covariates with relatively complex distributions.</p><p>Our program is flexible enough to accommodate any number and type (continuous or discrete) of covariates and categorizations, a range of continuous distribution shapes and correlations, and any level of association between the logit outcome and the covariates, although some modifications may be necessary. Beyond the binary logistic regression considered here, the program can also be applied to other statistical methods, such as Bayesian inference. The SAS/STAT and SAS/IML program written for the simulations and a user manual are available upon request [<xref ref-type="bibr" rid="scirp.52556-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.52556-ref21">21</xref>].</p></sec><sec id="s6"><title>Acknowledgements</title><p>The authors wish to thank Dr. Motonori Hatta for helpful advice on the study.</p></sec><sec id="s7"><title>Conflict of Interest</title><p>The authors have no conflicts of interest to declare.</p></sec><sec id="s8"><title>Contribution</title><p>K.A. and N.K. were responsible for the study conception. N.K. wrote the program and drafted this manuscript. N.K., K.A., Y.O., H.K., and Y.H. made revisions to the manuscript. 
All authors approved the final version of the manuscript.</p></sec><sec id="s9"><title>Appendix: Simulation Program for Estimating the Statistical Power of a Logistic Regression Model</title><p>%LET SEED=;</p><p>%LET ALEVEL=;</p><p>%LET PATH=;</p><p>%LET TABLE=;</p><p>%LET NITER=;</p><p>%LET SKW_KRT=%NRSTR({});</p><p>%LET LIST_VARNAME=%NRSTR();</p><p>%LET MODEL_1=%NRSTR();</p><p>%LET MODEL_2=%NRSTR();</p><p>%LET CATEGORIZATION=%NRSTR();</p><p>%LET CATEGORIZATION_R=%NRSTR();</p><p>%LET CONTI_MODEL=%NRSTR();</p><p>%LET CHANGE_POINT=;</p><p>%LET MAX=;</p><p>%LET MIN=;</p><p>%LET SUB_GROUP=;</p><p>%LET P=;</p><p>/*EXAMPLE*/</p><p>DATA A (TYPE=CORR);</p><p>LENGTH _TYPE_ $40;</p><p>INPUT _NAME_ $ _TYPE_ $ X1 X2;</p><p>IF TRIM(LEFT(_TYPE_))='N' THEN CALL SYMPUT('NSP', X1);</p><p>CARDS;</p><p>. MEAN 70 160</p><p>. STD 8 25</p><p>. N 300 300</p><p>X1 CORR 1 0</p><p>X2 CORR 0 1</p><p>;</p><p>RUN;</p><p>%DATASET(N_REPEAT=);</p><p>%LR(R=);</p><p>/************************%COEFF and %CONTINUOUS **************/</p><p>/*%COEFF and %CONTINUOUS generate random variables following a Multivariate Normal distribution with given means, standard deviations, and correlation matrix, and then transform each variable to the desired distributional shape with specified population univariate skewness and kurtosis.</p><p>/*%COEFF</p><p>/*Macro COEFF calculates the coefficients of Fleishman's power transformation</p><p>/*Equation X = A + B*Z + C*Z**2 + D*Z**3, where Z is a standard normal variate and A = -C</p><p>/*Parameters</p><p>/*SKW_KRT; %NRSTR({skewness1 kurtosis1, skewness2 kurtosis2,…, });</p><p>/*LIST_VARNAME; list of variable names that define the skewness and kurtosis.</p><p>/*OUT the name of the output file (COEFF) that holds the coefficient values (B C D) of each variable.</p><p>/********************************************************************/</p><p>%MACRO COEFF;</p><p>PROC IML;</p><p>/* COEFFICIENTS OF B, C, D FOR FLEISHMAN'S POWER 
TRANSFORMATION*/</p><p>SKEWKURT=&amp;SKW_KRT;</p><p>MAXITER=25;</p><p>CONVERGE=.000001;</p><p>/* FUN: FLEISHMAN EQUATIONS F=0 FOR THE TARGET SKEWNESS AND KURTOSIS (C1=B, C2=C, C3=D) */</p><p>START FUN;</p><p>C1=COEF[1];</p><p>C2=COEF[2];</p><p>C3=COEF[3];</p><p>F= (C1**2+6*C1*C3+2*C2**2+15*C3**2-1)//</p><p>(2*C2*(C1**2+24*C1*C3 +105*C3**2+2)-SKEWNESS)//</p><p>(24*(C1*C3+C2**2*(1+C1**2+28*C1*C3)+C3**2*</p><p>(12+48*C1*C3+141*C2**2+225*C3**2))-KURTOSIS);</p><p>FINISH FUN;</p><p>/* DERIV: JACOBIAN MATRIX J OF F WITH RESPECT TO (B, C, D) */</p><p>START DERIV;</p><p>J= ((2*C1+6*C3) || (4*C2) || (6*C1+30*C3))//</p><p>((4*C2*(C1+12*C3)) || (2*(C1**2+24*C1*C3+105*C3**2+2))||</p><p>(4*C2*(12*C1+105*C3)))//((24*(C3+C2**2*(2*C1+28*C3)+48*C3**3))||</p><p>(48*C2*(1+C1**2+28*C1*C3+141*C3**2))||</p><p>(24*(C1+28*C1*C2**2+2*C3*(12+48*C1*C3+141*C2**2+225*C3**2)</p><p>+C3**2*(48*C1+450*C3))));</p><p>FINISH DERIV;</p><p>/* NEWTON: NEWTON-RAPHSON ITERATION UNTIL MAX(ABS(F)) &lt;= CONVERGE OR MAXITER IS REACHED */</p><p>START NEWTON;</p><p>RUN FUN;</p><p>DO ITER = 1 TO MAXITER</p><p>WHILE (MAX(ABS(F))&gt; CONVERGE);</p><p>RUN DERIV;</p><p>DELTA=-SOLVE(J,F);</p><p>COEF=COEF+DELTA;</p><p>RUN FUN;</p><p>END;</p><p>FINISH NEWTON;</p><p>DO;</p><p>NUM=NROW (SKEWKURT);</p><p>DO VAR = 1 TO NUM;</p><p>SKEWNESS = SKEWKURT [VAR,1];</p><p>KURTOSIS = SKEWKURT [VAR, 2];</p><p>COEF = {1.0, 0.0, 0.0};</p><p>RUN NEWTON;</p><p>COEF = COEF`;</p><p>SK_KUR= SKEWKURT [VAR,];</p><p>COMBINE=SK_KUR||COEF;</p><p>IF VAR = 1 THEN RESULT=COMBINE;</p><p>IF VAR &gt;1 THEN RESULT=RESULT//COMBINE;</p><p>END;</p><p>END;</p><p>RESULT=RESULT`;</p><p>CREATE _COEF_ FROM RESULT [COLNAME={&amp;LIST_VARNAME}];</p><p>APPEND FROM RESULT;</p><p>CLOSE _COEF_;</p><p>QUIT;</p><p>DATA _COEF;</p><p>SET _COEF_;</p><p>LENGTH _TYPE_ $40;</p><p>MARK = _N_; _TYPE_=&quot;COEFF&quot;;</p><p>FORMAT MARK;</p><p>RUN;</p><p>DATA COEFF (DROP= MARK);</p><p>SET _COEF;</p><p>IF MARK &gt;2 THEN OUTPUT COEFF;</p><p>RUN;</p><p>%MEND COEFF;</p><p>/***********************%CONTINUOUS********************************/</p><p>/*This program generates random variables following a Multivariate 
Normal distribution with given means, standard deviations, and correlation matrix, and then transforms each variable to the desired distributional shape using Fleishman’s coefficients.</p><p>/*Parameters</p><p>/*N_REPEAT: the number of iterations</p><p>/*SEED: seed of the random number generator</p><p>/*DATA: the input file, which must be named A, that determines the characteristics of the random numbers to be generated. The file specifies the mean, standard deviation, and number of observations of each random variable, and the correlation coefficients between the variables. It must be a TYPE=CORR file with the standard structure of such files: _TYPE_=MEAN, STD, N, and CORR, and variables _TYPE_, _NAME_, and the variables to be generated. The N row must give the same number of observations for every variable. The sample size NSP must be created from this file with IF TRIM(LEFT(_TYPE_))='N' THEN CALL SYMPUT('NSP', X1), where X1 is one of the variable names.</p><p>/*</p><p>/*Example</p><p>/*DATA A (TYPE=CORR);</p><p>/*LENGTH _TYPE_ $40;</p><p>/* INPUT _NAME_ $ _TYPE_$ X1 X2 ;</p><p>/* IF TRIM(LEFT(_TYPE_))='N' THEN CALL SYMPUT('NSP', X1);</p><p>/* CARDS;</p><p>/* . MEAN 70 160</p><p>/* . STD 8 25</p><p>/* . 
N 300 300</p><p>/* X1 CORR 1 0</p><p>/* X2 CORR 0 1</p><p>/* ;</p><p>/* RUN;</p><p>/*OUT: BB&amp;REPEAT, containing the random variables generated according to file A and an observation identification number (ID)</p><p>/***************************************************************/</p><p>%MACRO CONTINUOUS;</p><p>PROC CONTENTS DATA=A (DROP=_TYPE_ _NAME_)</p><p>OUT=_DATA_ (KEEP=NAME) NOPRINT;</p><p>RUN;</p><p>/*SUPPOSE THE INPUT DATASET A CONTAINS VARIABLES X1, ..., X<sub>P</sub>.</p><p>THEIR NAMES ARE STORED IN MACRO VARIABLES V1, ..., V<sub>P</sub>, AND MACRO VARIABLE NV IS ASSIGNED THE TOTAL NUMBER OF VARIABLES.*/</p><p>DATA _DATA_;</p><p>SET _LAST_ END=END;</p><p>RETAIN N 0;</p><p>N=N+1;</p><p>V=COMPRESS('V' || COMPRESS(PUT (N, 6.0)));</p><p>CALL SYMPUT(V, NAME);</p><p>IF END THEN CALL SYMPUT('NV', LEFT(PUT (N, 6.0)));</p><p>RUN;</p><p>%LET VNAMES=&amp;V1;</p><p>%DO I=2 %TO &amp;NV.;</p><p>%LET VNAMES=&amp;VNAMES &amp;&amp;V&amp;I;</p><p>%END;</p><p>/*OBTAIN THE MATRIX OF FACTOR PATTERNS AND OTHER STATISTICS.*/</p><p>PROC FACTOR DATA=A NFACT=&amp;NV NOPRINT</p><p>OUTSTAT=PATTERN_(WHERE=(_TYPE_ IN('MEAN','STD','N','PATTERN')));</p><p>RUN;</p><p>DATA _PATTERN_;</p><p>SET COEFF PATTERN_;</p><p>RUN;</p><p>/*GENERATE THE RANDOM NUMBERS.*/</p><p>%LET NV2=%EVAL(&amp;NV.*&amp;NV.);</p><p>%LET NV3=%EVAL(3*&amp;NV.);</p><p>DATA B&amp;REPEAT. (KEEP=&amp;VNAMES);</p><p>SET _PATTERN_ (KEEP=&amp;VNAMES _TYPE_ RENAME=(</p><p>%DO I=1 %TO &amp;NV;</p><p>&amp;&amp;V&amp;I = V&amp;I</p><p>%END;</p><p>)) END=LASTFACT;</p><p>RETAIN;</p><p>/*SET UP ARRAYS TO STORE THE NECESSARY STATISTICS.*/</p><p>ARRAY VCOEFF(3,&amp;NV) C1-C&amp;NV3;</p><p>ARRAY FPATTERN(&amp;NV,&amp;NV) F1-F&amp;NV2;</p><p>ARRAY VSTD(&amp;NV) S1-S&amp;NV;</p><p>ARRAY VMEAN(&amp;NV) M1-M&amp;NV;</p><p>ARRAY V(&amp;NV)V1-V&amp;NV;</p><p>ARRAY VTEMP(&amp;NV)VT1-VT&amp;NV;</p><p>LENGTH LBL $40;</p><p>/* READ AND STORE THE MATRIX OF FACTOR PATTERNS. 
*/</p><p>/* PATTERN ROWS START AT _N_=7, AFTER THE 3 COEFF ROWS AND THE MEAN, STD, AND N ROWS. */</p><p>IF _TYPE_='PATTERN' THEN DO;</p><p>DO I=1 TO &amp;NV;</p><p>FPATTERN(_N_ -6, I)=V(I);</p><p>END;</p><p>END;</p><p>/* READ AND STORE THE FLEISHMAN COEFFICIENTS B, C, D (ROWS 1-3 OF _PATTERN_). */</p><p>IF _TYPE_='COEFF' THEN DO;</p><p>DO I=1 TO &amp;NV;</p><p>VCOEFF(_N_,I) =V(I);</p><p>END;</p><p>END;</p><p>/* READ AND STORE THE MEANS */</p><p>IF _TYPE_ = 'MEAN' THEN DO;</p><p>DO I=1 TO &amp;NV;</p><p>VMEAN(I)=V(I); END;</p><p>END;</p><p>/* READ AND STORE THE STD. */</p><p>IF _TYPE_ = 'STD' THEN DO;</p><p>DO I=1 TO &amp;NV;</p><p>VSTD(I) =V(I); END;</p><p>END;</p><p>/* READ AND STORE THE NUMBER OF OBSERVATIONS.*/</p><p>IF _TYPE_ = 'N' THEN NNUMBERS=V(1);</p><p>IF LASTFACT THEN DO;</p><p>/* SET UP LABELS FOR THE RANDOM VARIABLES. THE LABELS ARE STORED IN MACRO VARIABLES LBL1, LBL2, ... AND CAN BE USED IN A SUBSEQUENT PROC DATASETS STEP.*/</p><p>%DO I=1 %TO &amp;NV;</p><p>LBL=&quot;ST.NORMAL VAR., M=&quot;||COMPRESS(PUT(VMEAN(&amp;I), BEST8.))||&quot;, STD=&quot;||COMPRESS(PUT(VSTD(&amp;I),BEST8.));</p><p>CALL SYMPUT(&quot;LBL&amp;I&quot;,LBL);</p><p>%END;</p><p>DO K=1 TO NNUMBERS;</p><p>DO I =1 TO &amp;NV;</p><p>SEED=(&amp;SEED.+&amp;REPEAT.+1);</p><p>VTEMP(I)=RANNOR(SEED);</p><p>END;</p><p>/* IMPOSE THE INTERCORRELATION ON EACH VARIABLE. THE</p><p>TRANSFORMED VARIABLES ARE STORED IN ARRAY 'V'.*/</p><p>DO I=1 TO &amp;NV;</p><p>V(I)=0;</p><p>DO J=1 TO &amp;NV;</p><p>V(I) = V(I) + VTEMP(J)*FPATTERN(J, I);</p><p>END;</p><p>END;</p><p>/* APPLY THE FLEISHMAN POWER TRANSFORMATION AND RESCALE THE RANDOM VARIABLES SO THEY HAVE</p><p>MEANS AND STANDARD DEVIATIONS AS REQUESTED. 
*/</p><p>/* VCOEFF(1,I)=B, VCOEFF(2,I)=C, VCOEFF(3,I)=D, SO V = -C + B*Z + C*Z**2 + D*Z**3 */</p><p>DO I=1 TO &amp;NV;</p><p>V(I)= VCOEFF(2, I)*(-1)+V(I)*VCOEFF(1, I) +VCOEFF(2,I)*V(I)*V(I)+VCOEFF(3, I)*V(I)*V(I)*V(I);</p><p>V(I) = VSTD(I) *V(I) + VMEAN(I);</p><p>END;</p><p>OUTPUT;</p><p>END;</p><p>END;</p><p>RENAME</p><p>%DO I=1 %TO &amp;NV;</p><p>V&amp;I = &amp;&amp;V&amp;I</p><p>%END;</p><p>;</p><p>RUN;</p><p>DATA BB&amp;REPEAT.;</p><p>SET B&amp;REPEAT.;</p><p>ID=_N_;</p><p>FORMAT ID;</p><p>RUN;</p><p>%MEND CONTINUOUS;</p><p>/***************** HISTGRAM and RATIO *************************/</p><p>/* This program generates random variables as in <xref ref-type="fig" rid="fig1">Figure 1</xref> based on a uniform distribution. Macro %RATIO is executed within macro %HISTGRAM.</p><p>/*%Macro HISTGRAM</p><p>/*Parameters</p><p>/*SEED= seed of the random number generator</p><p>/*N_REPEAT= the number of iterations</p><p>/*NSP= total sample size, which is already defined in the DATA file (A) used to execute macro %CONTINUOUS.</p><p>/*MAX=maximum value of the original variable</p><p>/*MIN=minimum value of the original variable</p><p>/*SUB_GROUP= the number of subgroups</p><p>/*OUT</p><p>/*The name of the output file (HH&amp;REPEAT) containing the random variable H1 and the ID number.</p><p>/*</p><p>/*%RATIO assigns observations to subgroups by ID, using IF statements, so that each subgroup has the specified frequency.</p><p>/*&amp;NSP.*0.55 = sample size &#215; cumulative proportion = cumulative frequency of the subgroups.</p><p>/*Example IF 1=&lt; ID &lt;&amp;NSP.*0.55 THEN _H1=U1;</p><p>/*U1 indicates the random number for the lowest subgroup.</p><p>/**************************************************************/</p><p>%MACRO HISTGRAM;</p><p>DATA HH&amp;REPEAT.;</p><p>DO ID=1 TO &amp;NSP.;</p><p>CALL STREAMINIT(&amp;REPEAT. + &amp;NSP. + &amp;SEED. 
+ 2.);</p><p>SCALE=(&amp;MAX.-&amp;MIN.)/&amp;SUB_GROUP.;</p><p>%DO I=1 %TO &amp;SUB_GROUP.;</p><p>_U&amp;I.=RAND(&quot;UNIFORM&quot;);</p><p>U&amp;I.=&amp;MIN.+(&amp;I.-1)*SCALE+SCALE*_U&amp;I.;</p><p>%END;</p><p>%RATIO;</p><p>OUTPUT;</p><p>END;</p><p>RUN;</p><p>%MEND HISTGRAM;</p><p>%MACRO RATIO;</p><p>IF 1=&lt; ID &lt;&amp;NSP.*0.55 THEN _H1=U1;</p><p>ELSE IF &amp;NSP.*0.55 =&lt; ID &lt;&amp;NSP.*0.6 THEN _H1=U2;</p><p>ELSE IF &amp;NSP.*0.6 =&lt; ID &lt;&amp;NSP.*0.8 THEN _H1=U3;</p><p>ELSE IF &amp;NSP.*0.8=</p><p>ELSE _H1=U5;</p><p>H1=INT(_H1);</p><p>/*EXAMPLE CATEGORIZATION OF H1 INTO THREE GROUPS (C1=0, 1, 2)*/</p><p>IF H1=&lt;4 THEN C1=0;</p><p>ELSE IF 4&lt; H1 =&lt;15 THEN C1=1;</p><p>ELSE IF H1 &gt;15 THEN C1=2;</p><p>%MEND RATIO;</p><p>/************************** PDF ***********************************/</p><p>/*This macro generates random variables using the RAND function,</p><p>/*which generates random numbers from specified probability distributions.</p><p>/*Parameters</p><p>/*SEED= seed of the random number generator</p><p>/*N_REPEAT= the number of iterations</p><p>/*NSP= total sample size, which is already defined in the DATA file (A) used to execute macro %CONTINUOUS.</p><p>/*RAND function calls: insert the RAND function calls for the required distributions into the DATA step below.</p><p>/*OUT</p><p>/*The name of the output file (CC&amp;REPEAT) containing the random variables drawn from the distributions specified in the RAND function calls, and the ID number.</p><p>/**************************************************************/</p><p>%MACRO PDF;</p><p>DATA CC&amp;REPEAT.;</p><p>DO ID=1 TO &amp;NSP.;</p><p>CALL STREAMINIT( &amp;REPEAT. + &amp;NSP. + &amp;SEED. 
+ 3);</p><p>STRATA=&amp;REPEAT.;</p><p>UN=RAND(&quot;UNIFORM&quot;);</p><p>/*INSERT RAND FUNCTION CALLS HERE TO GENERATE THE REQUIRED RANDOM NUMBERS, FOR EXAMPLE:*/</p><p>X=RAND(&quot;NORMAL&quot;, 0,1);</p><p>_D1=RAND(&quot;TABLE&quot;, 0.5 , 0.5);</p><p>IF _D1=1 THEN D1=0;</p><p>ELSE IF _D1=2 THEN D1=1;</p><p>OUTPUT;</p><p>END;</p><p>RUN;</p><p>%MEND PDF;</p><p>/********************** Merge ***************************/</p><p>/*This program merges, by ID number, all datasets of randomly generated variables created by macro %CONTINUOUS (BB&amp;REPEAT), macro %HISTGRAM (HH&amp;REPEAT), and macro %PDF (CC&amp;REPEAT).</p><p>/*OUT: _D&amp;REPEAT, which is appended to the accumulated dataset DATASET&amp;REPEAT.</p><p>/**************************************************************/</p><p>%MACRO MERGE;</p><p>PROC SORT DATA=BB&amp;REPEAT.; BY ID; RUN;</p><p>PROC SORT DATA=HH&amp;REPEAT.; BY ID; RUN;</p><p>PROC SORT DATA=CC&amp;REPEAT.; BY ID; RUN;</p><p>DATA _D&amp;REPEAT.;</p><p>MERGE BB&amp;REPEAT. CC&amp;REPEAT. HH&amp;REPEAT.;</p><p>BY ID;</p><p>RUN;</p><p>/*CREATE AN EMPTY DATASET0 SO THAT THE FIRST ITERATION HAS A DATASET TO APPEND TO*/</p><p>DATA DATASET0;</p><p>STOP;</p><p>RUN;</p><p>DATA DATASET&amp;REPEAT.;</p><p>SET DATASET%EVAL(&amp;REPEAT. -1) _D&amp;REPEAT.;</p><p>RUN;</p><p>%MEND MERGE;</p><p>/*************************** OUTCOME ***************/</p><p>/*This program generates the outcome variable, Y, from the individual probability of event occurrence. 
Individual probability is calculated using a two-segment logistic regression model.</p><p>/*Parameters</p><p>/*CHANGE_POINT= change point (a value of H1) of the two-segment logistic regression model</p><p>/*MODEL_1 = linear predictor (excluding the intercept) of the logistic regression model when H1 =&lt; CHANGE_POINT</p><p>/*MODEL_2 = linear predictor (excluding the intercept) of the logistic regression model when H1 &gt; CHANGE_POINT</p><p>/*NITER= number of the final accumulated dataset</p><p>/*P = target event proportion</p><p>/*OUT: D&amp;NITER, the dataset analyzed by the logistic regression models</p><p>/********************************************************************/</p><p>%MACRO OUTCOME;</p><p>DATA _D_&amp;NITER.;</p><p>SET DATASET&amp;NITER.;</p><p>/*TWO-SEGMENT MODEL: THE LINEAR PREDICTOR G DEPENDS ON WHETHER H1 EXCEEDS CHANGE_POINT*/</p><p>IF H1 =&lt; &amp;CHANGE_POINT. THEN G=&amp;MODEL_1;</p><p>ELSE G=&amp;MODEL_2;</p><p>RUN;</p><p>PROC SUMMARY DATA=_D_&amp;NITER.;</p><p>VAR G;</p><p>OUTPUT OUT= PROCMEAN&amp;NITER. MEAN=;</p><p>RUN;</p><p>/*THE INTERCEPT INT IS CHOSEN SO THAT THE EVENT PROPORTION IS APPROXIMATELY P*/</p><p>DATA M&amp;NITER. (KEEP=INT ID NITER);</p><p>SET PROCMEAN&amp;NITER.;</p><p>DO ID=1 TO &amp;NSP;</p><p>MEAN=G;</p><p>INT=LOG(&amp;P/(1-&amp;P))-MEAN;</p><p>NITER=&amp;NITER.;</p><p>OUTPUT;</p><p>END;</p><p>RUN;</p><p>PROC SORT DATA=M&amp;NITER.;BY ID;RUN;</p><p>PROC SORT DATA=_D_&amp;NITER.;BY ID;RUN;</p><p>DATA D_&amp;NITER.;</p><p>MERGE M&amp;NITER. _D_&amp;NITER.;</p><p>BY ID;</p><p>RUN;</p><p>DATA D&amp;NITER. ;</p><p>SET D_&amp;NITER.;</p><p>PRO=EXP(INT+ G)/(1 +EXP(INT+ G));</p><p>/*DRAW Y FROM A BERNOULLI(PRO) DISTRIBUTION USING THE UNIFORM RANDOM NUMBER UN*/</p><p>IF 0 =&lt; UN &lt; PRO THEN Y=1;</p><p>ELSE Y=0;</p><p>RUN;</p><p>PROC SORT DATA=D&amp;NITER. 
;BY STRATA;RUN;</p><p>%MEND OUTCOME;</p><p>/********************************************************************/</p><p>/*%CONTINUOUSLR fits the logistic regression model specified in CONTI_MODEL, with covariates entered as continuous variables, to each simulated dataset (BY STRATA). It collects the parameter estimates (coefficient, standard error, and p value) from all iterations and then calculates the average coefficient, average standard error, and power (the proportion of converged fits with p &lt; ALEVEL).</p><p>/*Parameters</p><p>/*NITER=number of the final accumulated dataset to be analyzed</p><p>/*CONTI_MODEL=list of covariates in the logistic regression model, entered as continuous variables</p><p>/*ALEVEL=significance level of the statistical test (Type I error)</p><p>/*PATH=directory in which results are saved</p><p>/*TABLE=table name for saved results</p><p>/*OUT=results (HTML output saved with an .XLS extension, which can be opened in Excel)</p><p>/*Results include the event proportion; the mean, standard deviation, skewness, and kurtosis of each covariate; the average coefficient and average standard error from the logistic regression model; and the power</p><p>/********************************************************************/</p><p>%MACRO CONTINUOUSLR;</p><p>/****MODEL**********************************************************/</p><p>ODS OUTPUT PARAMETERESTIMATES=PARAM CONVERGENCESTATUS=STATUS;</p><p>PROC LOGISTIC DATA=D&amp;NITER. ;</p><p>MODEL Y (EVENT='1')=&amp;CONTI_MODEL</p><p>/TECH=NR MAXITER=8 XCONV=0.01 ;</p><p>BY STRATA;</p><p>RUN;</p><p>/******************************************************************/</p><p>PROC SORT DATA=PARAM</p><p>OUT=PARAM2</p><p>(RENAME=(ESTIMATE=ESTIMATION STDERR=STANDARDERRORS));</p><p>BY STRATA;</p><p>RUN;</p><p>PROC SORT DATA=STATUS;</p><p>BY STRATA;</p><p>RUN;</p><p>DATA RESULT;</p><p>MERGE PARAM2 STATUS ;</p><p>BY STRATA;</p><p>RUN;</p><p>/*POWER=1 WHEN THE WALD TEST IS SIGNIFICANT AT ALEVEL; ONLY CONVERGED FITS (STATUS=0) ARE KEPT*/</p><p>DATA RESULT_CONTINUOUS E;</p><p>SET RESULT;</p><p>IF 0=&lt; PROBCHISQ&lt;&amp;ALEVEL. THEN POWER=1;</p><p>ELSE POWER=0;</p><p>IF STATUS=0 THEN OUTPUT RESULT_CONTINUOUS;</p><p>ELSE OUTPUT E;</p><p>RUN;</p><p>ODS HTML PATH=&quot;&amp;PATH&quot; BODY=&quot;&amp;TABLE..XLS&quot;;</p><p>PROC TABULATE DATA=D&amp;NITER. 
OUT=J;</p><p>VAR Y &amp;CONTI_MODEL;</p><p>TABLE (&amp;CONTI_MODEL)*(MEAN STD SKEWNESS KURTOSIS)/MISSTEXT = 'NO DATA';</p><p>RUN;</p><p>PROC FREQ DATA=D&amp;NITER;</p><p>TITLE 'PROPORTION';</p><p>TABLE Y/NOCOL NOROW;</p><p>RUN;</p><p>PROC FREQ DATA=RESULT_CONTINUOUS;</p><p>TITLE 'POWER';</p><p>TABLE VARIABLE*(POWER)/NOCOL NOPERCENT;</p><p>RUN;</p><p>PROC TABULATE DATA=RESULT_CONTINUOUS;</p><p>TITLE 'MEAN COEFFICIENT AND MEAN STANDARD ERROR';</p><p>CLASS VARIABLE ;</p><p>VAR ESTIMATION STANDARDERRORS;</p><p>TABLE VARIABLE,(ESTIMATION STANDARDERRORS)*(N MEAN*F=8.4)/MISSTEXT = 'NO DATA';</p><p>RUN;</p><p>ODS HTML CLOSE;</p><p>%MEND CONTINUOUSLR;</p><p>/************************* CATEGORICALLR ***************************/</p><p>/*%CATEGORICALLR divides continuous covariates into R groups by quantiles (PROC RANK) and then fits a logistic regression model to each simulated dataset (BY STRATA). Users specify the model in macro %CATEGORICAL_MODEL. The parameters (coefficient, standard error, and p value) are collected, the average coefficient and average standard error are calculated for each group of a variable, and the power is calculated.</p><p>/*Parameters</p><p>/*R=number of categorized groups</p><p>/*Example: continuous=1, median split=2, tertiles=3, quartiles=4</p><p>/*CATEGORIZATION=%NRSTR(list of covariates to be categorized)</p><p>/*CATEGORIZATION_R=%NRSTR(list of new covariate names after categorization)</p><p>/*NITER=number of the final accumulated dataset to be analyzed</p><p>/*ALEVEL=significance level of the statistical test (Type I error)</p><p>/*PATH=directory in which results are saved</p><p>/*TABLE=table name for saved results</p><p>/*OUT=results (HTML output saved with an .XLS extension, which can be opened in Excel)</p><p>/*Results include the average coefficient and average standard error of the logistic regression model and the power for each categorized group, and the overall power of a 
variable.</p><p>/********************************************************************/</p><p>%MACRO CATEGORICAL_MODEL;</p><p>/*EXAMPLE MODEL: EDIT THE CLASS AND MODEL STATEMENTS TO MATCH YOUR COVARIATES*/</p><p>ODS OUTPUT PARAMETERESTIMATES=PARAM_&amp;R CONVERGENCESTATUS=STATUS_&amp;R TYPE3=TYPE3_&amp;R.;</p><p>PROC LOGISTIC DATA=G&amp;R.;</p><p>CLASS C1(PARAM=REF REF=&quot;0&quot;) X1_R(PARAM=REF REF=&quot;0&quot;) ;</p><p>MODEL Y(EVENT='1')= C1 X1_R X2 /TECH=NR MAXITER=8 XCONV=0.01;</p><p>BY STRATA;</p><p>RUN;</p><p>%MEND CATEGORICAL_MODEL;</p><p>%MACRO CATEGORICALLR;</p><p>PROC RANK DATA= D&amp;NITER. GROUPS=&amp;R. OUT=G&amp;R.;</p><p>VAR &amp;CATEGORIZATION;</p><p>RANKS &amp;CATEGORIZATION_R ;</p><p>BY STRATA ;</p><p>RUN;</p><p>%CATEGORICAL_MODEL;</p><p>PROC SORT DATA=PARAM_&amp;R</p><p>OUT=P_&amp;R (RENAME=( ESTIMATE=ESTIMATION STDERR=STANDARDERRORS));</p><p>BY STRATA;</p><p>RUN;</p><p>PROC SORT DATA=STATUS_&amp;R OUT =S_&amp;R(KEEP = STRATA STATUS) ;</p><p>BY STRATA;</p><p>RUN;</p><p>PROC SORT DATA=TYPE3_&amp;R OUT =_TYPE3_&amp;R;</p><p>BY STRATA;</p><p>RUN;</p><p>DATA _POWER_&amp;R;</p><p>MERGE _TYPE3_&amp;R S_&amp;R;</p><p>BY STRATA;</p><p>RUN;</p><p>DATA POWER_&amp;R E_&amp;R;</p><p>SET _POWER_&amp;R;</p><p>TYPE3_WALDCHISQ=WALDCHISQ;</p><p>TYPE3_PROBCHISQ=PROBCHISQ;</p><p>LABEL TYPE3_PROBCHISQ=&quot;P VALUE OF TYPE3&quot;</p><p>TYPE3_WALDCHISQ=&quot;CHISQ OF TYPE 3&quot;;</p><p>IF 0 =&lt; TYPE3_PROBCHISQ &lt;&amp;ALEVEL. THEN POWER=1;</p><p>ELSE IF TYPE3_PROBCHISQ &gt;= &amp;ALEVEL. THEN POWER=0;</p><p>IF STATUS=0 THEN OUTPUT POWER_&amp;R;</p><p>ELSE OUTPUT E_&amp;R;</p><p>KEEP POWER STATUS TYPE3_WALDCHISQ TYPE3_PROBCHISQ EFFECT;</p><p>RUN;</p><p>DATA _RESULT_CATEGORICAL_&amp;R;</p><p>MERGE P_&amp;R S_&amp;R;</p><p>BY STRATA;</p><p>RUN;</p><p>DATA RESULT_CATEGORICAL_&amp;R E_&amp;R;</p><p>SET _RESULT_CATEGORICAL_&amp;R;</p><p>IF CLASSVAL0=. THEN CLASSLEVEL=0;</p><p>ELSE CLASSLEVEL=CLASSVAL0;</p><p>IF 0=&lt; PROBCHISQ&lt;&amp;ALEVEL. THEN GROUP_POWER=1;</p><p>ELSE IF PROBCHISQ&gt;= &amp;ALEVEL. 
THEN GROUP_POWER=0;</p><p>DROP CLASSVAL0;</p><p>IF STATUS=0 THEN OUTPUT RESULT_CATEGORICAL_&amp;R;</p><p>ELSE OUTPUT E_&amp;R;</p><p>RUN;</p><p>ODS HTML PATH=&quot;&amp;PATH&quot; BODY=&quot;GROUP_&amp;R._&amp;TABLE..XLS&quot;;</p><p>PROC FREQ DATA=POWER_&amp;R;</p><p>TITLE 'POWER OF Β';</p><p>TABLE EFFECT*(POWER)/NOCOL NOPERCENT;</p><p>RUN;</p><p>PROC TABULATE DATA= RESULT_CATEGORICAL_&amp;R;</p><p>CLASS VARIABLE CLASSLEVEL;</p><p>VAR ESTIMATION STANDARDERRORS;</p><p>TABLE VARIABLE*CLASSLEVEL,(ESTIMATION STANDARDERRORS ) *(N MEAN*F=8.4) /RTS=20 MISSTEXT = 'NO DATA';</p><p>RUN;</p><p>PROC TABULATE DATA= RESULT_CATEGORICAL_&amp;R;</p><p>CLASS VARIABLE CLASSLEVEL GROUP_POWER;</p><p>TABLE VARIABLE*CLASSLEVEL,( GROUP_POWER )*(N ROWPCTN) /RTS=20 MISSTEXT = 'NO DATA';</p><p>RUN;</p><p>ODS HTML CLOSE;</p><p>%MEND CATEGORICALLR;</p><p>/********************************************************************/</p><p>/*Dataset generation</p><p>/*%COEFF is run once; then each iteration of %CONTINUOUS, %HISTGRAM, %PDF, and %MERGE creates one dataset.</p><p>/*The datasets are accumulated until all iterations are complete, and the iteration number is stored in the variable STRATA.</p><p>/********************************************************************/</p><p>%MACRO DATASET(N_REPEAT=);</p><p>%COEFF;</p><p>%DO REPEAT=1 %TO &amp;N_REPEAT.;</p><p>%CONTINUOUS;</p><p>%HISTGRAM;</p><p>%PDF;</p><p>%MERGE;</p><p>%END;</p><p>%OUTCOME;</p><p>%MEND DATASET;</p><p>/********************************************************************/</p><p>/*Parameter estimation and statistical power calculation</p><p>/*R=1 runs %CONTINUOUSLR; R=2, 3, ... run %CATEGORICALLR with covariates categorized into R groups.</p><p>/********************************************************************/</p><p>%MACRO LR(R=);</p><p>%DO R=1 %TO &amp;R;</p><p>%IF &amp;R=1 %THEN %DO;</p><p>%CONTINUOUSLR;</p><p>%END;</p><p>%ELSE %DO;</p><p>%CATEGORICALLR;</p><p>%END;</p><p>%END;QUIT;</p><p>%MEND LR;</p></sec><sec id="s10"><title>NOTES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.52556-ref1"><label>1</label><mixed-citation 
publication-type="other" xlink:type="simple">Ottenbacher, K.J., Ottenbacher, H.R., Tooth, L. and Ostir, G.V. (2004) A Review of Two Journals Found That Articles Using Multivariable Logistic Regression Frequently Did Not Report Commonly Recommended Assumptions. Journal of Clinical Epidemiology, 57, 1147-1152. http://dx.doi.org/10.1016/j.jclinepi.2003.05.003</mixed-citation></ref><ref id="scirp.52556-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Brenner, H. and Blettner, M. (1997) Controlling for Continuous Confounders in Epidemiologic Research. Epidemiology, 8, 429-434. http://dx.doi.org/10.1097/00001648-199707000-00014</mixed-citation></ref><ref id="scirp.52556-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Andrici, J., Cox, M.R. and Eslick, G.D. (2013) Cigarette Smoking and the Risk of Barrett’s Esophagus. A Systematic Review and Meta-Analysis. Journal of Gastroenterology and Hepatology, 28, 1258-1273. http://dx.doi.org/10.1111/jgh.12230</mixed-citation></ref><ref id="scirp.52556-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Bergtold, J., Yeager, E. and Featherstone, A. (2011) Sample Size and Robustness of Inferences from Logistic Regression in the Presence of Nonlinearity and Multicollinearity. The Agricultural &amp; Applied Economics Association’s 2011 AAEA &amp; NAREA Joint Annual Meeting, Pittsburgh, Pennsylvania, 24-26 July 2011.</mixed-citation></ref><ref id="scirp.52556-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Demidenko, E. (2007) Sample Size Determination for Logistic Regression Revisited. Statistics in Medicine, 26, 3385-3397. http://dx.doi.org/10.1002/sim.2771</mixed-citation></ref><ref id="scirp.52556-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Whittemore, A.S. (1981) Sample Size for Logistic Regression with Small Response Probability. 
Journal of the American Statistical Association, 76, 27-32. http://dx.doi.org/10.1080/01621459.1981.10477597</mixed-citation></ref><ref id="scirp.52556-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Hsieh, F.Y., Bloch, D.A. and Larsen, M.D. (1998) A Simple Method of Sample Size Calculation for Linear and Logistic Regression. Statistics in Medicine, 17, 1623-1634. http://dx.doi.org/10.1002/(SICI)1097-0258(19980730)17:14&lt;1623::AID-SIM871&gt;3.0.CO;2-S</mixed-citation></ref><ref id="scirp.52556-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Peduzzi, P., Concato, J., Kemper, E., Holford, T.R. and Feinstein, A.R. (1996) A Simulation Study of the Number of Events per Variable in Logistic Regression Analysis. Journal of Clinical Epidemiology, 49, 1373-1379. http://dx.doi.org/10.1016/S0895-4356(96)00236-3</mixed-citation></ref><ref id="scirp.52556-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Vittinghoff, E. and McCulloch, C.E. (2007) Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression. American Journal of Epidemiology, 165, 710-718. http://dx.doi.org/10.1093/aje/kwk052</mixed-citation></ref><ref id="scirp.52556-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">SAS/STAT® 9.2 User’s Guide, Second Edition. SAS Institute, Cary.</mixed-citation></ref><ref id="scirp.52556-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Hosmer, D.W. and Lemeshow, S. (2000) Applied Logistic Regression. 2nd Edition, John Wiley &amp; Sons, New York. http://dx.doi.org/10.1002/0471722146</mixed-citation></ref><ref id="scirp.52556-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Messerli, F.H. and Panjrath, G.S. (2009) The J-Curve between Blood Pressure and Coronary Artery Disease or Essential Hypertension: Exactly How Essential? 
Journal of the American College of Cardiology, 54, 1827-1834. http://dx.doi.org/10.1016/j.jacc.2009.05.073</mixed-citation></ref><ref id="scirp.52556-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Kumagai, N., Okuhara, Y., Iiyama, T., Fujimoto, Y., Takekawa, H., Origasa, H., Kawanishi, Y. and Yamaguchi, T. (2013) Effects of Smoking on Outcomes after Acute Atherothrombotic Stroke in Japanese Men. Journal of the Neurological Sciences, 335, 164-168. http://dx.doi.org/10.1016/j.jns.2013.09.023</mixed-citation></ref><ref id="scirp.52556-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Hsieh, F.Y. (1989) Sample Size Tables for Logistic Regression. Statistics in Medicine, 8, 795-802. http://dx.doi.org/10.1002/sim.4780080704</mixed-citation></ref><ref id="scirp.52556-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Fan, X., Felsovalyi, A., Sivo, S.A. and Keenan, S.C. (2003) SAS® for Monte Carlo Studies: A Guide for Quantitative Researchers. SAS Institute, Cary.</mixed-citation></ref><ref id="scirp.52556-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Webb, M.C., Wilson, J.R. and Chong, J. (2004) An Analysis of Quasi-Complete Binary Data with Logistic Models: Applications to Alcohol Abuse Data. Journal of Data Science, 2, 273-285.</mixed-citation></ref><ref id="scirp.52556-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Arnold, B.F., Hogan, D.R., Colford Jr., J.M. and Hubbard, A.E. (2011) Simulation Methods to Estimate Design Power: An Overview for Applied Research. BMC Medical Research Methodology, 11, 94. http://dx.doi.org/10.1186/1471-2288-11-94</mixed-citation></ref><ref id="scirp.52556-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Royston, P. and Sauerbrei, W. 
(2005) Building Multivariable Regression Models with Continuous Covariates in Clinical Epidemiology—With an Emphasis on Fractional Polynomials. Methods of Information in Medicine, 44, 561-571.</mixed-citation></ref><ref id="scirp.52556-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Grund, B. and Sabin, C. (2010) Analysis of Biomarker Data: Logs, Odds Ratios, and Receiver Operating Characteristic Curves. Current Opinion in HIV &amp; AIDS, 5, 473-479. http://dx.doi.org/10.1097/COH.0b013e32833ed742</mixed-citation></ref><ref id="scirp.52556-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Li, A. (2013) Handbook of SAS® DATA Step Programming. Chapman and Hall/CRC, London.</mixed-citation></ref><ref id="scirp.52556-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Burlew, M.M. (2007) SAS Macro Programming Made Easy. SAS Institute, Cary.</mixed-citation></ref></ref-list></back></article>