<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JDAIP</journal-id><journal-title-group><journal-title>Journal of Data Analysis and Information Processing</journal-title></journal-title-group><issn pub-type="epub">2327-7211</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jdaip.2023.111001</article-id><article-id pub-id-type="publisher-id">JDAIP-122494</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject><subject> Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Heart Disease Prediction Using Machine Learning Algorithms with Self-Measurable Physical Condition Indicators
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sun</surname><given-names>Huating</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Pan</surname><given-names>Jianan</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou, China</addr-line></aff><aff id="aff1"><addr-line>Department of Geography, University of Washington, Seattle, USA</addr-line></aff><pub-date pub-type="epub"><day>18</day><month>01</month><year>2023</year></pub-date><volume>11</volume><issue>01</issue><fpage>1</fpage><lpage>10</lpage><history><date date-type="received"><day>10</day><month>October</month><year>2022</year></date><date date-type="rev-recd"><day>15</day><month>January</month><year>2023</year></date><date date-type="accepted"><day>18</day><month>January</month><year>2023</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  In recent years, the number of heart disease cases has risen sharply, and heart disease is associated with a high mortality rate. Moreover, with the development of technology, advanced equipment has been invented to help patients measure their health conditions at home and predict their risk of heart disease. This research aims to compare the accuracy of self-measurable physical health indicators with that of all indicators measured by healthcare providers in predicting heart disease. Five machine learning models were used: Logistic Regression, K Nearest Neighbors, Support Vector Machine, Decision Tree, and Random Forest.
  The database used for the research contains 13 types of health test results and the risk of heart disease for 303 patients. The all matrices consisted of all 13 test results, while the home matrices included the 6 results that could be measured at home. After constructing the five models for both the home matrices and the all matrices, the accuracy score and false negative rate were computed for each model. The results showed that the all matrices had higher accuracy scores than the home matrices in all five models. The false negative rates for the all matrices were lower than or equal to those for the home matrices in all five models.
  From these results, it was concluded that home-measured physical health indicators were less accurate than the full set of physical indicators in predicting patients’ risk of heart disease. Therefore, without the future development of home-testable indicators, all physical health indicators are preferred in measuring the risk of heart disease.
 
</p></abstract><kwd-group><kwd>Machine Learning</kwd><kwd>Data Visualization</kwd><kwd>Feature Engineering</kwd><kwd>Health</kwd><kwd>Heart Disease</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Heart disease, caused by abnormal heart and blood vessel conditions, is widely considered a direct threat to human life and health. It is one of the major diseases exerting irreversible effects on many middle-aged and older people, and it is highly likely to result in fatal complications [<xref ref-type="bibr" rid="scirp.122494-ref1">1</xref>]. Makino et al. report that absolute cardiovascular disease risk is associated with disability and death among people 65 years or older [<xref ref-type="bibr" rid="scirp.122494-ref2">2</xref>]. The World Health Organization (WHO) reported that an estimated 17.7 million people died from cardiovascular disorders in 2015, accounting for one-third of all deaths that year [<xref ref-type="bibr" rid="scirp.122494-ref3">3</xref>]. According to the Australian Bureau of Statistics, heart disease was one of Australia’s two leading causes of mortality [<xref ref-type="bibr" rid="scirp.122494-ref4">4</xref>]. Because of its severe impact on human health, a great deal of effort has been devoted to studying the onset of heart disease in order to prevent it and reduce its incidence with timely and effective methods. Moreover, to prevent the adverse effects of heart disease, it is advisable to use sophisticated equipment to detect potential heart risks in advance. 
Currently, qualified health organizations can conduct many tests, including blood tests, echocardiography, chest X-rays, magnetic resonance imaging (MRI), electrocardiograms, physical examinations, and exercise stress tests, which provide medical doctors with valuable information for diagnosing and assessing the patient’s heart failure risk level [<xref ref-type="bibr" rid="scirp.122494-ref5">5</xref>].</p><p>There are several risk factors for heart failure, corresponding to different test indexes. A significant amount of research has been carried out to identify the potential risk attributes of a heart attack. Sex, age, smoking, hypertension, and diabetes are associated with heart disease [<xref ref-type="bibr" rid="scirp.122494-ref6">6</xref>]. Wilson et al. [<xref ref-type="bibr" rid="scirp.122494-ref7">7</xref>] suggest that indexes including blood pressure, total cholesterol, and age are essential in predicting coronary heart disease. The effects of sex differences on traditional cardiovascular risk factors are considered to be notable [<xref ref-type="bibr" rid="scirp.122494-ref8">8</xref>]. Heart rate is also a powerful indicator of a patient’s potential heart attack risk [<xref ref-type="bibr" rid="scirp.122494-ref9">9</xref>]. The attributes of heart disease can be roughly divided into two types according to whether the indicators can be measured at home. It is therefore worthwhile to compare the accuracy of indicators measured at home with that of indicators measured in hospitals, which is useful for future tests of heart disease.</p><p>Computational technologies and statistical approaches have been widely used to discover the relationship between heart disease and patients’ health conditions [<xref ref-type="bibr" rid="scirp.122494-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.122494-ref11">11</xref>]. 
They can help predict the potential risk of heart disease in advance based on the patient’s underlying physical condition, thereby reducing the probability of dying from a heart attack. Many computer-based statistical methods have been applied to predict heart attacks [<xref ref-type="bibr" rid="scirp.122494-ref12">12</xref>]. Due to its high accuracy, the support vector machine (SVM) has been widely applied as a classification method to predict heart attacks [<xref ref-type="bibr" rid="scirp.122494-ref13">13</xref>]. Akkaya et al. used logistic regression and the k-NN algorithm to predict heart failure and achieved promising outcomes [<xref ref-type="bibr" rid="scirp.122494-ref14">14</xref>]. With Random Forest, a best accuracy of 82.18% was achieved by modifying the feature selection [<xref ref-type="bibr" rid="scirp.122494-ref14">14</xref>]. These algorithms have been shown to predict the risk of heart disease effectively, which helps researchers and doctors make better judgments about heart disease.</p><p>Although these machine learning techniques have been recognized and continuously refined to improve prediction performance, few investigators have examined the relative accuracy of home-tested versus in-hospital measures for predicting heart disease risk. If the indicators measured at home can predict the patient’s risk of heart disease well, then patients can be tested by themselves or by their families instead of having to go to the hospital for testing. 
Therefore, the novelty of this article is that it not only applies five machine learning algorithms to data on heart patients, but also compares the performance of these algorithms in predicting heart disease from indicators measured at home and from indicators measured in the hospital.</p><p>This study aims to compare the patient’s physical condition indicators measured at home and in the hospital, using five different prediction methods to assess their accuracy in predicting heart disease. In doing so, it answers the research question “How does the performance of machine learning algorithms with only self-measurable physical condition indicators compare to that of algorithms with all physical condition indicators?”</p></sec><sec id="s2"><title>2. Data Description</title><p>We used the Cleveland heart disease data set from the UCI Machine Learning Repository. The selected data consist of 14 variables and 303 instances: 13 predictor variables and 1 categorical response variable (target). Among these variables, the numerical variables are age, trtbps, chol, thalach, and old peak; the categorical variables are sex, exang, cp, fbs, restecg, slp, thall, and target. The meaning of each variable is described in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows that most patients in the data set with a heart attack are aged between 50 and 60, while only a few are aged under 30 or above 70. The range of this attribute is 29 - 77, illustrating the wide span of ages.</p><p>The chol variable is the patient’s cholesterol, fetched via BMI sensor. 
According to <xref ref-type="fig" rid="fig2">Figure 2</xref>, most patients’ cholesterol is around 230 mg/dl, and the overall distribution is slightly right-skewed.</p><p>According to <xref ref-type="fig" rid="fig3">Figure 3</xref>, most patients’ maximum heart rates cluster between 140 and 180. A few patients have extremely low or high maximum heart rates, specifically below 100 or above 200.</p><p>As for resting blood pressure (<xref ref-type="fig" rid="fig4">Figure 4</xref>), a great number of patients have a resting blood pressure between about 100 and 140 mm Hg. Only a few have abnormal values of around 160 mm Hg or below 100 mm Hg.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Variable description</title></caption><table><thead><tr><th align="center" valign="middle" >Variable Name</th><th align="center" valign="middle" >Description</th><th align="center" valign="middle" >Range of values</th></tr></thead><tbody><tr><td align="center" valign="middle" >age</td><td align="center" valign="middle" >Age of the patient</td><td align="center" valign="middle" >29 - 77</td></tr><tr><td align="center" valign="middle" >sex</td><td align="center" valign="middle" >Sex of the patient</td><td align="center" valign="middle" >0, 1</td></tr><tr><td align="center" valign="middle" >exang</td><td align="center" valign="middle" >Exercise induced angina: 1 = yes; 0 = no</td><td align="center" valign="middle" >0, 1</td></tr><tr><td align="center" valign="middle" >cp</td><td align="center" valign="middle" >Chest pain type: 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic</td><td align="center" valign="middle" >1, 2, 3, 4</td></tr><tr><td align="center" valign="middle" >trtbps</td><td align="center" valign="middle" >Resting blood pressure (in mm Hg)</td><td align="center" valign="middle" >94 - 200</td></tr><tr><td align="center" valign="middle" >chol</td><td 
align="center" valign="middle" >Cholesterol in mg/dl fetched via BMI sensor</td><td align="center" valign="middle" >126 - 564</td></tr><tr><td align="center" valign="middle" >fbs</td><td align="center" valign="middle" >Fasting blood sugar: 1 = fasting blood sugar &gt; 120 mg/dl; 0 = fasting blood sugar ≤ 120 mg/dl</td><td align="center" valign="middle" >0, 1</td></tr><tr><td align="center" valign="middle" >restecg</td><td align="center" valign="middle" >Resting electrocardiographic results: 0 = normal; 1 = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of &gt; 0.05 mV); 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria</td><td align="center" valign="middle" >0, 1, 2</td></tr><tr><td align="center" valign="middle" >thalach</td><td align="center" valign="middle" >Maximum heart rate achieved</td><td align="center" valign="middle" >71 - 202</td></tr><tr><td align="center" valign="middle" >old peak</td><td align="center" valign="middle" >ST depression induced by exercise relative to rest</td><td align="center" valign="middle" >0 - 6.2</td></tr><tr><td align="center" valign="middle" >slp</td><td align="center" valign="middle" >The slope of the peak exercise ST segment</td><td align="center" valign="middle" >0, 1, 2</td></tr><tr><td align="center" valign="middle" >thall</td><td align="center" valign="middle" >Thalassemia: 0 = null; 1 = fixed defect; 2 = normal; 3 = reversible defect</td><td align="center" valign="middle" >0, 1, 2, 3</td></tr><tr><td align="center" valign="middle" >target</td><td align="center" valign="middle" >Output: 0 = less chance of heart attack; 1 = more chance of heart attack</td><td align="center" valign="middle" >0, 1</td></tr></tbody></table></table-wrap></sec><sec id="s3"><title>3. Methodology</title><sec id="s3_1"><title>3.1. Data Processing</title><p>For data description, the research used the describe function and pandas profiling in Python to summarize the dataset. 
The raw data contained 14 variables for 303 patients. Chi-square values, an extra-trees classifier, and correlation matrices were used for data analysis. The Chi-square values and correlation matrices showed that no variables were highly correlated, so all variables were retained for model building. Moreover, all numerical variables were standardized using Standard Scaler.</p><p>The 13 independent variables were divided into home matrices and all matrices. The home matrices consisted of 6 variables: age, sex, resting blood pressure, cholesterol, fasting blood sugar, and thalassemia. The all matrices included all 13 independent variables. The data were split into a training set (80% of the data) and a test set (20% of the data).</p><p>A helper function was used in Python to show each model’s accuracy score, false negative rate, and confusion matrix. The accuracy score measured the percentage of patients who were correctly predicted as having or not having a risk for heart disease. The false negative rate measured the percentage of patients with a high risk for heart disease who were mispredicted as having a low risk. The false negative rate was important because such a misprediction may lead to late treatment for the patient. These values were used in the final model comparison to assess the accuracy of the self-measured home matrices relative to the all matrices.</p></sec><sec id="s3_2"><title>3.2. Machine Learning Algorithms</title><p>The research built five models for both the home matrices and all matrices.</p><sec id="s3_2_1"><title>3.2.1. Logistic Regression</title><p>Logistic Regression is a model for predicting a binary outcome from the observations in a data set. The research selected this model because the output variable is binary, taking either the high risk or no risk for heart disease. 
The Logistic Regression class from the sklearn package in Python was used to build the model. The liblinear solver (a library for large linear classification) was chosen for the logistic models because the dataset was relatively small.</p></sec><sec id="s3_2_2"><title>3.2.2. K-Nearest Neighbors</title><p>K-Nearest Neighbors (KNN) is a classification algorithm that assigns a data point to a group according to the groups of its nearest points by distance. The research tested 1 to 20 as the number of neighbors. The K Neighbors Classifier Score was calculated for each number of neighbors, and a line chart was created with the number of neighbors as x and the K Neighbors Classifier Score as y. The research chose K equal to 8 since it had the highest K Neighbors Classifier Score.</p></sec><sec id="s3_2_3"><title>3.2.3. Support Vector Machine</title><p>Support Vector Machine was chosen as one of the models because it is an algorithm for classification and regression. The research used the svm module from the sklearn package in Python. The radial basis function kernel was selected, gamma equaled 0.01, and the regularization parameter equaled 1 for both the home matrices and all matrices models.</p></sec><sec id="s3_2_4"><title>3.2.4. Decision Tree</title><p>Decision Tree was chosen because it is a nonparametric machine learning model for classification and regression. The research drew a line graph with the maximum depth from 1 to 30 as x and the Decision Tree Classifier Score as y. A maximum depth of 10 was picked for model building because it had the highest score.</p></sec><sec id="s3_2_5"><title>3.2.5. Random Forest</title><p>Random Forest is an ensemble algorithm consisting of many decision trees. The Random Forest Classifier from the sklearn.ensemble package was used to build the home and all matrices models. The number of estimators equaled 1000 in both the home and all matrices models.</p></sec></sec></sec><sec id="s4"><title>4. 
Results</title><p>After preprocessing, the data were fed into the machine learning algorithms, and the accuracy score and the false negative rate were obtained for each model.</p><sec id="s4_1"><title>4.1. Accuracy</title><p>According to <xref ref-type="table" rid="table2">Table 2</xref>, among the machine learning algorithms with all physical condition indicators, Logistic Regression and Support Vector Machine have the highest accuracy score at 88.52%, while Decision Tree has the lowest at 85.25%. Among the machine learning algorithms with only physical condition indicators measured at home, Logistic Regression has the highest accuracy score at 73.77%, while Support Vector Machine has the lowest at 68.85%.</p><p>Comparing the two groups, it is concluded that algorithms with only physical condition indicators measured at home do not perform as accurately as algorithms with all physical condition indicators. The difference in accuracy ranges from 14.75 to 19.67 percentage points.</p></sec><sec id="s4_2"><title>4.2. False Negative Rate</title><p>From the false negative rate perspective (<xref ref-type="table" rid="table3">Table 3</xref>), it is observed that the Decision Tree has the highest false negative rate among the algorithms with all physical condition indicators. In contrast, Logistic Regression has the lowest false negative rate. 
Among the algorithms with only physical condition indicators measured at home, K Nearest Neighbors and Random Forest have the highest false negative rate, while Decision Tree has the lowest false negative rate.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Accuracy scores of the machine learning algorithms with all physical condition indicators and with only self-measurable indicators. Orange represents the algorithm with the highest accuracy score. Green represents the algorithm with the lowest accuracy score</title></caption><table><thead><tr><th align="center" valign="middle" >Model</th><th align="center" valign="middle" >All Indicators Accuracy</th><th align="center" valign="middle" >Self-Measurable Indicators Accuracy</th></tr></thead><tbody><tr><td align="center" valign="middle" >Logistic Regression</td><td align="center" valign="middle" >88.52%</td><td align="center" valign="middle" >73.77%</td></tr><tr><td align="center" valign="middle" >K Nearest Neighbors</td><td align="center" valign="middle" >86.89%</td><td align="center" valign="middle" >70.49%</td></tr><tr><td align="center" valign="middle" >Support Vector Machine</td><td align="center" valign="middle" >88.52%</td><td align="center" valign="middle" >68.85%</td></tr><tr><td align="center" valign="middle" >Decision Tree</td><td align="center" valign="middle" >85.25%</td><td align="center" valign="middle" >70.49%</td></tr><tr><td align="center" valign="middle" >Random Forest</td><td align="center" valign="middle" >86.89%</td><td align="center" valign="middle" >72.13%</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>False negative rates of the machine learning algorithms with all physical condition indicators and with only self-measurable indicators. Orange represents the algorithm with the highest false negative rate. 
Green represents the algorithm with the lowest false negative rate</title></caption><table><thead><tr><th align="center" valign="middle" >Model</th><th align="center" valign="middle" >False Negative Rate (All Indicators)</th><th align="center" valign="middle" >False Negative Rate (Self-Measurable Indicators)</th></tr></thead><tbody><tr><td align="center" valign="middle" >Logistic Regression</td><td align="center" valign="middle" >8.82%</td><td align="center" valign="middle" >26.47%</td></tr><tr><td align="center" valign="middle" >K Nearest Neighbors</td><td align="center" valign="middle" >11.76%</td><td align="center" valign="middle" >29.41%</td></tr><tr><td align="center" valign="middle" >Support Vector Machine</td><td align="center" valign="middle" >11.76%</td><td align="center" valign="middle" >23.53%</td></tr><tr><td align="center" valign="middle" >Decision Tree</td><td align="center" valign="middle" >17.65%</td><td align="center" valign="middle" >17.65%</td></tr><tr><td align="center" valign="middle" >Random Forest</td><td align="center" valign="middle" >11.76%</td><td align="center" valign="middle" >29.41%</td></tr></tbody></table></table-wrap><p>After comparison, it is concluded that machine learning algorithms with all physical condition indicators generally have a much lower false negative rate than algorithms with only physical condition indicators measured at home. Note that the false negative rate for the Decision Tree is the same for both groups. This is probably due to the randomness of the data splitting process, as the test set is only 20% of the entire data set, which is about 60 data samples. The difference between the two groups ranges from 0 to 17.65 percentage points.</p></sec></sec><sec id="s5"><title>5. Conclusions and Discussion</title><sec id="s5_1"><title>5.1. 
Conclusion</title><p>In answer to the research question of this study, it is concluded that machine learning algorithms with only self-measurable physical condition indicators do not predict heart disease as accurately as algorithms with all physical condition indicators. They are also more likely to falsely predict the absence of heart disease among patients who have it. Thus, machine learning algorithms with only self-measurable physical condition indicators should not be used until more indicators are measurable at home in the future.</p></sec><sec id="s5_2"><title>5.2. Study Limitation</title><p>The findings of this study have to be seen in light of some limitations. Notably, the dataset used in this study is a subset of the original database, which contained 76 attributes rather than the 14 used here. Among the original 76 attributes, there may be other attributes that could be measured at home and thus improve the accuracy and reduce the false negative rate of the machine learning algorithms with only self-measurable physical condition indicators.</p></sec><sec id="s5_3"><title>5.3. Future Work</title><p>The limitations of this study point to the following recommendations for future work. First, include other health attributes from the original dataset to discover the machine learning algorithm with the highest accuracy and lowest false negative rate. 
Second, since every patient has different health conditions, it is recommended to group the patients with similar health conditions and ages to investigate each machine learning algorithm’s accuracy and false negative rate.</p></sec></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Sun, H.T. and Pan, J.N. (2023) Heart Disease Prediction Using Machine Learning Algorithms with Self-Measurable Physical Condition Indicators. Journal of Data Analysis and Information Processing, 11, 1-10. https://doi.org/10.4236/jdaip.2023.111001</p></sec></body><back><ref-list><title>References</title><ref id="scirp.122494-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Heron, M. (2012) Deaths: Leading Causes for 2008. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System, 60, 1-94.</mixed-citation></ref><ref id="scirp.122494-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Makino, K., Lee, S., Bae, S., Chiba, I., Harada, K., Katayama, O., Shinkai, Y. and Shimada, H. (2021) Absolute Cardiovascular Disease Risk Assessed in Old Age Predicts Disability and Mortality: A Retrospective Cohort Study of Community-Dwelling Older Adults. Journal of the American Heart Association, 10, e022004. https://doi.org/10.1161/JAHA.121.022004</mixed-citation></ref><ref id="scirp.122494-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">WHO (2017) Cardiovascular Diseases. http://www.who.int/mediacentre/factsheets/fs317/en/</mixed-citation></ref><ref id="scirp.122494-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">ABS (2009) Causes of Death, Australia. Australian Bureau of Statistics. 
http://abs.gov.au/ausstats/abs@.nsf/Products/696C1CF9601E4D8DCA25788400127BF0?opendocument</mixed-citation></ref><ref id="scirp.122494-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">AHA (2017) American Heart Association. http://www.heart.org</mixed-citation></ref><ref id="scirp.122494-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Liu, X., Wang, X.L., Su, Q., Zhang, M., Zhu, Y.H., Wang, Q.G. and Wang, Q. (2017) A Hybrid Classification System for Heart Disease Diagnosis Based on the RFRS Method. Computational and Mathematical Methods in Medicine, 2017, 1-11. https://doi.org/10.1155/2017/8272091</mixed-citation></ref><ref id="scirp.122494-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Wilson, P.W.F., D’Agostino, R.B., Levy, D., Belanger, A.M., Silbershatz, H. and Kannel, W.B. (1998) Prediction of Coronary Heart Disease Using Risk Factor Categories. Circulation, 97, 1837-1847. https://doi.org/10.1161/01.CIR.97.18.1837</mixed-citation></ref><ref id="scirp.122494-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Liu, W., Tang, Q., Jin, J., et al. (2021) Sex Differences in Cardiovascular Risk Factors for Myocardial Infarction. Herz, 46, 115-122. https://doi.org/10.1007/s00059-020-04911-5</mixed-citation></ref><ref id="scirp.122494-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Lee, H.G., Noh, K.Y. and Ryu, K.H. (2007) Mining Biosignal Data: Coronary Artery Disease Diagnosis Using Linear and Nonlinear Features of HRV. In: Emerging Technologies in Knowledge Discovery and Data Mining, PAKDD 2007, Lecture Notes in Computer Science, Vol. 4819, Springer, Berlin, Heidelberg.</mixed-citation></ref><ref id="scirp.122494-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Nahar, J., Imam, T., Tickle, K.S. and Chen, Y.-P.P. 
(2013) Association Rule Mining to Detect Factors Which Contribute to Heart Disease in Males and Females. Expert Systems with Applications, 40, 1086-1093. https://doi.org/10.1016/j.eswa.2012.08.028</mixed-citation></ref><ref id="scirp.122494-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Desai, F., Chowdhury, D., Kaur, R., Peeters, M., Arya, R.C., Wander, G.S., Gill, S.S. and Buyya, R. (2022) HealthCloud: A System for Monitoring Health Status of Heart Patients Using Machine Learning and Cloud Computing. Internet of Things, 17, Article ID: 100485. https://doi.org/10.1016/j.iot.2021.100485</mixed-citation></ref><ref id="scirp.122494-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Nahar, J., Imam, T., Tickle, K.S. and Chen, Y.P.P. (2013) Computational Intelligence for Heart Disease Diagnosis: A Medical Knowledge Driven Approach. Expert Systems with Applications, 40, 96-104. https://doi.org/10.1016/j.eswa.2012.07.032</mixed-citation></ref><ref id="scirp.122494-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Xing, Y.W., Wang, J., Zhao, Z.H. and Gao, Y.H. (2007) Combination Data Mining Methods with New Medical Data to Predicting Outcome of Coronary Heart Disease. Convergence Information Technology, Gwangju, 21-23 November 2007, 868-872. https://doi.org/10.1109/ICCIT.2007.204</mixed-citation></ref><ref id="scirp.122494-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Akkaya, B., Sener, E. and Gursu, C. (2022) A Comparative Study of Heart Disease Prediction Using Machine Learning Techniques. 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, 9-11 June 2022, 1-8. https://doi.org/10.1109/HORA55278.2022.9799978</mixed-citation></ref></ref-list></back></article>