A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size
DOI: https://doi.org/10.33200/ijcer.733720
Keywords: Logistic regression, CHAID analysis, Classification, Sample size
Abstract
This study examines how classification performance changes with sample size in logistic regression and CHAID analysis. The dataset was obtained with the Attentional Control Scale. The scale was administered to 1824 students, and the analyses were run on samples drawn at random from this dataset. Nine classification criteria were defined to evaluate the classification performance of logistic regression and CHAID analysis, and the results were interpreted against these criteria. The analyses showed that the classification performance of logistic regression did not change as sample size increased, and that logistic regression classified better than CHAID analysis in small samples (N between 25 and 900). In contrast, the classification performance of CHAID analysis improved as sample size increased, and CHAID analysis provided stronger findings in large samples (N = 1000 and above). Moreover, in classification studies, logistic regression analysis yielded more reliable results, whereas CHAID analysis provided stronger classifications. These results suggest that researchers should choose between the two methods in classification studies according to sample size.
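The design described above (random subsamples of different sizes, each scored on a set of classification criteria) can be illustrated with a minimal sketch. The abstract does not name the nine criteria, so the measures below (accuracy, sensitivity, specificity, predictive values, likelihood ratios) are an assumed, commonly used set. The function names, the example sample sizes, and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import random


def classification_criteria(tp, fp, fn, tn):
    """Confusion-matrix measures (assumed set, not the study's nine criteria).

    Raises ZeroDivisionError if a denominator is zero, e.g. when a
    subsample contains no positive cases.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def draw_subsample(n_total, n, seed):
    """Return n distinct row indices drawn at random from range(n_total)."""
    rng = random.Random(seed)
    return rng.sample(range(n_total), n)


# Hypothetical sample sizes drawn from a pool of 1824 respondents.
N_TOTAL = 1824
subsamples = {n: draw_subsample(N_TOTAL, n, seed=n) for n in (25, 100, 900, 1000)}

# Hypothetical confusion-matrix counts for one fitted classifier.
metrics = classification_criteria(tp=40, fp=10, fn=20, tn=30)
```

In a full comparison, each subsample would be used to fit both classifiers, and the resulting confusion matrices would be passed to `classification_criteria` so the two methods can be compared size by size.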
License
Copyright (c) 2022 Mehmet Şata, Fuat ELKONCA
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.