A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size



Authors

  • Mehmet Şata
  • Fuat ELKONCA

DOI:

https://doi.org/10.33200/ijcer.733720

Keywords:

Logistic regression, CHAID analysis, Classification, Sample size

Abstract

The aim of this study is to analyze how the classification performance of logistic regression and CHAID analysis changes with sample size. The dataset was obtained using the Attentional Control Scale, which was administered to 1824 students; the analyses were conducted on samples drawn at random from this dataset. Nine classification criteria were used to evaluate the classification performance of logistic regression and CHAID analysis, and the results were interpreted in light of these criteria. The analyses showed that the classification performance of logistic regression did not change as sample size increased, and that logistic regression classified better than CHAID analysis in small samples (N between 25 and 900). In contrast, the classification performance of CHAID analysis improved as sample size increased, and CHAID analysis provided stronger findings in large samples (N = 1000 and above). Moreover, in classification studies, logistic regression yielded more reliable results, while CHAID analysis provided stronger classifications. These results suggest that researchers should take sample size into account when selecting a method for classification studies.
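The abstract does not name its nine classification criteria. As a hedged illustration only, the Python sketch below computes several criteria commonly reported for binary classifiers from a 2x2 confusion matrix, including the positive and negative likelihood ratios discussed by Deeks and Altman (2004), and draws a random subsample of a chosen size from the 1824 cases. The names `ConfusionMatrix`, `confusion_from_labels`, `classification_criteria`, and `draw_subsample` are hypothetical, the selection of measures is an assumption, and the sketch does not reproduce the authors' logistic regression or CHAID models.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def confusion_from_labels(y_true, y_pred):
    """Tally a ConfusionMatrix from paired 0/1 (truthy/falsy) labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def _ratio(num, den):
    # Undefined ratios (zero denominator, common in very small samples) become NaN.
    return num / den if den else math.nan


def classification_criteria(cm):
    """Return several common criteria; an illustrative selection, not the study's nine."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)  # true positive rate
    specificity = _ratio(cm.tn, cm.tn + cm.fp)  # true negative rate
    return {
        "accuracy": _ratio(cm.tp + cm.tn, n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),  # positive predictive value
        "npv": _ratio(cm.tn, cm.tn + cm.fn),  # negative predictive value
        "lr_positive": _ratio(sensitivity, 1 - specificity),  # LR+ = sens / (1 - spec)
        "lr_negative": _ratio(1 - sensitivity, specificity),  # LR- = (1 - sens) / spec
    }


def draw_subsample(n_total, n, seed=None):
    """Draw n distinct case indices at random, without replacement, from 0..n_total-1."""
    rng = random.Random(seed)
    return rng.sample(range(n_total), n)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    cm = ConfusionMatrix(tp=45, fp=5, fn=15, tn=35)
    for name, value in classification_criteria(cm).items():
        print(f"{name}: {value:.3f}")
    # Hypothetical subsample of 25 cases out of the 1824 mentioned in the abstract.
    print(len(draw_subsample(1824, 25, seed=1)))
```

Passing the predictions of a fitted logistic regression and a fitted CHAID tree for the same subsample through `confusion_from_labels` would let the two methods be compared criterion by criterion at each sample size; fitting those models is outside the scope of this sketch.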

Author Biographies

Mehmet Şata

This study was presented as an abstract at the 26th International Conference on Educational Sciences, held 20-23 April 2017.

Corresponding Author: Mehmet Şata, mehmetsata@gmail.com; msata@agri.edu.tr, Agri Ibrahim Cecen University
0000-0003-2683-4997
Türkiye

Fuat ELKONCA

Muş Alparslan University
0000-0002-2733-8891
Türkiye


References

Akın, A., Kaya, Ç., Uysal, R., Çardak, M., Çitemel, N., Özdemir, E., & Gülşen, M. (2013). Dikkat Kontrol Ölçeği Türkçe Formu: Geçerlik ve Güvenirlik Çalışması [The Turkish version of the Attentional Control Scale: The validity and reliability study]. Paper presented at the VI. National Graduate Education Symposium. Retrieved from http://www.academia.edu/download/43723223/Eitim_Modelinin_renci_zerindeki_Etkilili20160314-25744-1i99q7c.pdf#page=19

Akpınar, H. (2000). Veri tabanlarında bilgi keşfi ve veri madenciliği [Knowledge discovery and data mining in databases]. Istanbul Business Research, 29(1), 1-22. Retrieved from https://dergipark.org.tr/tr/pub/ibr/archive

Balcı, A. (2015). Sosyal bilimlerde araştırma yöntem, teknik ve ilkeler [Research methods, techniques and principles in social sciences]. Ankara: Pegem Akademi.

Berry, M., & Linoff, G. (1997). Data mining techniques: For marketing, sales, and customer support. John Wiley & Sons.

Brewer, S. L. (2012). An empirical comparison of logistic regression to decision tree induction in the prediction of intimate partner violence reassault. (Doctoral dissertation). Retrieved from https://www.proquest.com/

Bulut, N. (2015). İzleme amaçlı klinik araştırmalarda öngörülen ölçütlere göre örneklem büyüklüğünün belirlenmesi [Determination of sample size according to criteria proposed for monitoring in clinical research]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Çakır, Ö. (2008). Veri madenciliğinde sınıflandırma yöntemlerinin karşılaştırılması “bankacılık müşteri veri tabanı üzerinde bir uygulama” [Comparison of classification methods in data mining: "An application on a banking customer database"]. (Doctoral dissertation). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Deeks, J. J., & Altman, D. G. (2004). Diagnostic tests 4: Likelihood ratios. BMJ, 329(7458), 168-169. https://doi.org/10.1136/bmj.329.7458.168

Demidenko, E. (2007). Sample size determination for logistic regression revisited. Statistics in Medicine, 26, 3385-3397. https://doi.org/10.1002/sim.2771

Ekici, E. (2012). Farklı sınıflandırma yöntemlerinin karşılaştırılması ve bir uygulama [An application on the comparison of various classification methods]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Fajkowska, M., & Derryberry, D. (2010). Psychometric properties of Attentional Control Scale: The preliminary study on a Polish sample. Polish Psychological Bulletin, 41(1), 1-7. https://doi.org/10.2478/s10059-010-0001-7

Finch, H., & Schneider, M. K. (2007). Classification accuracy of neural networks vs. discriminant analysis, logistic regression, and classification and regression trees. Methodology, 3(2), 47-57. https://doi.org/10.1027/1614-2241.3.2.47

Grimes, D. A., & Schulz, K. F. (2005). Refining clinical diagnosis with likelihood ratios. The Lancet, 365(9469), 1500-1505. https://doi.org/10.1016/S0140-6736(05)66422-7

Heckert, D. A., & Gondolf, E. W. (2005). Do multiple outcomes and conditional factors improve prediction of batterer reassault? Violence and Victims, 20(1), 3-24. https://doi.org/10.1891/vivi.2005.20.1.3

Karakış, R. (2009). Yapay sinir ağları ve lojistik regresyon yöntemleri ile meme kanseri koltuk altı lenf nodu durumunun belirlenmesi [Prediction of the axillary lymph node status in breast cancer using artificial neural network and logistic regression analysis methods]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Kayri, M., & Boysan, M. (2007). Araştırmalarda CHAID analizinin kullanımı ve baş etme stratejileri ile ilgili bir uygulama [Using CHAID analysis in research and an application pertaining to coping strategies]. Ankara University Journal of Faculty of Educational Sciences, 40(2), 133-149. https://doi.org/10.1501/Egifak_0000000146

King, R. D., Feng, C., & Sutherland, A. (1995). StatLog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence: An International Journal, 9(3), 289-333. https://doi.org/10.1080/08839519508945477

Kıran, Z. B. (2010). Lojistik regresyon ve CART analizi teknikleriyle sosyal güvenlik kurumu ilaç provizyon sistemi verileri üzerinde bir uygulama [An application on pharmacy provision system data of the social security institution using logistic regression and CART analysis techniques]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Köktürk, F. (2012). K-en yakın komşuluk, yapay sinir ağları ve karar ağaçları yöntemlerinin sınıflandırma başarılarının karşılaştırılması [Comparing the classification success of k-nearest neighbor, artificial neural network, and decision tree methods]. (Doctoral dissertation). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Koyuncu, M. S. (2015). Psikolojik ölçeklerde ROC analizi yöntemiyle standart belirleme [Standard determination in psychological scales using ROC analysis]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Kurt, İ., & Türe, M. (2005). Tıp öğrencilerinde alkol kullanımını etkileyen faktörlerin belirlenmesinde yapay sinir ağları ile lojistik regresyon analizi’nin karşılaştırılması [Comparison of artificial neural networks and logistic regression analysis in determining factors affecting alcohol consumption among medical students]. The Balkan Medical Journal, 22(3), 142-153. Retrieved from https://dergipark.org.tr/en/pub/bmj/issue/3749/49838

MedCalc. (2018). Software manual. Retrieved from https://www.medcalc.org/download/medcalcmanual.pdf

Nemes, S., Jonasson, J. M., Genell, A., & Steineck, G. (2009). Bias in odds ratios by logistic regression modelling and sample size. BMC Medical Research Methodology, 9, Article 56. https://doi.org/10.1186/1471-2288-9-56

Neuilly, M. A., Zgoba, K. M., Tita, G. E., & Lee, S. S. (2011). Predicting recidivism in homicide offenders using classification tree analysis. Homicide Studies, 15(2), 154-176. https://doi.org/10.1177/1088767911406867

Pehlivan, G. (2006). CHAID analizi ve bir uygulama [CHAID analysis and an application]. (Master's thesis). Retrieved from https://tez.yok.gov.tr/UlusalTezMerkezi/

Sabzevari, H., Soleymani, M., & Noorbakhsh, E. (2007). A comparison between statistical and data mining methods for credit scoring in case of limited available data. In Proceedings of the 3rd CRC Credit Scoring Conference (pp. 1-5).

Stafford, J. D., Kaminski, R. M., Reinecke, K. J., & Gerard, P. D. (2006). Multi-stage sampling for large scale natural resources surveys: A case study of rice and waterfowl. Journal of Environmental Management, 78, 353-361. https://doi.org/10.1016/j.jenvman.2005.04.029

Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics. New Jersey: Pearson Education Inc.

Tan, Ş. (2016). SPSS ve excel uygulamalı temel istatistik-1 [Basic statistics-1 with SPSS and Excel applications]. Ankara: Pegem Akademi. https://doi.org/10.14527/9786053183877

Zurada, J., & Lonial, S. (2005). Comparison of the performance of several data mining methods for bad debt recovery in the healthcare industry. Journal of Applied Business Research, 21(2), 37-54. https://doi.org/10.19030/jabr.v21i2.1488


Published

30.10.2022

How to Cite

Şata, M., & ELKONCA, F. (2022). A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size. International Journal of Contemporary Educational Research, 7(2), 15–26. https://doi.org/10.33200/ijcer.733720


Section

Articles