Year 2020, Volume 7, Issue 2, Pages 15-26, 2020-12-30

A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size

Mehmet ŞATA [1], Fuat ELKONCA [2]


This study examines how classification performance changes with sample size in logistic regression and CHAID analysis. The dataset was obtained with the “Attentional Control Scale.” The scale was administered to 1824 students, and the analyses were run on samples drawn at random from this dataset. Nine classification criteria were defined to evaluate the classification performance of logistic regression and CHAID analysis, and the results were interpreted against these criteria. The analyses showed that the classification performance of logistic regression did not change as sample size increased, and that logistic regression classified better than CHAID analysis in small samples (N between 25 and 900). In contrast, the classification performance of CHAID analysis improved as sample size increased, and CHAID yielded stronger findings in large samples (N = 1000 and above). Moreover, logistic regression yielded more reliable results in classification studies, whereas CHAID analysis provided stronger classifications. These results suggest that researchers should choose a classification method in light of sample size.

Logistic regression, CHAID analysis, Classification, Sample size
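
The abstract evaluates both methods against nine classification criteria but does not name them. As a minimal sketch of how such criteria are typically computed from a two-class classification table, the Python below derives several common ones. The choice of criteria, the names `ConfusionMatrix` and `classification_criteria`, and the example counts are all assumptions made for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from a two-class classification table (hypothetical helper)."""
    tp: int  # actual positives classified as positive
    fp: int  # actual negatives classified as positive
    fn: int  # actual positives classified as negative
    tn: int  # actual negatives classified as negative


def classification_criteria(cm: ConfusionMatrix) -> dict[str, float]:
    """Return common accuracy criteria for a 2x2 table.

    The set of criteria is an illustrative assumption; the abstract
    does not list the nine criteria used in the study.
    """
    total = cm.tp + cm.fp + cm.fn + cm.tn
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictive_value": cm.tp / (cm.tp + cm.fp),
        "negative_predictive_value": cm.tn / (cm.tn + cm.fn),
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "negative_likelihood_ratio": (1 - sensitivity) / specificity,
    }


# Made-up counts for illustration only; not data from the study.
example = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)
criteria = classification_criteria(example)
```

Computing the same dictionary for each method at each sample size would give directly comparable rows. The sketch does not guard against empty rows or columns (for example, a table with no false positives makes the positive likelihood ratio undefined), which can occur in very small samples.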
Primary Language: en
Subjects: Social
Section: Articles
Authors

Orcid: 0000-0003-2683-4997
Author: Mehmet ŞATA (Corresponding Author)
Institution: Agri Ibrahim Cecen University
Country: Turkey


Orcid: 0000-0002-2733-8891
Author: Fuat ELKONCA
Institution: MUS ALPARSLAN UNIVERSITY
Country: Turkey


Dates

Publication Date: December 30, 2020

Bibtex
@article{ijcer733720,
  journal = {International Journal of Contemporary Educational Research},
  issn = {},
  eissn = {2148-3868},
  address = {},
  publisher = {Mustafa AYDIN},
  year = {2020},
  volume = {7},
  pages = {15 - 26},
  doi = {10.33200/ijcer.733720},
  title = {A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size},
  key = {cite},
  author = {Şata, Mehmet and Elkonca, Fuat}
}
APA Şata, M., & Elkonca, F. (2020). A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size. International Journal of Contemporary Educational Research, 7(2), 15-26. DOI: 10.33200/ijcer.733720
MLA Şata, M., and F. Elkonca. "A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size." International Journal of Contemporary Educational Research 7 (2020): 15-26 <http://ijcer.net/tr/pub/issue/58098/733720>
Chicago Şata, M., and F. Elkonca. "A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size." International Journal of Contemporary Educational Research 7 (2020): 15-26
RIS
TY - JOUR
T1 - A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size
AU - Mehmet Şata, Fuat Elkonca
Y1 - 2020
PY - 2020
N1 - doi: 10.33200/ijcer.733720
DO - 10.33200/ijcer.733720
T2 - International Journal of Contemporary Educational Research
JF - Journal
JO - JOR
SP - 15
EP - 26
VL - 7
IS - 2
SN - 2148-3868
M3 - doi: 10.33200/ijcer.733720
UR - https://doi.org/10.33200/ijcer.733720
Y2 - 2020
ER -
EndNote
%0 International Journal of Contemporary Educational Research A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size
%A Mehmet Şata, Fuat Elkonca
%T A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size
%D 2020
%J International Journal of Contemporary Educational Research
%P -2148-3868
%V 7
%N 2
%R doi: 10.33200/ijcer.733720
%U 10.33200/ijcer.733720
ISNAD Şata, Mehmet, Elkonca, Fuat. "A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size". International Journal of Contemporary Educational Research 7/2 (December 2020): 15-26. https://doi.org/10.33200/ijcer.733720
AMA Şata M, Elkonca F. A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size. International Journal of Contemporary Educational Research. 2020; 7(2): 15-26.
Vancouver Şata M, Elkonca F. A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size. International Journal of Contemporary Educational Research. 2020; 7(2): 15-26.
IEEE M. Şata and F. Elkonca, "A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size", International Journal of Contemporary Educational Research, vol. 7, no. 2, pp. 15-26, Dec. 2020, doi:10.33200/ijcer.733720