Reviewing the Factors Affecting PISA Reading Skills by Using Random Forest and MARS Methods


Keywords: Educational data mining, MARS, PISA, Random forest, Reading skills


This study aims to identify the factors affecting PISA 2018 reading skills using the Random Forest and MARS methods and to compare the predictive performance of the two methods. The data come from 5713 students, 2838 (49.7%) male and 2875 (50.3%) female, in the PISA 2018 Turkey sample. The analysis shows that the MARS method performed better than the Random Forest method. In both methods, the most important factor affecting reading skills in Turkey is “the number of books in the house.” The MARS method also identifies the following variables as significant: “students’ perception of difficulty, motivation for reading skills, father’s educational status, reading pleasure, bullying experience of the student, mother’s educational status, attitude towards school, classical artifacts at home, supplementary school books at home, competition at school, competitive power, cooperation perception at school, reading frequency, self-efficacy, poetry books at home, anxiety about reading skills, and teacher support.” The remaining variables did not contribute to the prediction. This study is expected to serve as a model for the use of data mining in educational research.
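As a rough illustration of the kind of model MARS produces, the sketch below builds a prediction from hinge functions of the form max(0, x - c) and max(0, c - x), and scores it with root mean squared error against a mean-only baseline. Everything in it is assumed for illustration: the function names `hinge`, `mars_predict`, and `rmse`, the knot and coefficient, and the toy values, which are made up and are not PISA data. The abstract does not state which software, settings, or comparison metrics the authors used, so this should not be read as their procedure.

```python
import math


def hinge(x, knot, direction=1):
    # direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    # terms is a list of (coefficient, knot, direction) triples (hypothetical layout).
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


def rmse(y_true, y_pred):
    # Root mean squared error between observed and predicted values.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy, made-up values (not PISA data): y rises with x up to x = 3, then flattens.
xs = [0, 1, 2, 3, 4, 5]
ys = [10.0, 20.0, 30.0, 40.0, 40.0, 40.0]

# Hypothetical fitted model: y = 40 - 10 * max(0, 3 - x).
terms = [(-10.0, 3.0, -1)]
mars_preds = [mars_predict(x, 40.0, terms) for x in xs]

# Baseline that predicts the mean for every observation.
mean_y = sum(ys) / len(ys)
baseline_preds = [mean_y] * len(ys)

print("hinge model RMSE:", rmse(ys, mars_preds))
print("mean baseline RMSE:", rmse(ys, baseline_preds))
```

A single knot lets the model capture an effect that levels off, which a straight line cannot. In practice the knots and coefficients are chosen by the fitting procedure rather than set by hand, and the comparison would be made on held-out data.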






How to Cite

Bezek Güre, Ö., Şevgin, H., & Kayri, M. (2023). Reviewing the Factors Affecting PISA Reading Skills by Using Random Forest and MARS Methods. International Journal of Contemporary Educational Research, 10(1), 181–196.