www.ijcer.net
Comparison of Artificial Neural Networks and Logistic Regression Analysis in PISA Science Literacy Success Prediction

The present study aims to determine which analysis technique, Artificial Neural Networks (ANNs) or Logistic Regression (LR) analysis, better predicts the science literacy success of the 15-year-old Turkish students who participated in the PISA 2015 cycle, using learning time spent on science, test anxiety, environmental awareness, environmental optimism, epistemological beliefs, inquiry-based science teaching and learning practices, instrumental motivation, and disciplinary climate in science classes as the predictor variables. For this purpose, the data from 5895 students who participated in the PISA 2015 test were analyzed. Models were developed using LR and ANNs, and the results were compared. Although the classification performance of the artificial neural network was statistically significantly better than that of LR, the practical significance of the difference is low because the confidence intervals of the two AUC values overlap.


Introduction
In today's world, education and training activities are seen not only as the transfer of information to students but also as the acquisition of high-level skills such as applying learned knowledge to real-life problems, working in teams, and learning to learn. As perspectives on education have changed, both helping students gain these skills and measuring the extent to which they have been gained have become valuable.
Studies such as the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS) are conducted worldwide to observe the results of investments made in education and to allow comparisons with other countries. The Programme for International Student Assessment (PISA), which is implemented by the Organisation for Economic Co-operation and Development (OECD), is one of the studies carried out for this purpose. PISA was first implemented in 2000 and is held every three years with the participation of both OECD member and non-member countries. PISA is taken by students who are completing compulsory education in their countries, and it aims to assess the level of basic knowledge and skills that students need to be successful in real life. Students aged 15 who are in the seventh grade or above participate in PISA (Republic of Turkey Ministry of National Education [MEB], 2016). PISA also includes questionnaires on variables that may affect student performance. Through items related to socio-cultural, economic, and educational fields, information is gathered about the students, parents, and schools.
The competencies of students in mathematics, science, and reading are measured within the scope of PISA. At this stage, the concept of literacy becomes prominent. Literacy is defined as students' ability to use the knowledge and learning outcomes they have gained when dealing with the problems they face, to make inferences about a subject by analyzing it, and to communicate effectively with others. The PISA 2015 cycle mainly measured science literacy, which is defined as the ability to deal with science-related issues (MEB, 2016).
To make better decisions, policy makers need to identify the characteristics that affect student success. PISA results are a very useful data source for examining student performance and the variables that explain it. Besides regression or structural equation models, techniques such as ANNs, support vector machines, decision trees, and data envelopment analysis can also be used to predict student performance. LR and ANN analysis, which are used in this research, are discussed in detail in the next sections.

Logistic Regression
In LR analysis, the dependent variable is predicted from independent (predictor) variables. The predictor variables can be continuous, categorical, or mixed, but the dependent variable must be categorical. Predicting group membership is an example of an LR problem (Tabachnick & Fidell, 2013). When the dependent variable is categorical, LR analysis is preferred over multiple regression analysis (Kline, 2011). LR is more flexible than other techniques: contrary to most analyses, the predictor variables do not have to be normally distributed. In addition, whereas multiple regression analysis can produce negative predicted values, LR does not (Tabachnick & Fidell, 2013). Two methods are used in LR analysis, the standard method and the stepwise method. Stepwise methods are divided into two types, forward and backward (Çokluk, 2010).
In cases where the dependent variable has two categories, binary LR is used. In LR, the relationship between the independent variables and the dependent variable is modeled through a logarithmic transformation. The LR equation with one independent variable is as follows (Rençber, 2018):
$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \tag{1}$$
The value obtained from the formula varies between 0 and 1. $\beta_0$ is the regression constant, and the regression coefficient $\beta_1$ specifies the effect of the independent variable on the dependent variable.
The ratio $P/(1-P)$ is the odds, that is, the probability of belonging to a group relative to the probability of not belonging to it. The logarithm of this value is called the logit or log-odds. This transformation can be formulated as follows (Retherford & Choe, 1993):
$$\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x \tag{2}$$
When equation (2) is treated as a link function and the $x$s are entered as the independent variables, the following logit model emerges (Oğuzlar, 2005):
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \tag{3}$$
When data are entered into this equation, the predicted value of the dependent variable rises toward 1 or falls toward 0. If the resulting value is greater than 0.5, the case is assigned to 1; if it is smaller, it is assigned to 0 (Rençber, 2018).
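The link between the logistic function, the odds, and the 0.5 cut-off described above can be illustrated with a minimal Python sketch. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def logistic_probability(x, beta0, beta1):
    """Predicted probability for a one-predictor logistic regression."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def logit(p):
    """Log-odds ln(p / (1 - p)); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

def classify(p, cutoff=0.5):
    """Assign 1 if the predicted probability exceeds the cut-off, else 0."""
    return 1 if p > cutoff else 0

# Hypothetical constant and coefficient (assumptions, not study results)
beta0, beta1 = -1.0, 2.0

for x in (0.0, 0.5, 1.0):
    p = logistic_probability(x, beta0, beta1)
    # The logit of p recovers the linear predictor beta0 + beta1 * x
    print(x, round(p, 3), round(logit(p), 3), classify(p))
```

Because the logit is the inverse of the logistic function, the log-odds printed for each x equal the linear predictor, which is why the coefficients are interpreted on the log-odds scale.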

Artificial Neural Networks
ANNs are algorithms that mimic the way neurons and the human brain work; like the biological brain, they learn from information and automatically generate new information from what they have learned. As a branch of artificial intelligence, ANNs are of great interest to researchers because they allow computers to learn through massively parallel computing systems (Jain, Mao, & Mohiuddin, 1996). Kim (2017) stated that ANNs are in fact a machine learning model. They are models that try to imitate human intelligence, and they were created by analyzing the structure of biological nerve cells (neurons). Neurons receive, process, and transmit information via biochemical reactions (Abraham, 2005). The general structure of a biological neuron is shown in Figure 1 (Abraham, 2005).
Dendrites in biological neurons receive input signals and transmit them to the soma, where this information is processed. Axons convert these signals into output signals. Connections provide electrochemical contact between neurons. The basic principles of ANNs were first formulated by McCulloch and Pitts in 1943, based on the following assumptions (Graupe, 2013):
• The activity of a neuron follows the all-or-none principle.
• The only significant delay in the nervous system is the synaptic delay.
• The activity of any inhibitory synapse strictly prevents the simultaneous stimulation of the neuron.
• The structure of the connection network between neurons does not change over time.
ANNs are structures made up of a large number of interconnected processing units, similar to biological neurons. These interdependent processing elements are called nodes, and each node is called an artificial neuron. An artificial neuron is shown in Figure 2 (Abraham, 2005).
Inputs: The data transmitted from the outside world or from another neuron ($x_1, x_2, \dots, x_n$).
Weights: The data are processed by multiplying them by the weights. Weights are assigned randomly at the start of the training phase (Henseler, 1995). The weights are denoted ($w_1, w_2, \dots, w_n$).
Summation function: This is the unit in which the weighted sum is calculated. At this stage, the input values are multiplied by the weights and summed. The result is compared with the threshold value (Shanmuganathan, 2016). With $w$ as the weight value, $x$ as the data entering the cell, and $n$ as the number of inputs, the sum is calculated by the following formula (Graupe, 2013):
$$\sum_{i=1}^{n} w_i x_i$$
Activation function: "… conditions to affect the output…" (Radi & Hindawi, 2013, p. 185). The main reason for using an activation function is to give the neural network a non-linear character. Without activation functions, a neural network has only a limited ability to learn nonlinear functions. The output that neural networks are expected to learn is rarely linear (Heaton, 2012). Table 1 shows the commonly used activation functions and their mathematical formulas (Rençber, 2018, p. 55).

[Table 1. Commonly used activation functions and their mathematical formulas, including the step function]
The activation function generates an output value. This output value can be sent to another cell or given out as the result (Veelenturf, 1995).
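The components of an artificial neuron described above (inputs, weights, summation function, and activation function) can be put together in a short Python sketch. The input and weight values are hypothetical, and the convention that the step function fires when the weighted sum reaches the threshold is an assumption made for this example; the sigmoid is included only as one familiar non-linear alternative.

```python
import math

def weighted_sum(inputs, weights):
    """Summation function: sum of w_i * x_i over all inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def step(z, threshold=0.0):
    """Step activation: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def sigmoid(z):
    """Sigmoid activation: a smooth, non-linear output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation):
    """Output of one artificial neuron: activation applied to the weighted sum."""
    return activation(weighted_sum(inputs, weights))

# Hypothetical inputs and weights (illustration only)
x = [1.0, 0.5, -1.0]
w = [0.4, -0.2, 0.1]

print(round(weighted_sum(x, w), 3))  # 0.4 - 0.1 - 0.1 = 0.2
print(neuron(x, w, step))            # 1, because 0.2 >= 0
print(round(neuron(x, w, sigmoid), 3))
```

Swapping the activation function changes the output of the neuron without changing the summation step, which is why the two are described as separate components.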
ANNs are made up of interconnected artificial neurons. The neurons are arranged in layers called the input, hidden, and output layers (Patterson & Gibson, 2017). These layers are shown in Figure 3. In the input layer, information coming from outside the network is passed to the hidden layer, which classifies the input vector, and the output layer retrieves the representations coming from the hidden layer (Hristev, 1998).
In ANNs, the inputs are multiplied by weights and passed to the function. The weight values must be determined for the network to learn correctly. This process is called network training. The weights are assigned randomly when data first arrive in the network and are recalculated as examples are presented. In this way, suitable weight values are found. When the correct weight values are reached, the network is able to generalize the problem represented by the samples. This is called network learning. The learning process follows certain rules, called learning rules. There are different learning rules, such as the Hebb rule, the Hopfield rule, the Delta rule, the gradient descent rule, and the Kohonen learning rule. After training is complete, the network is tested to see whether it has learned. This process is called network testing. The network is asked to produce output for information that it did not see during training, and it produces results using the learned weights. The more accurate the results are, the better the network has learned (Elmas, 2018; Öztemel, 2006).
ANNs are classified as single-layer or multi-layer according to their layer architecture. Single-layer networks have a single input layer and an output node. This simplest neural network is called a perceptron. Multilayer neural networks contain more than one computation layer. A perceptron contains an input and an output layer.
The input layer transmits the data to the output layer, and all calculations are fully visible to the user. In multilayer neural networks, there are additional middle layers between the input and output layers; these are called hidden layers because the calculations made in them are not visible to the user. This architecture of multilayer neural networks is also called a feedforward network because successive layers feed into one another in the forward direction from the input to the output (Aggarwal, 2018).
This study aims to compare ANNs and LR analysis in predicting the science literacy success of the 15-year-old Turkish students who participated in the PISA 2015 cycle. For this purpose, the following questions were considered:
1. According to the variables of students' learning time spent on science, test anxiety, environmental awareness, environmental optimism, epistemological beliefs, inquiry-based science teaching and learning practices, instrumental motivation, and disciplinary climate in science classes:
a. What accuracy rate does the model created by LR produce in predicting students' science literacy success?
b. What accuracy rate does the model created by artificial neural network analysis produce in predicting students' science literacy success?
2. Which model is better at predicting student achievement when ANNs and LR are compared?
3. What is the importance level of the variables included in the model in predicting science success?

Method
The variables used as predictors in the analysis are learning time spent on science (SMINS), test anxiety (ANXTEST), environmental awareness (ENVAWARE), environmental optimism (ENVOPT), epistemological beliefs (EPIST), inquiry-based science teaching and learning practices (IBTEACH), instrumental motivation (INSTSCIE), and disciplinary climate in science classes (DISCLISCI). LR and ANNs were used to determine how well these variables predict science literacy success, and the prediction rates of the two techniques were compared. Therefore, the correlational model (Fraenkel, Wallen, & Hyun, 2012) was adopted in the research. Predictor variables were chosen according to their correlation with the science subtest score. Variables with a correlation above .10 were included in the analyses. The only exception is test anxiety, which has a correlation of .08 but was retained because it is considered a potentially important variable for predicting achievement.

Sample
In Turkey, the population of 15-year-old students at the time of the PISA 2015 cycle was 1,324,089, and the population of students eligible to take the PISA test was 925,366. The sample of this study is the 5,895 students who participated in the PISA 2015 test. In PISA, the school sample is determined using stratified random sampling. The sample comprises 5,895 students from 187 schools in 61 cities, representing the 12 regions defined at Level 1 of the Statistical Regional Unit Classification (SRE) in Turkey. In terms of the number of students, participation was highest in the Istanbul region and lowest in the Eastern Black Sea region (MEB, 2016).

Data Collection Tool
In PISA, students' literacy levels are measured in different areas such as reading, mathematics, collaborative problem solving, and science. In addition, affective characteristics associated with academic achievement are assessed. The items administered to students cover topics such as how motivated students are to learn science, instrumental motivation towards science subjects, and science self-efficacy (MEB, 2016). PISA uses computer-based tests. Each student has two hours to complete the test, which consists of essay-type and multiple-choice items. There is no fixed test form; that is, different students take different combinations of items. Moreover, students are asked to complete questionnaires about themselves, their families, and their school environment (OECD, 2018). In this way, students' willingness to learn science, whether they find learning science useful, and how good they are at solving science problems and difficulties are investigated.

Data Analysis
First, students were ranked according to their science literacy scores, and the most successful 27% and the least successful 27% were identified. Accordingly, 1341 students were included in each group. Then LR, run in Jamovi 1.2.0.0 (The Jamovi Project, 2019), was used to predict whether students were in the lower or upper 27% group according to their PISA 2015 science literacy scores (PV1SCIE) from the variables SMINS, ENVAWARE, ENVOPT, IBTEACH, EPIST, ANXTEST, INSTSCIE, and DISCLISCI. To predict the same variable, an artificial neural network was created as a Multi-Layer Perceptron (MLP) with the same predictor variables, and the group to which each student belongs was estimated. Continuous variables were rescaled with the standardized method; 90% of the data set was used for training and 10% for testing. These proportions were chosen for two reasons. First, restricting the data to the two 27% groups had already reduced the amount of data; second, the full data set is generally used for LR in educational research. The proportions were chosen to keep LR and ANNs comparable, and both techniques used the same training data. The initial seed value was set to 2081980 so that the results can be reproduced. The minimum number of units in the hidden layer was set to 1 and the maximum to 50. The batch method was used in training. The artificial neural network analysis was run in IBM SPSS Statistics v20. The pROC package v1.16.1 (Robin et al., 2011) in R 3.6.0 (R Core Team, 2019) was used to compare the classification performance of LR and ANNs.
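The data preparation steps described above (selecting the lower and upper 27% groups, standardizing continuous variables, and splitting the data 90/10) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure run in Jamovi or SPSS: the scores are synthetic, the function names are hypothetical, standardization is taken to mean subtracting the mean and dividing by the sample standard deviation, and Python's random generator will not reproduce the SPSS partition even with the same seed.

```python
import random
import statistics

def extreme_groups(scores, fraction=0.27):
    """Return index lists (lower, upper) for the bottom and top share of scores."""
    k = round(len(scores) * fraction)
    if k == 0:
        raise ValueError("fraction too small for this sample size")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k], order[-k:]

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1 (assumed meaning of 'standardized')."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def train_test_split(indices, test_share=0.10, seed=2081980):
    """Shuffle with a fixed seed and hold out test_share of the cases."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic science scores for 1000 hypothetical students
rng = random.Random(0)
scores = [rng.gauss(425, 80) for _ in range(1000)]

lower, upper = extreme_groups(scores)
train, test = train_test_split(lower + upper)
print(len(lower), len(upper), len(train), len(test))  # 270 270 486 54
```

Fixing the seed makes the split repeatable within one software environment, which is the purpose the seed value serves in the analysis described above.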

Findings
In the LR conducted to predict whether students are in the lower or upper 27% group according to PISA 2015 science literacy scores, the full model with all predictor variables was significant against the constant-only model ($\chi^2(8) = 1169$; p < .05). This shows that the predictor variables are able to distinguish between students with low and high science literacy. In addition, the McFadden, Cox and Snell, and Nagelkerke pseudo $R^2$ values, which indicate how much of the students' group membership is explained by the tested regression model, were .314, .353, and .471, respectively. $R^2$ values calculated in LR tend to be smaller than those calculated in multiple linear regression models; therefore, a McFadden $R^2$ value between .20 and .40 is considered sufficient (Alpar, 2011; Tabachnick & Fidell, 2013). When the classification accuracy was analyzed, a satisfactory classification was found even though no achievement variable was used: 76.5% of the students in the unsuccessful group, 79.8% of the students in the successful group, and 78.2% of all students were classified correctly. The Receiver Operating Characteristic (ROC) curve obtained for the classification is given in Figure 4. The Area Under the Curve (AUC) was calculated as .856, 95% CI [.842, .871]. This value indicates that students are classified very well with the predictor variables; Cantor and Kattan (2000) gave the cut-off values for AUC as .80, good; .65, fair; and .50, poor. Regression coefficients and odds ratio values for the predictor variables are given in Table 3. Table 3 presents the lower and upper limits of the 95% confidence interval for the regression coefficients, together with the Z and odds ratio values, for the eight predictor variables.
Accordingly, all variables have a significant role in predicting whether students are in the lower or upper group according to PISA 2015 science literacy success. To determine the change expected in the odds of group membership for a one-unit increase in each predictor variable, 1 was subtracted from the $e^{B}$ coefficient and the result was multiplied by 100 (Hair, Black, Babin, & Anderson, 2014). In this way, with a one-unit increase in the ENVOPT variable, a decrease of approximately 41.6% can be expected in the odds of students being in the unsuccessful group ($[e^{-0.538} - 1] \times 100 = -41.61$). With a one-unit increase in the EPIST variable, an increase of approximately 49.8% can be expected in the odds of students being in the unsuccessful group ($[e^{0.404} - 1] \times 100 = 49.78$).
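The arithmetic behind the LR results above can be checked with a short Python sketch. The first function applies the percentage change in odds to the two reported coefficients. The second computes the three pseudo R-squared measures from the reported model chi-square using their standard formulas; it assumes that the model was fitted to all 2,682 students (two groups of 1341) with no missing cases, so that the constant-only log-likelihood equals n times ln(0.5). That assumption and the function names are ours, not the study's.

```python
import math

def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase: (e^B - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0

def pseudo_r2(chi2, n, ll_null):
    """McFadden, Cox and Snell, and Nagelkerke pseudo R-squared values.

    chi2    : likelihood-ratio chi-square of the full model vs. constant-only model
    n       : number of cases
    ll_null : log-likelihood of the constant-only model
    """
    mcfadden = chi2 / (-2.0 * ll_null)
    cox_snell = 1.0 - math.exp(-chi2 / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke

print(round(percent_change_in_odds(-0.538), 2))  # -41.61 (ENVOPT)
print(round(percent_change_in_odds(0.404), 2))   # 49.78 (EPIST)

# Assumption: two equal groups of 1341 students, no missing cases
n = 2 * 1341
ll_null = n * math.log(0.5)
print([round(v, 3) for v in pseudo_r2(1169, n, ll_null)])  # [0.314, 0.353, 0.471]
```

Under these assumptions the computed values coincide with the three reported pseudo R-squared values, and the Nagelkerke value is the largest because it rescales the Cox and Snell value by its maximum attainable value.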
According to the MLP results, a satisfactory classification accuracy was achieved in both the training and test sets, even though no achievement-related variable was used when predicting the students' group. In the training set, 77% of the students in the unsuccessful group, 80.7% of the students in the successful group, and 78.9% of all students were classified correctly. In the test set, 80.4% of the students in the unsuccessful group, 88.1% of the students in the successful group, and 84.3% of all students were classified correctly. The ROC curve in Figure 5 was obtained for the training and test sets. The area under this curve (AUC) was found to be .868, 95% CI [.854, .881], for both training and test sets. The fact that the AUC value is greater than .80 indicates that the students are also classified very well with MLP. Correct and incorrect classification rates of LR and ANNs in predicting the lower and upper groups according to PISA 2015 science literacy are summarized in Table 4. As can be seen in Table 4, the multilayer perceptron correctly classified more than 80% of the students, especially in the test data. In addition, to compare the classification performance of LR and ANNs, the AUC values were compared as suggested by DeLong, DeLong, and Clarke-Pearson (1988), and a significant difference was found between the two AUC values (Z = 3.636; p < .05). However, when the 95% confidence intervals of the AUC values are examined, the intervals are [.842, .871] for LR and [.854, .881] for the artificial neural network, and therefore the two confidence intervals overlap. In this context, although the classification performance of the artificial neural network is significantly better than that of LR, the practical significance of the difference is low because the AUC confidence intervals overlap.
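The two ideas used in this comparison, the AUC itself and the overlap of its confidence intervals, can be illustrated with a small Python sketch. The AUC function uses the rank-based definition (the probability that a randomly chosen member of one group receives a higher predicted score than a randomly chosen member of the other) on hypothetical scores. It does not implement the DeLong test or the confidence interval estimation, which were carried out with pROC in the study; the overlap check simply uses the two intervals reported above.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical predicted probabilities for three students in each group
upper_group = [0.9, 0.8, 0.4]
lower_group = [0.7, 0.3, 0.2]
print(round(auc(upper_group, lower_group), 3))  # 8 of 9 pairs ranked correctly: 0.889

# Confidence intervals reported in the text
lr_ci = (0.842, 0.871)
ann_ci = (0.854, 0.881)
print(intervals_overlap(lr_ci, ann_ci))  # True
```

The overlap check is only a descriptive comparison of the two intervals; the significance test reported in the text is the DeLong comparison of the paired AUC values.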
The order of importance of the predictor variables in predicting, with ANNs, which group students belong to according to their science success is given in Figure 6.

Figure 6. Order of Importance of Predictor Variables
According to the science literacy scores of the students, SMINS is the most important variable in predicting with ANNs whether students are in the lower or upper group, and the least important variable is DISCLISCI. Among the variables included in this study, the most important factor in student success was found to be the time allocated to learning. In addition, students' interest in environmental problems and inquiry-based science education were found to be important variables that explain success. The least important variable in explaining student achievement is the disciplinary climate in the classroom.
In addition, there are some studies in the literature that conclude that, when the two analyses are compared, neither has a clear advantage and the success of the method depends on the situation studied (Dreiseitl & Ohno-Machado, 2002; Eftekhar et al., 2005; Gorr et al., 1994; Hardgrave et al., 1994; Manel et al., 1999; Nefeslioğlu et al., 2008; Pavlekovic et al., 2010; Tu, 1996).
In the literature, comparisons of the performance of LR and ANNs are generally based on percentage values. In this study, after the percentage values were determined, the significance of the difference between the AUC values obtained for classification performance was also analyzed. Accordingly, the group to which students belong can be predicted correctly at a rate of 78.2% with logistic regression and 84.3% with ANNs. In addition, when the classification performances were compared, ANNs performed better. However, when the confidence intervals were considered, this difference was found to be small in practice.
The results show that the most important variable in predicting students' science literacy was the time allocated to science education, and the least important variable was the disciplinary climate in the classroom. In order of importance, the other variables were environmental optimism, environmental awareness, epistemological beliefs, inquiry-based science teaching, test anxiety, and instrumental motivation. When the variables are analyzed, students' sensitivity to environmental issues (ENVAWARE and ENVOPT) comes to the fore in predicting science success. In addition, a preference for inquiry-based science learning (IBTEACH) is expected to increase the science literacy score in PISA.

Recommendations
This research was conducted on the science literacy scores from the PISA 2015 results. ANN and LR models can also be used for other PISA subtests. In addition, different variables can be included to analyze their role in prediction. In the present study, only MLP was used as the artificial neural network model. Studies with different artificial neural network models, or studies testing the predictive power of different activation functions, can be conducted. Variables such as computational cost and analysis time were outside the scope of this research. Therefore, further studies that compare LR and ANNs in terms of analysis time and required computational power are important for determining which method is more efficient for solving the same problem. It will also be useful to consider these findings in studies carried out to increase student success. While developing policies on schools and the education system, it is especially important to pay attention to student responsibilities, effective teaching techniques, teacher competencies, and classroom management.