Chapter III
METHOD
Subjects
The following research was completed using public middle and junior high schools in the counties of Cook (excluding Chicago Public Schools), DuPage, Kane, Lake, McHenry and Will.
Apparatus
This research study was conducted using a quantitative approach. The Illinois School Report Cards for the 2006 academic year are compiled by the Illinois State Board of Education (ISBE) and posted on its website in Portable Document Format (PDF). The database that contains the Illinois School Report Cards is searchable by city. The data extracted from the Illinois School Report Cards includes: eighth grade math ISAT passing percentages, demographic percentages, low-income percentage, mobility rate percentage, attendance rate, average class size, pupil-to-teacher ratio, minutes of math instruction per day, teacher gender percentages, average teacher experience, average teacher education, average teacher salary, average administrator salary, and average dollars expended per student for instruction.
Procedure
Once a subject middle or junior high school was located on the ISBE website, the school’s Illinois School Report Card was downloaded as an Adobe PDF file. Each of the several hundred subject school files was then opened and manually parsed for the data (itemized above in the Apparatus section) related to the education production function being researched in this study. The parsed data was manually entered into a Microsoft® Excel® spreadsheet and arranged in columns. To facilitate the multiple linear regression analysis, the eighth grade math ISAT passing percentages were entered in the first column of the spreadsheet, indicating that this is the dependent variable. The independent variable data (i.e., the controllable and non-controllable factors) was entered in the remaining columns.
Controllable factors are defined as aspects of the school environment over which the school district’s administration can exercise partial or full control. These include the following categories of data reported on the Illinois School Report Card: pupil-to-teacher ratio, average class size, minutes of math per day, percentage of male teachers, teachers’ average years of experience, percentage of teachers with an M.A. or higher, average teacher’s salary, average administrator’s salary and expenditures per student on instruction.
The non-controllable elements of the Illinois School Report Card include data such as demographic percentages, low-income percentage, mobility rate and attendance rate. By definition, the non-controllable elements cannot be controlled or manipulated by the school district’s administration. However, they are part of every school’s environment and thus will be analyzed to determine their effect on eighth grade math ISAT passing percentages.
As part of this procedure,
the researcher used multiple linear regression to analyze how controllable and
non-controllable factors influence the eighth grade math ISAT passing
percentage. Because the analysis
software (Microsoft Excel) is limited to 16 independent variables and there are
nine independent controllable and nine independent non-controllable variables,
this analysis was accomplished in three steps.
The first step used multiple linear regression analysis to analyze the effects of the controllable factors on the eighth grade math ISAT passing percentage. The second step used multiple linear regression analysis to analyze the effects of the non-controllable factors on the eighth grade math ISAT passing percentage. The final step combined the statistically significant controllable and non-controllable independent variables from steps one and two and performed a final multiple linear regression analysis. This last analysis determined how the controllable and non-controllable factors, taken together, affect the eighth grade math ISAT passing percentage. The multiple linear regression method of analysis is described later in the Data Analysis section.
Limitations
The
population in the study was limited to public schools containing eighth grade
classes in the counties of Cook (excluding Chicago Public Schools), DuPage, Kane, Lake, McHenry and Will.
Data Analysis
The data for this research was collected using a quantitative approach. The data collected in this study will be presented through graphs and tables. The data analysis method to be used is multiple linear regression analysis.
Multiple linear regression analysis is an analysis process similar to linear regression analysis. In linear regression analysis, a single independent variable (input) is paired with a single dependent variable (output), i.e., one independent variable’s value is compared to one dependent variable’s value. Mathematical analysis software used for linear regression attempts to create the linear equation that best fits the relationship between the independent and dependent values, that is, the relationship of the output values to the input values. The usefulness of the mathematical equation is its ability to predict output values based on input values, provided the input values stay within the domain of the original input data.
Multiple linear regression analysis attempts to achieve a mathematical objective similar to that of linear regression analysis. The difference between linear and multiple linear regression analyses is that multiple linear regression has more than one independent variable (input) for each dependent variable (output). An example of a multiple linear regression relationship would be an analysis of how fast a person could run a 100-meter race. In this scenario, the time it takes to run 100 meters is the dependent variable (output value). The independent variables (input values) are numerous and may include such factors as the runner’s age, height, weight, and gender.
Both linear and multiple linear regression analyses have a requirement with respect to the independent variable (input) data. This input data must be normally distributed (Osborne & Waters, 2002).
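To make the difference between linear and multiple linear regression concrete, the following Python sketch fits a best-fit equation for the 100-meter race scenario using two inputs (age and weight). It is only an illustration under stated assumptions: this study relies on Excel rather than Python, the runner data are invented so that they follow an exact linear rule, and the function names `solve_linear_system` and `fit_multiple_linear_regression` are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_linear_regression(
    inputs: Sequence[Sequence[float]], outputs: Sequence[float]
) -> List[float]:
    """Return [intercept, b1, b2, ...] that minimise the squared error."""
    # A leading 1.0 in every row makes the first coefficient the intercept.
    rows = [[1.0] + list(r) for r in inputs]
    k = len(rows[0])
    # Normal equations: (X^T X) * beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outputs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented data: (age in years, weight in kg) for five hypothetical runners.
runner_inputs = [(20, 60), (25, 80), (30, 70), (35, 90), (40, 65)]
# Invented 100-meter times built from: 10 + 0.05 * age + 0.02 * weight.
runner_times = [10 + 0.05 * age + 0.02 * weight for age, weight in runner_inputs]

coefficients = fit_multiple_linear_regression(runner_inputs, runner_times)
print("intercept, age coefficient, weight coefficient:", coefficients)
```

Because the invented times follow an exact linear rule, the fit recovers that rule up to rounding error. Real school data would leave a nonzero residual for each school.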
Non-normally distributed input data can distort the mathematics of the input-output relationship and thus the significance each independent variable has for the dependent variable (Osborne & Waters, 2002). Input data values that depart markedly from the normal distribution are referred to as outliers and must be removed from the input data set (Osborne & Waters, 2002). Outliers can be identified either through visual inspection of histograms or frequency distributions, or by converting data to z-scores (Osborne & Waters, 2002). Data cleaning plays an important role in linear or multiple linear regression analyses. Analyses by Osborne (2001) show that removal of outliers can reduce the residual errors and improve the accuracy of linear or multiple regression analysis models.
For the purposes of this research, outliers will be identified by visually inspecting graphs. The inspection graphs will be created by plotting the residual errors (or residuals) from each multiple regression analysis. These graphs are called residual plots (Rumsey, 2007). The residuals represent the difference between actual values (a school’s actual eighth grade math ISAT passing percentage) and the passing percentage predicted by the multiple linear regression analysis’ predictive output equation. Residual plots should approximate a normal distribution plot. Data points that deviate markedly from the normal distribution will be considered outliers and removed from the model.
The statistical tool to be employed in the data analyses will be Microsoft’s Excel spreadsheet program. Excel contains several powerful data analysis tools, including one that performs multiple linear regression analyses. The following table shows some of the statistics produced by Excel's regression analysis function.
For the purposes of this research, the two statistics that will be examined from Excel’s multiple linear regression analyses are the “P-value” and the “Coefficient”.
The P-value is used to judge whether each independent variable genuinely contributes to the model. In short, the P-value of an independent variable is the probability of seeing a relationship at least as strong as the one observed if that variable actually had nothing to do with the linearity of the multiple linear regression model. When using multiple linear regression analysis, a value of 0.05 is established as the maximum allowable P-value for an independent variable. A P-value of 0.05 indicates that there is no more than a 5% probability that the apparent contribution of this independent variable to the linearity of the multiple linear regression analysis would arise purely by chance. Another way of stating this is that an independent variable with a P-value of 0.05 or less is statistically significant at the 95% confidence level. After the multiple linear regression analysis is performed, the P-value for each independent variable is checked. If the P-value of an independent variable is greater than 0.05, then that independent variable is considered statistically insignificant. An insignificant independent variable is not reliably influencing the linearity of the multiple regression analysis and thus should be removed from the model. Following this method, only statistically significant independent variables are used in the multiple regression analysis.
The Coefficient statistic is used to create a predictive equation. For this research, it will be used to create the education production function. The “Intercept” value of the Coefficient is the fixed or static value of the multiple linear regression analysis’ predictive output equation. The independent variables’ Coefficients are the dynamic values of the multiple linear regression analysis’ predictive output equation.
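The P-value screening rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Excel procedure used in this research: the P-values below are hypothetical placeholders, not results, and the function name `split_by_significance` is an assumption introduced for the example.

```python
from typing import Dict, List, Tuple

# Maximum allowable P-value, as stated in the text.
SIGNIFICANCE_CUTOFF = 0.05


def split_by_significance(
    p_values: Dict[str, float], cutoff: float = SIGNIFICANCE_CUTOFF
) -> Tuple[List[str], List[str]]:
    """Separate variables to keep (P <= cutoff) from those to remove (P > cutoff)."""
    keep = [name for name, p in p_values.items() if p <= cutoff]
    remove = [name for name, p in p_values.items() if p > cutoff]
    return keep, remove


# Hypothetical P-values for three controllable factors (illustration only).
example_p_values = {
    "pupil_to_teacher_ratio": 0.01,
    "average_class_size": 0.20,
    "minutes_of_math_per_day": 0.05,
}

kept, removed = split_by_significance(example_p_values)
print("kept:", kept)
print("removed:", removed)
```

A variable whose P-value equals the cutoff exactly is kept, because the text treats only values greater than 0.05 as insignificant.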
When creating the multiple linear regression analysis’ predictive output equation, each independent variable’s Coefficient is multiplied by the school’s actual value for the corresponding characteristic, and the products are added to the Intercept.
The following is a demonstration of how to interpret the example data shown in Table 2. First, the P-values for the independent variables shown in Table 2 are examined. Any P-value greater than 0.05 is treated as statistically insignificant. The P-values for independent variables #1 and #3 are 0.00 and 0.04, respectively. These independent variables are statistically significant to the model. The P-value for independent variable #2 is 0.13 and is thus above the cutoff value. Independent variable #2 is not statistically significant to the model and should be removed from the multiple linear regression analysis. If independent variable #2 is removed from the multiple linear regression analysis, the accuracy of the model will increase (Cameron, 2007; Microsoft, 2005; Rutgers, 2007). Finally, removing the statistically insignificant independent variable from the regression analysis optimizes the mathematical model. The resulting coefficients can then be used to create a multiple linear regression analysis predictive output equation. For this example, the predictability equation would be:
Dependent variable = –51.1 + 0.84 × (independent variable #1) + 0.42 × (independent variable #3)
This example provides several pieces of information about the predictability of this relationship. First, the intercept or constant of the equation is –51.1. Thus, if the values of the two independent variables were zero, the resulting dependent value would be –51.1. Second, the coefficients of the first and third independent variables are positive. So as the first and third independent variables increase and are summed with the constant, the dependent value will rise and may eventually become a positive number. Finally, since the coefficient of the first independent variable is 0.84 and the coefficient of the third independent variable is 0.42, a unit change in the first independent variable will have twice the impact on the predictability relationship as a unit change in the third independent variable.
Multiple linear regression analysis will be used on the data gathered from the ISBE website to produce a multiple linear regression analysis predictive output equation that shows the effects of controllable and non-controllable factors on eighth grade math ISAT passing percentages.
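As a small illustration of the example discussed above, the following Python sketch evaluates a predictive output equation built from the intercept (–51.1) and the two coefficients (0.84 and 0.42) given in the text, and computes a residual as the actual value minus the predicted value. The function names `predict` and `residual` are assumptions introduced for this sketch, and Python stands in for the Excel workflow used in this research.

```python
# Values taken from the worked example in the text.
INTERCEPT = -51.1
COEFFICIENT_VAR_1 = 0.84
COEFFICIENT_VAR_3 = 0.42


def predict(var_1: float, var_3: float) -> float:
    """Predicted dependent value: intercept plus each coefficient times its input."""
    return INTERCEPT + COEFFICIENT_VAR_1 * var_1 + COEFFICIENT_VAR_3 * var_3


def residual(actual: float, var_1: float, var_3: float) -> float:
    """Residual as defined in the text: actual value minus predicted value."""
    return actual - predict(var_1, var_3)


# With both inputs at zero, the prediction equals the intercept.
baseline = predict(0.0, 0.0)

# Effect of a one-unit change in each input, holding the other at zero.
effect_of_var_1 = predict(1.0, 0.0) - baseline
effect_of_var_3 = predict(0.0, 1.0) - baseline

print("baseline:", baseline)
print("unit effect of variable #1:", effect_of_var_1)
print("unit effect of variable #3:", effect_of_var_3)
```

The two unit effects reproduce the observation that a unit change in the first independent variable moves the prediction twice as far as a unit change in the third.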