An analysis of the correlation between standards-based, non-standards-based grading systems and achievement as measured by the Colorado Student Assessment Program (CSAP)
Table of Contents

Acknowledgments
List of Tables

CHAPTER 1. INTRODUCTION
    Introduction to the Problem
    Background of the Study
    Statement of the Problem
    Purpose Statement
    Rationale
    Research Hypothesis
    Significance of the Study
    Definitions of Terms
    Assumptions
    Limitations
    Nature of the Study
    Organization of the Remainder of the Study

CHAPTER 2. LITERATURE REVIEW
    Introduction: Rationale for the Research
    Student Grading
    Grading Systems
    Standardized Achievement Testing
    Research Questions’ Relationship to Major Literature Themes
    Summary: How the Research Will Contribute to the Field

CHAPTER 3. METHODOLOGY
    Introduction
    Statement of the Problem
    Research Hypothesis
    Research Methodology
    Research Design
    Population and Sampling Procedure
    Instrumentation
    Validity
    Reliability
    Data Collection Procedures
    Data Analysis Procedures
    Ethical Considerations
    Limitations
    Summary

CHAPTER 4. DATA COLLECTION AND ANALYSIS
    Introduction
    Descriptive Data
    Data Analysis
    Results
    Summary

CHAPTER 5. RESULTS, CONCLUSIONS, AND RECOMMENDATIONS
    Introduction
    Summary of the Study
    Summary of Findings and Conclusions
    Recommendations
    Recommendations for Future Research
    Recommendation for Practice
    Implications

REFERENCES
List of Tables

Table 1. Mean Average Scores on CSAP Math Assessments
Table 2. Districts With Highest Means for Each Sub-Population and Grade Level on the CSAP Math Assessments
Table 3. Mean Average Scores on CSAP Reading Assessments
Table 4. Districts With Highest Means for Each Sub-Population and Grade Level on the CSAP Reading Assessments
Table 5. Mean Average Scores on CSAP Writing Assessments
Table 6. Districts With Highest Means for Each Sub-Population and Grade Level on the CSAP Writing Assessments
Table 7. Mean Average Scores on CSAP Science Assessments
Table 8. Districts With Highest Means for Each Sub-Population and Grade Level on the CSAP Science Assessments
Table 9. Overall Correlation for Participating Districts Total Population
Table 10. Correlational Values for Ethnicity and 2008 CSAP Scores (research hypothesis H2)
Table 11. Correlational Values for Low SES and 2008 CSAP Scores (research hypothesis H3)
Table 12. Correlational Values for ELL Students and 2008 CSAP Scores (research hypothesis H4)
CHAPTER 1. INTRODUCTION
Introduction to the Problem

Grading student work and providing end-of-quarter grades have long been held to be duties of teachers. The managerial function of grading is intended, as supported by Marzano (2000, 2006), Brookhart (2004), and Schmoker (1999), to provide information to multiple stakeholders concerning student achievement. Stakeholders can include parents, teachers, district administrators, collegiate entrance personnel, and students. While stakeholders have different needs or purposes for obtaining the information, the most critical are the students’ needs related to learning and achievement.

Brigden (1998), as part of her doctoral research, interviewed groups of teachers, parents, and students about what facets make up a student’s grade and what the best methods were for reporting progress to students and parents. Teachers felt the best way to communicate grades was through a percentage-based system and felt that the grade should be based on a combination of achievement results, such as test results, and non-achievement factors, like effort and work habits. Students, however, felt that the percentage grade was adequate as long as it was accompanied by written comments and conferences to explain the reasoning behind the grade. Students did recognize that test results and homework might be part of an overall grade, but were not cognizant of non-achievement factors that teachers might use in deciding on a grade. Parents held views similar to the students’ and appreciated written or verbal comments accompanying the grade that was provided. Parents did not fully understand all the factors that were used in
determining a grade and did not realize that many teachers valued work habits in the same vein as achievement factors. Brigden found that teachers felt satisfied with the grades that were given, while the students and the parents did not feel they fully understood the grades and had concerns about the factors that made up the evidence that the teacher used in determining the grade.

While the individual grade that each student receives for an assignment might have little or no impact on the student’s educational career, the compilation of grades over the course of a quarter, semester, or year reflects a student’s body of knowledge and skill in its entirety for that child’s academic year. The variability that can exist in the subjective process of grading a student’s work can, when viewed over an extended period, have a significant effect on how each student is viewed in terms of his or her level of understanding.

Stronge (2002) indicates that the ultimate reflection of an effective teacher is student achievement results. Achievement is a reflection of what the student has learned and, just as importantly, how successfully the teacher has taught. There are multiple methods to measure student learning and achievement. The use of both formative and summative assessments, homework evaluation, recitation, and other performance tasks provides evidence that allows the teacher to make a judgment as to the level or degree to which the student has mastered the material. The teacher has the ability to use differing criteria to grade the student, with the most effective criteria being a standard or set “learning criteria” (Guskey, 1994, p. 17). Without set criteria, the subjectivity of grading has no basis other than a comparison between fellow students or an arbitrary sense of good versus poor quality work. This high level of subjectivity allows individual teachers to make decisions on how and what they will grade in determining a student’s overall grade
that is reported to the varying stakeholders.

Beyond the classroom, schools and school boards can, and typically do, provide some description of how teachers should record and report students’ grades. In most school systems, the school board or, in some cases, individual schools have outlined the managerial requirements for grading and grade reporting. The systems in use typically share a scale that, at first glance, appears to be systematically produced to evaluate students on an even-interval instrument geared to measuring student achievement on a traditional base-ten system. This system is often reflected in a percentage or point system that equates an A with a score or percentage of 90%-100% and a B with an average of 80%-89%, with the bottom of the scale represented by an F for any student achieving 0%-59%. Some systems are moving toward a standards-based model, which uses descriptors such as unsatisfactory, partially proficient, proficient, or advanced. These descriptors may or may not be linked to percentage scores on assignments, but usually have a rubric or definition of some score attached to inform the reader about the meaning of each (www.rfsd.k12.co.us).

While many districts have established grading systems, the critical component in the grading cycle is the criteria, or lack of criteria, that the individual teacher uses to establish a grade and evaluate the achievement level of the student (Stiggins, 2004). Historically, educators have used many varied criteria for establishing a grade for any given assignment that a student might complete. Marzano (2000) looked at the factors that teachers used to determine a grade and found that very little had to do with the student learning the material; rather, the grade was based on effort, behavior, cooperation, and attendance. An educator’s perception of a student can have a direct influence on the
grade that is given for the work the student produces in the class. Guskey and Bailey (2001, p. 16) state that “students with behavior problems often have no chance to receive a high grade because their infractions overshadow their performance.” Brookhart (1994) also examined the use of grades as a motivational factor for students, finding that some teachers believe that giving a student a poor grade will help to motivate the student to try harder on the next task.

With increased focus on student accountability, states have developed standardized tests that aim to determine if students are learning the material that is considered grade appropriate for each student. Colorado, working with CTB/McGraw-Hill, has developed the Colorado Student Assessment Program (CSAP) and has been using assessments in grades three through ten for the past ten years to determine if the students in Colorado are meeting proficiency levels in reading, writing, math, and science. For this study, it is important to accept that the CSAP assessment has been vetted and provides an accurate measure of what students need to know and are able to do in relation to the state standards. This test of accuracy provided for a comparison of teacher-driven grades in relation to a constant and tested measure. The measure can be deemed constant because every student in the State of Colorado, from grades three through ten, is required to take the assessment, and because the CSAP is a standardized measure, which indicates that the testing conditions are the same for every student (N.A, 2008).

The Colorado Department of Education (CDE), the governing education agency in Colorado, indicated that the assessments given to Colorado students have a 90% match to the Colorado model content standards (J. O’Brien, personal communication, September 15, 2007). Colorado uses a moderate complexity model to determine the CSAP
assessment’s alignment with state content standards. Bhola, Impara, and Buckendahl (2003) describe this model as

“One of the simplest of the more complex models, where content panelists are asked to examine the match between content standards and assessments items from the dual perspectives of content match and cognitive complexity match. The addition of a cognitive complexity dimension to the model makes it more comprehensive and useful than one that focuses on the content dimension.” (p. 23)

It would stand to reason that if a student is being taught material that is aligned with the state model content standards, and the student is being assessed in the classroom in relation to those standards, then the student should produce results on the CSAP test similar to the grade earned in the content classroom. While this would seem to make sense, the subjectivity of teachers’ evaluations, the varying approaches to grading systems within schools and districts, and the variability in the use of “non-academic achievement factors to base grades” (Marzano, 2000, p. 4) create the potential for a disconnect between the reporting of student achievement at the classroom level and at the state assessment level.

Educators make critical decisions based on whether or not a student has learned the required material for the grade level. A student’s grades and test scores are often used to determine if a student will be retained or placed in remedial classes. Correspondingly, colleges often rely on a student’s grade point average to determine acceptance into the collegiate system. While there is no evidence that any college uses a student’s CSAP scores as an entrance requirement, if a student is unable to take a high-level math class because of a low CSAP score, this alone could preclude the student from attending his or her college of choice (Lang, 2007).
Background of the Study

Every individual who has participated in the American education system can articulate the factual truth about what an A, B, C, D, or F means when it appears on an assignment or a report card. Marzano (2000) indicates that “about 80% of schools use letter grades from the 4th grade on” (p. 11), with 90% of high schools using the letter-based system. Other indicators are used in the primary grades of elementary school, but are rarely seen outside of the elementary setting. There is a common belief that an A equates with over-achievement, going beyond the standard, or working to a higher level. Some systems even use additional marks to indicate gradients of perfection, with + and - signaling that the student went beyond the grade or did not quite fulfill all the components necessary to make a solid A or other grade.

Many researchers caution against the use of a single score or grade representing the total body of knowledge or level of achievement that a student might have earned over the course of a marking period. To summarize everything a student might do in a classroom into a 73% or a C does not provide the person receiving the communication any indication of what that grade represents (Allen, 2005). Most parents, as indicated by Guskey and Bailey (2001), view a C grade as meaning that their student is average in relation to the student’s peers in the classroom. Without more information about the grade and what it reflects, parents and students have no way of interpreting the grade in a manner that gives any real information about what the student learned in relation to the material or criterion that was established for the course.

Rich (2001), in his doctoral research, found that most teachers feel that content mastery should be the first indicator that influences teacher grading. Secondary to the
achievement factor was a litany of non-achievement factors, including student effort, student improvement, student attendance, and self-control. Rich also found that teachers used a student’s competitive nature in computing the grade. On deeper investigation, older teachers felt the competitive nature of a student was less important than did younger teachers, who ranked a student’s competitive nature in the top three indicators that influence the grade that they assigned to the student. According to Marzano’s (2000) meta-analysis of student grading, behavior and aptitude can play a significant part in the teacher’s cognitive processes leading up to the assignment of a grade.

In Colorado, many school districts refrain from creating actual grading policies, deferring to the schools to develop their own criteria within broad guidelines. There are, however, several school districts that define, in very specific ways, what it takes for a student to obtain a certain grade and what criteria a student will need to meet to be successful in each system. In every example but one, the school districts that define a grading system do so in a very similar manner, using the traditional scale, where an A requires between 90 and 100% of the total points, a B between 80 and 89%, a C between 70 and 79%, and a D between 60 and 69%, and an F is any percentage below 60%. Roaring Fork School District, in Western Colorado, is the only school district in Colorado that defines its grading system in a different manner across the kindergarten through twelfth grade spectrum (J. Haptonstall, personal communication, July 20, 2008). Roaring Fork uses a standards-referenced grading system (R. Marzano, personal communication, July 26, 2008) that does not use a scaled system or a point-based system, but is intended to measure student learning against defined criteria, established in rubrics created by the
staff. According to Haptonstall, Roaring Fork’s standards-referenced system is intended to provide students and parents information about how the student is performing related specifically to the standards that have been established by the state. Unlike a traditional system, which provides a single grade or score to summarize all of the factors that might make up the student’s overall grade, this system offers descriptive information about each standard and benchmark that was covered over the course of the learning period. Every grading system is intended to provide information about student learning, and most generally, the information is held to be in direct relation to the student’s academic achievement (Brookhart, 2004).

This study was conducted to examine the correlation between the Colorado Student Assessment Program (CSAP) and grading systems throughout the State of Colorado. The CSAP test is the measure that the State of Colorado uses to determine student achievement for students in grades three through ten in reading, writing, and math, and in fifth, eighth, and tenth grades in science. The CSAP is viewed as an accurate and completely vetted assessment that is standardized in its administration and is viewed by state officials as representing a large body of state standards in its makeup. Ninety-eight percent of the assessment is said to be aligned with Colorado State standards (J. O’Brien, personal communication, September 2008). This study examined how secondary students’ grades, which are provided for individual subjects, such as reading, writing, mathematics, and science, correlated with their achievement on the CSAP test.

It was the intent of this research to examine the correlation between current grading systems that are used throughout the State of Colorado and the performance students demonstrate on the CSAP assessments. In a similar
study conducted by Wright and Wise (1988), rubric scores, often associated with criterion-referenced or standards-referenced scores, had a higher correlation to standardized assessments than grades based on cumulative points. This study was used to help districts gain a better understanding of the validity of the system that they currently have in practice. By comparing the proficiency level of each student with the grade the student earns in the classroom, districts can evaluate how accurate their grading system was in measuring student achievement. This information could be used either to validate the system that was in use or to serve as a starting point for districts to modify or create systems that are more responsive to student, parent, and staff needs.

Statement of the Problem

It was not known how and to what extent the grading of students in the State of Colorado mirrors the achievement of students as measured by the CSAP test. While the current grading systems that are in place in the majority of school districts in the State of Colorado have an institutionalized aura about them, the evaluation subjectivity of individual teachers is so broad that the potential for inconsistent measurement of student learning can and does affect student learning. Marzano (2000) found that there are many factors that teachers include when they arrive at a grade for any given assignment, including academic achievement, aptitude, effort, behavior, and attendance.

Allen (2005) examined the validity of the letter-based grading system and its accuracy in actually measuring student learning. Focusing on three major components, Allen states that the system’s flaws reside in the use of behavior and attitude as assessment criteria, the expectation that a single mark can communicate all the learning that may or may not have taken place in a
quarter or semester, and finally a serious lack of training for pre-service teachers in proper grading and assessment preparation. Allen suggests that pre-service teachers would be better instructed using classroom scenarios and internships, where they could practice assessment development and the finer points of grading student work.

With such a disparity in grading, in both the use of grading systems and the consistency with which students are graded, there is an underlying and consistent issue: what does an A mean? Students must have clear guidelines for what it takes to earn a certain grade. The teacher must provide not only a clear definition of what each grade represents, but also a clear understanding of the factors that enable a student to obtain a grade. School systems and parents need to work together to establish what information needs to be reported to ensure that all stakeholders have a clear understanding of what the student has learned. This research examined how closely traditional and standards-based systems attempt to measure learning.

Purpose of the Study

The purpose of this research was to determine, using quantitative measurement, the correlation that exists between students’ grades and the Colorado student achievement assessments. By determining the correlation between grades and achievement, school systems can support policies and practices concerning classroom grading. This research also brings validity to those districts using standards-based grading systems, due to the stronger correlational values that students represented by the standards-based system produced. Research was conducted at the middle and high school levels, for which end-of-course grades for mathematics, language arts, reading, and science were provided.
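The kind of grade-to-assessment correlation described in the purpose statement can be illustrated with a minimal sketch. It is not the study’s actual procedure: the choice of Spearman’s rank correlation, the numeric coding of letter grades and proficiency levels, the function names, and the five sample records are all assumptions made for illustration only.

```python
from math import sqrt


def letter_grade(percent):
    """Map a percentage to a letter on the traditional scale (A = 90-100, ..., F below 60)."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


# Assumed ordinal codings; the source does not state how grades or levels were coded.
GRADE_POINTS = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}
PROFICIENCY = {
    "unsatisfactory": 1,
    "partially proficient": 2,
    "proficient": 3,
    "advanced": 4,
}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical records (course percentage, assessment level); not data from the study.
sample = [
    (93, "advanced"),
    (85, "proficient"),
    (78, "proficient"),
    (66, "partially proficient"),
    (52, "unsatisfactory"),
]
grades = [GRADE_POINTS[letter_grade(pct)] for pct, _ in sample]
levels = [PROFICIENCY[level] for _, level in sample]
rho = spearman(grades, levels)
```

A rank-based coefficient is used here because both letter grades and proficiency levels are ordered categories rather than interval measurements; a study with access to scale scores might reasonably choose a different statistic.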
Rationale

This study was conducted to provide participating districts, as well as other school districts, with information that might enable teachers and administrators to align grading systems, classroom grading, and standardized assessment to meet the learning needs of students. While classroom grading systems and grades are intended to reflect student learning, researchers have long held that neither is a measure of academic knowledge alone. The use of other factors, like behavior, attendance, and attitude, often clouds the purity of grades. Standardized assessments are seen, by most researchers, as an effective means to determine a student’s understanding of concepts related to state standards. This study provided quantitative data that reflected the relationship that grading systems and grades have to the results produced through a standardized assessment. One aspect of the research focused on the relative difference in the correlation of standards-based grading systems and traditional letter-based systems with the standardized assessment. The results from this study informed districts of the possible benefits of one grading system over the other.

Research Hypotheses

The following hypotheses guide this study:

H1: The proficiency levels of students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will not show a significant difference when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).
H01: The proficiency level of students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will show a significant difference when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H2: The proficiency levels of Hispanic students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will not show a significant difference from Caucasian students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H02: The proficiency level of Hispanic students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will show a significant difference from Caucasian students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H3: The proficiency levels of low socioeconomic students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will not show a significant difference from non-low socioeconomic students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H03: The proficiency level of low socioeconomic students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will show a significant difference from non-low socioeconomic students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).
H4: The proficiency levels of English Language Learner (ELL) students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will not show a significant difference from non-ELL students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H04: The proficiency level of English Language Learner (ELL) students as determined by end-of-course grades in grades six through ten in reading, writing, math, and science will show a significant difference from non-ELL students when compared to the proficiency levels as measured by the Colorado Student Assessment Program (CSAP).

H5: The proficiency levels of students in a standards-based grading system, as determined by end-of-course grades in grades six through ten in reading, writing, math, and science, will not show a significant difference when compared to the proficiency levels of students in a traditional, letter-based grading system, as measured by the Colorado Student Assessment Program (CSAP).

H05: The proficiency level of students in a standards-based grading system, as determined by end-of-course grades in grades six through ten in reading, writing, math, and science, will show a significant difference when compared to the proficiency levels of students in a traditional, letter-based grading system, as measured by the Colorado Student Assessment Program (CSAP).
Significance of the Study

While there has been, in recent years, a growing interest in the study of grading systems, most studies focus on the particular factors that enter into the decision-making process in giving a student a grade for an assignment, a test, or the end-of-course grade. Representative studies include Randall’s (2007) use of Guttman’s facet theory to examine the grading practices of teachers, and Brigden’s (1998) examination of the reporting, grading, and meaning of grades in a ninth-grade science classroom. Rich (2001) examined some of the hidden factors in the grading of secondary students, and how factors other than learning played a part in the grade a student might earn in any given secondary classroom. This study sought to examine the more global aspects of grading to determine if the current systems that are in use actually measure student learning, as identified by the CSAP.

Similar studies have been conducted in other states, where classroom grades have been used as predictors of performance on the various state assessments. Lambert (2002) studied the validity of end-of-course reading scores in relation to the scores students earned on the third grade reading assessment in Texas. Good (2001) sought to determine what school-level factors combined to provide educators with the best predictor of student success on the CSAP. The studies mentioned provide insight into the predictability of grades in relation to state standardized assessments, while this study sought to establish the validity and reliability of the grading systems that are reviewed. Instead of acknowledging the current system as flawed or overly subjective, this study sought to demonstrate the relationship that exists between what a student earns as a grade
in a secondary classroom and the score that is earned on the corresponding state achievement test.

While there are many elements that make up the various types of grading systems, the key to each is the purpose of the system and how each school or district chooses to report student learning. Ideally, a grade is used to communicate to varying stakeholders some information about student learning (Marzano, 2000). When the communication is focused on the learning targets that have been established for each course, the grade can be used as a learning tool. Stiggins (2004) asserts that when information about what is to be learned and how to learn it is shared with the student, the grade becomes a powerful tool in enhancing student achievement. The student then becomes the master of the learning and can provide feedback to the teacher about what she knows and does not know and, more importantly, how to help her learn the material necessary to move forward and be successful.

This study enabled school districts to make a research-based decision about the usefulness of their current system and, potentially, what type of system serves the purpose of communicating student learning. Whether the system currently used is a standards-based or standards-referenced system, or a letter or numerically based system, if the system does not reflect student achievement and learning, the system does not support the purpose of grading. This study provided each district that participated in the research with an executive summary that shows how its grading system correlates with student performance on the CSAP assessment. This research could potentially be used to help