Starting tomorrow, all ANCS students in grades 3-8 will begin taking this year’s Georgia Milestones tests. All public school students in Georgia take these tests, as required by state and federal law, and their results receive significant attention and weight. In my opinion, this overemphasis on scores from a single test, used as a proxy for whether a student, teacher, or school is “successful,” skews what could otherwise be valuable information. The tests could be one part of a more complete understanding of what students know and can do and of whether a school is fulfilling its mission, including purposes beyond what a multiple choice test can measure. And I’m not alone in this opinion. The condensed interview below is with Harvard Graduate School of Education professor Dan Koretz, who has written and researched extensively on educational testing, about his book Measuring Up: What Educational Testing Really Tells Us. It raises some important points about how we can take a more sensible approach to standardized tests, something important to keep in mind amid the test prep, Milestones pep rallies, and extreme test security that often dominate many schools at this time of year.
Measuring Up, the book by Professor Dan Koretz, gets beneath the surface of educational testing by taking a deep look at key issues that affect students’ scores. Students in one of his HGSE courses, “Understanding Today’s Educational Testing,” persuaded Koretz that a book was needed to help not only educators but also the public at large understand testing. “Testing has become enormously important with an extraordinarily powerful influence on schooling, and it increasingly dominates public debate about education,” he says. “The debate, however, is more heat than light, in part because testing is so poorly understood.”
What do you consider some of the most fundamental issues in educational testing today?
In this era of No Child Left Behind, the elephant in the room is high-stakes testing, which holds educators and students accountable for test scores. High-stakes testing has become the cornerstone of education policy in this country, and it is having tremendous effects on schooling, on teachers, and on kids. Unfortunately, we don’t do a good job with accountability, and the issues we need to confront to do it better are poorly understood and are often swept under the rug. For example, the evidence is clear that high-stakes testing can produce severely inflated scores, meaning increases in scores far larger than real improvements in student learning. Few policymakers fully understand this problem, and some simply deny it, so it does not get addressed.
There are many other important issues raised by our current uses of tests. For example, at the federal level there is a strong effort to improve the education of students with disabilities and of nonnative speakers of English. As part of this effort, these students are increasingly included in the same testing programs used with the general education population. As a former special education teacher, I consider these efforts to improve the education of students with special needs to be long overdue. However, our policies for testing them are not entirely sensible, and we risk harming precisely the kids we want to help.
It is not only policymakers who confront test scores. Parents also need to know how to make use of test scores in choosing a school and how to interpret their own children’s scores. Concerned citizens often need to understand test scores to make sense of the frequent press coverage of international comparisons of student achievement. Teachers need to understand test scores to make use of them in improving instruction.
Do you think the public is aware of the limitations of testing and some of the mistakes that occur?
For the most part, I think the public has a very limited understanding of these issues. First, it is essential to distinguish the inevitable limitations of tests from mistakes or distortions. Even a very well-designed test is subject to a degree of imprecision, just as a political poll is subject to a margin of error. For this reason, parents in some states, including Massachusetts, receive reports showing that their child’s performance falls within some range around the score the child actually obtained. This degree of precision is what is meant by “reliability.” The more reliable a test score, the smaller that range of uncertainty. Moreover, different tests of the same subject often sample differently from the material in that subject and therefore provide somewhat different views of achievement.
But sometimes a score is not just imprecise but also misleading. This is called “bias,” and it can arise from many sources. For example, the math scores of students for whom English is a second language may be misleadingly low if the math test includes linguistically complex test items.
To their credit, some states and localities have offered the public some explanation of these issues. Still, I think few parents and, for that matter, few educators really understand them, because they have never seen an adequate explanation. That was one reason I wrote Measuring Up: to provide people with a straightforward, nontechnical explanation of issues such as reliability and bias.
What advice do you have for parents and noneducators looking at the latest test results from their children or from within their communities?
If used sensibly, test scores provide unique and valuable information about student achievement. The trick is using them sensibly, which requires recognizing the limitations of testing as well as its strengths.
Don’t take test scores to mean more than they do. Tests measure only some of the important goals of schooling, and even in measuring those, they are only approximate indicators. They are subject to measurement error; different tests of the same subject often provide a somewhat different picture; and indicators other than tests often tell quite a different story. Therefore, a single score, taken alone, cannot provide a comprehensive measure of the achievement of a student, and it certainly is not sufficient to judge the quality of a school or an educational system.
Use tests together with other information. Ignore small differences in scores, which often do not represent meaningful differences in achievement. In this era of high-stakes testing, be wary of score inflation; improvements in scores, particularly very large and rapid ones, may be illusory.
None of this is reason to ignore test scores. They provide important information that one cannot get from other sources. For example, we know that grading standards vary markedly from school to school. Therefore, grades are not necessarily comparable from one place to another, but test scores are. Use scores for what they provide, but be sensible.
How can education policymakers make testing a more effective part of accountability?
The advice I offer to parents about using and interpreting scores applies to educators and policymakers as well, but they have additional responsibilities for deciding how tests will be used.
Let’s start with test-based accountability, which is perhaps the most pressing issue today. As both a former schoolteacher and a parent of two children who went through public schools, I am convinced that we need more effective ways to hold educators accountable, and I believe that testing has to be a part of an effective accountability program. Doing this the way we do in many places now, however — treating one test as a comprehensive indicator of student achievement, pretending that scores taken by themselves are a trustworthy indicator of school quality, and rewarding and punishing teachers and students for scores — is just too simple. It ignores not only what we know about testing, but also what we know from many other fields, such as healthcare, about the effects of incentive systems. We face an enormous challenge in designing better educational accountability systems, and the first step in doing that is recognizing the limitations of what has been tried to date.
What can educators and policymakers do now to start on the path toward more effective accountability? First, they need to step back and ask themselves what the goals of schooling are and what they want to see happening in schools. For an accountability system to work well, it has to recognize the range of important goals that teachers should be addressing, not just the aspects of math and reading that are easily measured by standardized tests. We don’t have a good recipe for doing this yet, and therefore, policymakers and educators need to experiment and try new approaches. Second, because score inflation means that rising scores alone are not sufficient to show that reforms are working, it is essential to evaluate these efforts. Are we seeing the types of changes we want when we observe classrooms? Are scores being inflated, or are teachers finding ways to boost student learning?
What other advice do you have for educators and educational policymakers about testing?
Leaving aside the issue of accountability, I offer educators and policymakers other advice in Measuring Up. For example, we should stop placing so much emphasis on “performance standards” in reporting the performance of schools and students. For any number of reasons, this is a very poor strategy for reporting, one that generates bad incentives, creates badly distorted views of trends in performance, and leads to serious misinterpretations by the public. We should be more realistic about testing students with special needs so that we can begin designing more effective and helpful ways of assessing them. All in all, we have to be realistic and careful in our uses of tests. As I note at the end of Measuring Up, “In all, educational testing is much like a powerful medication. If used carefully, it can be immensely informative, and it can be a very powerful tool for changing education for the better. Used indiscriminately, it poses a risk of various and severe side effects. Unlike powerful medications, however, tests are used with little independent oversight. Let the buyer beware.”