Okay, So Can We Slow This Train Down (a bit)?
If new laws or policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount, then more teachers might well be terminated than is now the case. But there is not strong evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones. There is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains.
I have only skimmed the briefing paper but it looks like good, sober reading. I plan on sending this to the Board, the Times editorial board, my legislators, etc. Please consider doing the same.
From the Daily Kos which has links galore:
This document has been in the works for several months, and was NOT hurriedly put together as a response to the recent series by the Los Angeles Times which used value-added assessment to label teachers in the Los Angeles Unified School District. Second, the ten scholars whose names are on the document are some of the most eminent in educational circles, including in their midst former Presidents of the American Educational Research Association and the National Council on Measurement in Education, two of the three professional organizations most involved with psychological measurement, of which school-related testing is a subset. One of the scholars, Robert Linn, has not only presided over both of those organizations, he has also served as chair of the National Research Council's Board on Testing and Assessment. The group also includes the immediate past president of the National Academy of Education, Lorrie Shepard, Dean of the School of Education at Colorado.
The document is thorough. It reviews all the relevant studies, including one not yet in print. Those include studies by Mathematica for the US Department of Education; by RAND; by the Educational Testing Service; one done for the National Center for Education Statistics of the Institute of Education Sciences of the U.S. Department of Education; one issued by the Board on Testing and Assessment of the Division of Behavioral and Social Sciences and Education of the National Academy of Sciences; and so on. There are citations from books and from peer-reviewed journals.
Interesting notes from briefing paper pulled out by Kos:
One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Another found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year.
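That kind of year-to-year churn is exactly what you'd expect when a rating mixes a stable teacher effect with a lot of year-specific noise. Here's a toy simulation (the effect sizes are hypothetical, chosen only so the year-to-year predictability lands inside the 4%-16% range the paper cites, not taken from the briefing paper):

```python
import random

random.seed(1)
N = 1000  # simulated teachers

# Each year's rating = a stable "true" teacher effect plus year-specific noise.
# Noise this large implies a year-to-year correlation near 0.3, i.e. one year's
# ratings predict roughly 9% of the next year's variation.
true = [random.gauss(0, 1.0) for _ in range(N)]
year1 = [t + random.gauss(0, 1.5) for t in true]
year2 = [t + random.gauss(0, 1.5) for t in true]

def top20(scores):
    # Indices of the top 20% of teachers by that year's rating.
    cutoff = sorted(scores, reverse=True)[N // 5 - 1]
    return {i for i, s in enumerate(scores) if s >= cutoff}

stay = len(top20(year1) & top20(year2)) / (N // 5)
print(f"share of year-1 top-20% teachers still in the top 20% in year 2: {stay:.0%}")
```

Even though every simulated teacher's "true" effectiveness never changes, only about a third of the top group stays on top the following year, matching the pattern the study found.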
A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness.
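The backwards-prediction result is easy to reproduce in miniature: if students are sorted into next year's classrooms partly by prior achievement (tracking), then fifth-grade classroom assignment will "explain" fourth-grade scores even though no causal effect is possible. A sketch under those assumptions (all numbers hypothetical):

```python
import random

random.seed(2)
N = 300  # simulated students, in 10 classrooms of 30

# Persistent student ability drives the 4th-grade score; the 5th-grade
# teacher has no effect at all on it.
ability = [random.gauss(0, 1.0) for _ in range(N)]
grade4 = [a + random.gauss(0, 0.5) for a in ability]

def classroom_spread(assignment):
    # Range of classroom-mean 4th-grade scores across the 10 "teachers".
    rooms = [assignment[i:i + 30] for i in range(0, N, 30)]
    means = [sum(grade4[i] for i in room) / 30 for room in rooms]
    return max(means) - min(means)

# Tracked assignment: 5th-grade classrooms formed by sorting on 4th-grade scores.
tracked = sorted(range(N), key=lambda i: grade4[i])
# Random assignment, for comparison.
shuffled = list(range(N))
random.shuffle(shuffled)

print(f"spread under tracking: {classroom_spread(tracked):.2f}")
print(f"spread under random assignment: {classroom_spread(shuffled):.2f}")
```

Under tracking, the fifth-grade teachers' classroom means differ sharply on a test given before those teachers ever met the students; under random assignment the differences nearly vanish. The spurious "effect" comes entirely from non-random assignment, which is the paper's point.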
In both the United States and Great Britain, governments have attempted to rank cardiac surgeons by their patients’ survival rates, only to find that they had created incentives for surgeons to turn away the sickest patients.
Kos also pointed out that:
- students are not randomly assigned to teachers
- sample sizes are often too small, and the makeup of a class may change during the year (especially in schools with many low-income students)
- Even with value-added analysis, to date scholars have not been able to isolate the impact of outside learning experiences, home and school supports, and differences in student characteristics and starting points when trying to measure their growth.
- As testing expert Dan Koretz of Harvard is quoted as noting,
"because of the need for vertically scaled tests, value-added systems may be even more incomplete than some status or cohort-to-cohort systems"
- If measuring end of year to end of year, even with vertically scaled tests, there is still the well-documented issue of summer learning loss, which falls disproportionately on those of lesser economic means, and therefore disproportionately on those of color, who are more heavily represented at the lower end of the economic scale.
The claim that they can "level the playing field" and provide reliable, valid, and fair comparisons of individual teachers is overstated. Even when student demographic characteristics are taken into account, the value-added measures are too unstable (i.e., vary widely) across time, across the classes that teachers teach, and across tests that are used to evaluate instruction, to be used for the high-stakes purposes of evaluating teachers.
Value-added methods involve complex statistical models applied to test data of varying quality. Accordingly, there are many technical challenges to ascertaining the degree to which the output of these models provides the desired estimates. Despite a substantial amount of research over the last decade and a half, overcoming these challenges has proven to be very difficult, and many questions remain unanswered...
From the Daily Kos:
Let me be clear. The authors are not opposed to value-added assessment. They are not even opposed to it being included in the process of teacher evaluation, although they offer some serious cautions that policy makers would be well advised to consider.
The title is accurate - there are still serious problems with using test scores to evaluate teachers. These problems are not solved by resorting to a value-added methodology. So let me be clear.
No one, anywhere, is saying that tests don't have value. And using value-added data helps get a fuller picture of the results. But I don't believe that test results should play a major role in teacher retention or salary.
From the briefing paper:
Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing tests of basic skills in math and reading. Based on the evidence, we consider this unwise.
Any sound evaluation will necessarily involve a balancing of many factors that provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning.