Thoughts about using student tests to judge teachers
Let me begin by saying that I would absolutely love to be able to use some objective measure of student academic growth as a measure of teacher effectiveness. I sincerely would. I am open to hearing about such a measure that can isolate the growth of student learning that is attributable to the teacher. I'm dying to hear about it. I just haven't heard it yet.
I won't pretend that the proposed measure is the student's cumulative achievement. I will grant folks who want to use student test results as a measure of teacher effectiveness the sophistication of using a measure of change. I'll even grant them an additional level of sophistication that acknowledges the smaller potential for growth among students who are already near the top of the measure: the five-point change from 45 to 50 is easier to coach than the five-point change from 85 to 90. Just the same, the data and the conclusions have grave faults.
I review data for a living, so I have some familiarity - if not expertise - with how it can be misused. I have very little confidence in the data that I have seen that links student outcomes to teacher effectiveness. It makes a number of unsupportable assumptions. The primary one is to attribute all of the change in student outcomes to the teacher. That's crazy. The second one is to presume that all classes are statistically equivalent. That's also crazy. The third is to base the conclusions on a sample too small to support statistically significant conclusions (such as 30 students). The data is, suspiciously, reported without a margin of error. I find that odd and hubristically certain. Where is the standard deviation of outcomes? I haven't seen any reported. I'm always suspicious that unreported data contradicts the conclusion of the author. In this case I wonder if the range of outcomes isn't so great as to make the conclusion unsupportable - that the range of outcomes dwarfs the differences in outcomes.
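To make the scale of the problem concrete, here is a minimal sketch (the numbers are invented for illustration, not drawn from any district report) of how the spread of individual student outcomes can swamp a genuine teacher effect when each teacher is judged on a class of 30:

```python
# Invented numbers, for illustration only: how classroom-level noise can
# swamp a real "teacher effect" when a teacher is judged on ~30 students.
import random
import statistics

random.seed(1)

TEACHER_EFFECT = 2.0   # assumed extra growth a strong teacher adds (points)
STUDENT_SD = 10.0      # assumed spread of individual student growth (points)
CLASS_SIZE = 30

def class_mean_gain(teacher_effect):
    """Average measured growth for one class of CLASS_SIZE students."""
    return statistics.mean(
        random.gauss(teacher_effect, STUDENT_SD) for _ in range(CLASS_SIZE)
    )

strong = [class_mean_gain(TEACHER_EFFECT) for _ in range(1000)]
average = [class_mean_gain(0.0) for _ in range(1000)]

reversals = sum(a > s for a, s in zip(average, strong)) / len(strong)
print(f"spread of the strong teacher's class means: SD {statistics.stdev(strong):.2f}")
print(f"years in which the average teacher out-gains the strong one: {reversals:.0%}")
```

Under these assumptions the standard error of a 30-student class mean is about 1.8 points - nearly as large as the assumed teacher effect itself - and the "average" teacher posts the better number in roughly one year in five. That is exactly the range-of-outcomes problem: without a reported margin of error, nobody can tell those two teachers apart.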
Let's add some questions about the student assessment used. In Seattle the proposed measure is the MAP. The MAP was not intended for this purpose and is a poor choice. First is the primary function of the MAP as a formative assessment: the MAP was designed to spark questions, not to provide answers. Second is the general unreliability of the MAP; we see student scores hop about, in part, I suppose, because students can manipulate the experience. Finally, we have the constraint that the MAP measures students along one dimension only: grade level progress.
When my eldest daughter first tested into Spectrum I had to ask about twelve people before I got a cogent description of the program. The one I finally got - and it is one that was confirmed by the program manager - was that Spectrum was about going beyond the standard curriculum in three ways: farther (to the next grade level expectations), deeper (a deeper understanding of the concepts), and broader (an understanding of the concepts in a wider variety of contexts). MAP only measures one of those dimensions, the one that I think I care least about: farther. Also, given the District's focus on vertical articulation and curricular alignment, teachers are actively discouraged from teaching students along this dimension. The District leadership actively discourages teachers from providing the students with the support that would really boost their MAP scores - instruction in the next grade level's curriculum.
Unfortunately the District has no other norm-referenced assessment of student academic achievement. The MSP and HSPE (the former WASL) are criterion-referenced and therefore absolutely inappropriate for this purpose - unless the assessment is exclusively about getting under-performing students to Standard. Even then, it is inappropriate to use MSP scores for ranking, so the only change that can be recorded is a change in Levels (1, 2, 3, or 4) for students. This presumes, inappropriately, no progress for a student who did not change levels and a loss for students who went down a level. Still, I would be more comfortable with the District using this gross tool as a measure, since this test was actually designed for this purpose. It would still present the problem of an inadequate pool from which to form a statistically significant sample, and it would still require the reporting of a margin of error. On top of that, there are all of the students who opt out. Of course, the real problem with the MSP and the HSPE is the slow turnaround in the results and the unknown impact of summer learning (or loss).
So, in short, I'm not saying that I wouldn't love an objective measure of student growth as a measure of teacher effectiveness, but it would have to be better than the suggestions I have seen to date.
The critiques of using student test data to assess teacher effectiveness are well known. So where is the response from the proponents of this idea to these legitimate concerns and objections? I don't see it.
Comments
Is this true?
I agree with Charlie's critique, except he doesn't mention the financial interest of NWEA and Darth Goodloe in opening a new and lucrative market for the MAP tests.
That's all a lot of us see.
A naked grab for public education dollars.
Prove I'm wrong, Dr. Goodloe.
Sign a lifetime no-financial-interest statement.
Does the SEA proposal still use A test - just not MAP? Did they say they might use MSP or HSPE? I hope the union isn't giving in on this...
Children grow up: what they learn, how they learn, when they learn, and the value they themselves give to that learning are singularly unique. We too often delude ourselves into thinking that any particular factor of growth is a simple cause-and-effect.
Do we want to offer a rich opportunity for learning at school? Yes. Do we want to assess changes in growth over time? Yes.
Do we want to have prescriptive programs and quantifiable measurements to evaluate teacher effectiveness and predict adult success? I hope not.
Read Yong Zhao's Catching Up or Leading the Way. The perennial heavyweights on Int'l Science & Math Assessments - China, Singapore, Hong Kong, et al. - are disassociating themselves from the prescriptive and quantifiable. Why? Because their factories producing testing wonks haven't translated into successful careers in the global marketplace.
ken berry
Finally, the MAP scores say nothing about grade level progress. One of the strengths of the MAP test is that it is independent of grade level, so it can be used to track students from year to year.
I disagree with the use of the MSP as a more effective measurement for teacher evaluation, for many of the reasons that you indicate - it ignores all students who are above "standard" and so would reward teachers for ignoring all but the below-standard students. It also does not give feedback in a timely manner.
Given this, I am still waiting to hear about a way of evaluating student learning that is more effective than the MAP. If it doesn't exist, then one has to decide either 1) that we will not evaluate teachers based on student learning at all or 2) that we will use the best (albeit highly flawed) MAP test until something else comes along. Which one would you like? I am honestly torn.
"...disassociating themselves from the prescriptive and quantifiable. Why? Because their factories producing testing wonks haven't translated into successful careers in the global marketplace." - ken berry
Yes, and Professor Zhao (of Michigan State, I believe) also believes not only that the "perennial heavyweights on Int'l Science & Math Assessments" are moving away from quantitative drill-and-kill because students schooled that way aren't competitive, but that they're not competitive because the creativity in them is squelched, squashed, run over by "data." These powerhouses are seeing that places like, oh, the USA? were (and still are, one hopes) turning out innovative and unique students (and therefore products), and they are actively seeking to create pedagogy that supports creativity.
Let's say you have the perfect objective test. It measures the effectiveness of teachers and it is flawless. Central headquarters is happy because they have the objective measure. Teachers are happy because the more effective teachers are paid more.
Now, you have three second grade teachers in a building: a good one, an OK teacher, and a newcomer.
Which students get the good one? Which settle for the OK teacher? Who must take the risk on the newcomer?
We all know who the best teacher is. We even have the objective data to back things up.
What's the answer?
It's called accountability and it's brought about by observation and transparency. When each of these "players" (should we add parents/guardians and students? oh what the heck, let's!) is doing their job, they are held accountable by publishing, demonstrating, modeling, and actual pursuance of their duties.
THIS is accountability and evaluation. How does this look? There are many ways to structure it, as have been discussed here and elsewhere. Of course it requires that each do their job of holding others accountable to doing THEIR jobs, or at least trusting that someone, somewhere, is holding others accountable.
The one hitch? Students will learn a variety of things in a variety of ways. There will be variance, as each student NEEDS variance. Each educator (choose from the list above) IS variable. The learning that results IS variable. THANK GOD!
So the learning will sometimes be confusing to outsiders (heck, to the students themselves... they're such bundles of ever-changing energy and hormones that they themselves are often confused by what they've learned). They're also confused by their world, and the world ahead of them.
Will that world be variable or invariable? Will we model quantity and measurement, or quality and amorphousness, ever-changing and ever-changeable reality?
Humans love to measure. Think about it. EVERYTHING is measured. But we can't measure what a person knows, what they'll know in ten minutes, or what a person may do the very next instant.
(Word Verifier says that the above theory is but one example of an evoli.)
Thus, my bottom-line assessment is that the most pure-hearted supporters (and here, I'm excluding all venal motives) believe that a statistical effect will improve learning: that small benefits will accrue from practices that might get rid of 20% of the bad teachers even at the expense of 80% of the good ones. They think that will result in a net "teacher effectiveness" gain of small amounts (1-2%), assume that it will have no other effects, and think the experiment worthwhile. I'd be willing to consider the experiment if I believed that they would actually listen to the results. There's a clear prediction that there should be an improvement in overall student performance (by their own measures). If it failed, would they give up the methodology?
The evidence from the experiments with vouchers/charters suggests not. And that's why I'm so vehemently opposed to doing the experiment here. If we tried it for a year, and there was no performance increase, could we revert? Come up with a serious and measurable performance-based measure of the success of the performance evaluation system, and an automatic procedure for reverting away from it if it does not produce positive gains, and you might get me to consider it.
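For what it's worth, here is the back-of-envelope version of that arithmetic. Every figure below is invented purely to show how a 1-2% number could arise; none of it comes from any study:

```python
# Invented figures, only to show how a small net gain could be computed.
share_bad, share_good = 0.20, 0.80   # assumed mix of teachers
eff_bad, eff_good = 0.90, 1.00       # assumed relative effectiveness

avg_before = share_bad * eff_bad + share_good * eff_good   # 0.98

# Suppose the policy removes the bad 20% and replaces them with
# teachers of merely average (pre-policy) quality.
avg_after = share_bad * avg_before + share_good * eff_good  # 0.996

print(f"net 'teacher effectiveness' gain: {(avg_after - avg_before) / avg_before:.1%}")
```

That prints a gain of about 1.6% - squarely in the small range the supporters hope for, which is why the demand for a measurable success criterion and a reversion procedure is not unreasonable.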
I see what you mean, but isn't it actually the case that the proposed system uses data from those 30 students as a sample to give us an expected value for how well that teacher will teach their next 30, 60, 90... students? Their data is being used to evaluate the ongoing quality of the teacher, not really to quantify how well that teacher taught those particular students. Otherwise what's the point? The teachers who did a "bad" job with those 30 students will be given a "bad" evaluation though we won't really know how well they will do with the next 30, 60, 90 students until after they take the test.
An excellent question. Who gets the good teacher?
Here in Seattle, where families fight for perceived incremental differences in quality, this could become a contentious issue.
Not so everywhere. Tennessee, which has been using value-added data to evaluate teachers for some time now, was curious about that very question. Their answer was to only take care that students didn't get a "bad" teacher two or three years in a row. Seriously. I read the study after being referred to it by the LEV blog.
How can anyone countenance such a thing? Probably by knowing that the differences in "teacher quality" account for only incremental differences in cumulative student achievement.
By the way, Pete, it looks like the all-time career hit leader will have the all-time career strike out leader and the all-time career home run leader keeping him company outside Cooperstown, looking in.
In Greek-influenced Western culture we know things by counting and measuring. If we don't measure student learning on some linear scale then we can't believe that it is happening. And we are desperate for assurances that it is happening.
Other cultures may accept other proofs - proofs that we would likely reject.
I will be delighted to see them.
In fact, random assignment to classrooms is one of the "assumptions" of the various VAM models that I've had the time to try to read through. Even Schochet and Chiang, who reported Type I and Type II error rates over 25% when 3 years of teacher data are used to assess "effectiveness," assumed that children would be randomly assigned to teachers within their schools. They state in their paper that their error rates are likely an underestimate given that children are not randomly assigned to teachers.
The insanity of this proposal just grows and grows every time I think about it.
Baseball is meticulously measured. What happened? Rampant cheating.
Will it be any different with the teachers and administrators when the tests become so important?
I especially don't understand your argument that "[if we] choose to not use student learning in evaluating teaching... we then can't be surprised when students come out of SPS with less learning even though there might be great instruction."
You'll need to clarify. If we have "good teachers," (whatever qualities we expect and observe for) then hopefully students will learn. Why would we think there would be LESS learning?! I don't get it.
For example, their test scores are only estimates of knowledge, not an exact measure of it, unlike the scale in your example, which can provide an exact measure of your weight. Each MAP assessment uses questions pulled from a large database of questions, and each score is going to be affected by which particular questions came up that day. Sometimes, just by bad luck, a child will get some really weird questions; had they gotten different questions, it might have appeared that they know more. Maybe a change in score just represents the sample of questions that came up that day rather than an increase or decrease in knowledge.
When scores are averaged, some of this variability is accounted for. In the Schochet and Chiang paper that I mentioned, they analyzed three years' worth of teacher data and there was still too much variability in the VAM results for my liking. They suggested that you would need as many as 10 years of data to get a reliable estimate of a teacher's effectiveness. SPS is proposing using only 2 years' data in its assessments. Even if you believe the VAM is the way to go, the SPS approach will mislabel many effective teachers as ineffective, as well as fail to identify many ineffective teachers, because they won't be using enough data for each individual teacher.
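To see why the number of years matters so much, here is a toy simulation of my own (simple normal noise with invented magnitudes; it is not the Schochet and Chiang model):

```python
# Toy simulation with invented magnitudes; not the Schochet & Chiang model.
import random
import statistics

random.seed(2)

TRUE_EFFECT = 2.0   # assumed true teacher effect, in score points per year
NOISE_SD = 4.0      # assumed noise in a single year's class-mean gain

def rated_at_or_below_average(years, trials=10_000):
    """Share of trials in which a truly effective teacher's multi-year
    average gain still comes out at or below zero."""
    misses = sum(
        statistics.mean(random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(years)) <= 0
        for _ in range(trials)
    )
    return misses / trials

for years in (1, 2, 3, 10):
    print(f"{years:>2} year(s) of data: {rated_at_or_below_average(years):.1%} misclassified")
```

Even in this oversimplified setup, two years of data leaves a genuinely above-average teacher looking at-or-below average roughly a quarter of the time; it takes something like ten years before the error rate falls to single digits.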
I finally read that LA Times article, though I'm not sure I got to every piece of it. I was struck by how one teacher who seemed engaging and earnest and all that nonetheless did so poorly on the VAM, surprising her and her principal.
But... my son has had two awful teachers whom others I have met uniformly agree were bad. He also had some teachers that other parents really liked, but whom I felt were mediocre and definitely *not* facilitating progress in learning. Teachers that I thought were facilitating learning in good ways? Most other parents I spoke with agreed, but not always. Sometimes that is hard to judge, and with anecdotal evidence, people will disagree.
When I looked at my own child's MAP scores this past year, because I am a geeky data type, I looked not only at the RIT score itself but whether the ranges overlapped from one test to the next. Because her spring score range and fall score range did not overlap, I concluded that she had demonstrated a statistically significant improvement in score (now, the clinical or "real world" significance of that conclusion remains to be determined, but that's another issue).
However, now that I know that that is only a 68% CI, I am less certain. Maybe after dinner, I'll see if I can back-calculate to determine a more useful range, like a 95% CI. Fascinating!
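For anyone else who wants to try it, the back-calculation is straightforward if (and this is an assumption) the reported range really is the RIT score plus or minus one standard error of measurement. The scores below are made up for illustration:

```python
# Assumes the reported range is RIT +/- 1 standard error of measurement
# (SEM), i.e., roughly a 68% interval. The scores below are made up.
rit = 205
low, high = 202, 208      # hypothetical reported "score range"

sem = (high - low) / 2    # one SEM, under the assumption above
z95 = 1.96                # normal multiplier for a ~95% interval

print(f"~95% interval: {rit - z95 * sem:.1f} to {rit + z95 * sem:.1f}")
```

One caution: for deciding whether fall and spring scores really differ, the quantity to test is the difference itself, whose standard error is sqrt(sem_fall^2 + sem_spring^2); two non-overlapping one-SEM ranges do not by themselves establish a 95%-level difference.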
Good assessments of student learning ultimately require wisdom, experience and expertise. The same is true of the evaluations of supervisors.
I personally don't have a problem with the stakeholder survey in SERVE - that is, on one condition. If part of my evaluation is to be based on the feedback of parents, students and peers, then the superintendent's evaluation should be partly based on the feedback of teachers, administrators and parents. Everyone accountable should mean everyone.
A sample size of 30 does begin to approximate the normal curve (per the central limit theorem). For samples smaller than 30, there are adjustments made to help account for the small sample size. (Remember your "degrees of freedom" from your stats class in college?)
With that in mind, we can get pretty accurate information about a teacher after 30 students. And we'd definitely be able to get a sense of a teacher over time, as they would then have 60, 90, 120 students.
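A quick sketch of how fast that sharpening happens, assuming (purely hypothetically) that individual student growth has a standard deviation of 10 points:

```python
# Illustrative only: standard error of a teacher's average student gain
# as the number of tested students grows (assumes student-growth SD = 10).
import math

STUDENT_SD = 10.0

for n in (30, 60, 90, 120):
    se = STUDENT_SD / math.sqrt(n)
    print(f"n={n:>3}: standard error {se:.2f} points, 95% margin ~ +/-{1.96 * se:.2f}")
```

Note that even at 120 students the 95% margin is close to +/-1.8 points, which matters if a typical year's gain is itself only a few points.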
The issue for me is deciding what to use as a norm by which to compare Teacher X. Within the school? Within a geographic area? Within the District?
It'd be helpful to know what (if they were measured now) the differences are within a school as compared to between schools. That measure would help narrow down how best to find a standard norm. And with that, there could be several sample norms per district.
There are also some arguments about children who can't do well because of outside factors. I call B-to-the-S on that one. Edtrust is a great organization - with plenty of data to back it up - that has debunked this conjecture. It is possible that a student might not do well because of family circumstances, but it is unlikely that an entire class would.
We need to stop blaming families for children not performing well in school. There are oodles of examples across the country (see Edtrust) of 90/90/90 schools.
We need to find the bad teachers and get rid of them. We need to find the bad principals and get rid of them.
/rant
you "thought [I] was advocating for evaluation of teaching because evaluating student learning was too difficult."
Um, no, not difficult, but impossible. We can GUESS at what a student learned because of a particular instruction, but we'll never know a) ALL a student learned given that instruction; b) whether the evidence of learning was "tainted" by some problem with the testing (question design, student distraction, etc) or c) what, exactly, produced the learning that a student acquired.
Say a teacher teaches, um, paragraphing, and immediately provides the best test ever created to test knowledge of paragraphing. "Best test ever" might account for variability in b) above, test conditions, but it doesn't account for a) what else was learned besides what was tested for, and c) where the knowledge demonstrated on the test was learned. It might APPEAR that the knowledge was directly acquired, given the lesson right before the test, but what if the student already knew how to paragraph but didn't answer correctly on the pre-test because they just didn't feel like it?
Additionally, in a real-world application of a test of, say, Reading over a year, a student is exposed to many sources of knowledge besides their regular teacher. As the scenario I posted elsewhere indicates, a student's Reading score could be the result of the LA teacher, Reading teacher, History teacher, a mentor, self-teaching... who knows?
http://www.edtrust.org/dc/about/funders
http://www.edtrust.org/dc/about/board-of-directors
See their agenda:
http://www.edtrust.org/issues/our-advocacy-agenda - straight out of the 'deform' handbook...
"It is a possible that a student might not do well because of family circumstances, but it is unlikely that an entire class would."
To say that one variable is not a factor in the scores of all students is not to say that it isn't a significant factor in the scores of an undetermined number of students. As students tend to be grouped in honors and non-honors classes in high schools, the probability of compounding factors affecting students in non-honors classes increases. Or at least that's my experience.
On the instructor side of it, there are compounding factors. Some teachers have three classes to prepare for; some have one. Some teachers have more support and resources to draw on than other teachers.
You are also talking about using a computerized test, which is itself a skill. My kindergartners took this test three times this year; the results showed me who could use a computer, but not what they learned throughout the year (three kids took the test in the SPRING without sound). There is a plethora of things that I want to teach my kids, and how to point and click is nowhere near the top of the list. I can tell that you love data, and so do I, but I don't want to waste time poring over data that isn't giving me valuable information about my students. There are some things the MAP assessment can do for some teachers (our 8th grade teacher found the information helpful), but judging how well I teach is not one of them! The NWEA is clear on its website that this is not a test that will assess kids on whether they are meeting grade level expectations. They also say it is not a test that will show teacher effectiveness. If our own superintendent is on the executive board of the company, why is the SPS district trying to use it in the presented ways? I would be happy to have my evaluation based on my teaching and how I help my students progress, but the MAP test is NOT going to do that.
http://www.seattleschools.org/area/laborrelations/index.dxml
Who's blaming parents? There are all sorts of parents, many with their own issues and problems...Those that are "absent" or ineffective I personally don't "blame" - could be any number of factors.
What many here are saying is that there are factors outside the control of the many educators involved in the Reading Score of a given child. No blame, just reality.
Not incidentally, it is a common tactic of "reform" rhetoric to accuse people who don't agree with the reformers of "blaming the parents" or "blaming the student." This is simply not true, and is an insult. I think it would be safe to say that many, if not most, educators, good or bad, care very much about their students, and also understand the various external circumstances. They just can't do anything about those circumstances (or little: many teachers, in their own time and taking on the unpaid role of social worker, DO reach out to parents/guardians, DO work with community resources to address the needs of students).
Please refrain from using that term, it's offensive.
http://seattletimes.nwsource.com/html/opinion/2012704319_guest24rice.html
Please forgive me, but I am compelled to point out a couple of things:
Mr. Mas: Nolan Ryan is the career leader in strikeouts, with 5,714, and is in the Hall of Fame. Roger Clemens is 3rd with 4,672. Randy Johnson is 2nd, by the way.
Now on to more important things. All of you are using "confidence interval" wrong. A confidence interval expresses how confident you are that the true population value is included in your interval, which is computed from sample data. If you report a 68% confidence interval, that means you are 68% confident that the true population mean is included in the interval. 68% is a very, very low confidence level. In all my experience (undergrad education, grad education, teaching AP Stats), the lowest confidence interval I have seen is a 90% confidence interval. That would mean that you are 90% confident that the true population mean is included in the interval. The most common interval is a 95% confidence interval. Now, the more confident you are, the wider your interval has to be. That is the trade-off. It drives my students crazy.
The 68% you are talking about is part of what is known as the Empirical Rule for an approximately Normal distribution. It states that 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
So, while your child's MAP score may be within one standard deviation of the mean, that is NOT a 68% confidence interval.
Now I know you are taking this off of some report you got from the district. If someone would be willing to point to where that report is, I can read it and then help everyone understand what is trying to be said.
It's enough for me that the federal DOE published a paper (links previously posted on other threads) that found that value-added teacher performance methods had an error rate of at least 25%...
And another study (again, links previously posted on another thread), comparing the various testing products on the market, found that the NWEA/MAP product was useless in giving teachers any data about individual children... so if you can't use the data to inform your instruction for the children in your class, what's the point???
And then you're going to have your performance evaluated on the results of a flawed testing methodology and be paid based on those flawed results, fired maybe based on those flawed results?
Who in their right mind would agree to that?
Who in their right mind would think that was fair and appropriate and seek to impose that on highly educated and credentialed professionals?
Well, let's see what they're really agreeing to. Oh yeah. A voluntary potential reward... equivalent in value to 1 latte per day. Who cares?
I know this sort of link has been bandied about, perhaps multiple times. But I found this WSJ blog bit from their Numbers Guy a great summary. Look at the comments as well.
Someone (ebaer?) was wondering whether or not we should use flawed measures as better than nothing. One person quoted in the WSJ piece:
“Even with multiple years of data, there are a whole lot of false positives and negatives,” said Barnett Berry, president and chief executive of the Center for Teaching Quality. “If we start using these value-added metrics inappropriately, what could have been a very powerful tool could become the baby that gets thrown out with the bathwater.”
And Dorothy - I wasn't criticizing the time spent on discussing the statistical foundations of testing etc... I was saying I personally don't need to spend time on it because it's completely apparent to me that the whole idea is flawed in its basic assumptions and doesn't warrant any attempt to justify it...
http://www.youtube.com/user/auntyBROAD#p/u/14/RuB_3au6q5M
http://www.youtube.com/watch?v=l4w6o52vqys&feature=channel
The report that comes home gives a RIT score and its range. The explanation reads "The middle number is the RIT score your child received. The numbers on either side of the RIT score define the score range. If retested, your child would score within the range most of the time."
You can see how one would think that this is therefore some sort of confidence interval. We are not talking about the average RIT score for the class, where we might be interested in the standard deviation and trying to define the range in which most other children score.
So, how are folks supposed to interpret the range that is being reported for any given child? What am I missing? I really am curious since no one at our school could tell us much about how to interpret the information they gave us.
Pick a freakin' name and your comments will not be deleted.
When the WASL was used to determine if schools met AYP, the OSPI calculated a margin of error. Sometimes these margins of error were quite big - as much as 50 percentage points. Then the school was given full credit for the entire margin of error before its pass rate was compared to the target required for AYP.
Will the teachers get full credit for the margin of error on their students' tests? I think not.
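For anyone who wants to see where margins that wide come from, here is the standard normal-approximation calculation for a pass rate; the counts are hypothetical:

```python
# Hypothetical counts: margin of error on a small group's pass rate,
# using the usual normal approximation for a proportion.
import math

passed, n = 18, 30   # e.g., 18 of 30 students met standard
p = passed / n
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"pass rate {p:.0%}, 95% margin of error: +/-{moe:.0%}")
```

With a class of 30, the margin on a 60% pass rate is about +/-18 percentage points, and for the much smaller subgroups AYP slices schools into, it gets far wider - which is how OSPI's margins could reach 50 points.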
Rhee open to releasing value-added scores
Unlike Los Angeles, the District has started to use value-added data, making it 50 percent of the annual IMPACT evaluation for some teachers. Last month, 76 educators were dismissed for poor overall evaluation scores, 26 of them in grades where value-added data was used. It is not known whether the value-added piece was decisive in any of the cases.
......Rhee told The Times that releasing value-added statistics could also confuse parents and create logistical problems for administrators. What she didn't say is that there might also be privacy issues, since the data is part of a personnel evaluation. Despite the potential obstacles, Rhee said disclosure of the scores could empower parents to demand better instruction for their children.
===========
"empower parents"
OK folks, now I am confused. You mean in WA DC parents can actually demand something and it makes a difference?
Think about Seattle and math programs ... parent demands do not mean anything. In a whole variety of other things SPS parent wishes are routinely ignored by the Board and the Superintendent.
Perhaps it is the "empowerment of parents" that is the value Seattle needs to add.
Guess it is OK to demand better instruction but ..... do not demand better instructional materials or better instructional practices .... WA DC uses Everyday Math.
First, for math: is the test as statistically reliable as other tests that have been available for years? There is, or has been, an ACT Algebra 1 and Algebra 2 test that is norm-referenced and reliable; it assesses specific skills. I don't know that to be true of the MAP test. Too many tests try to test skills and creativity together. The test should test the student's skills, and I've seen multiple-choice items written to test pretty complex mathematics skills.
Point Two: reasonable effort. On the most recent MAP administration, I had good students who simply blew off the test. These were students who typically did well on skills and problem-solving tests in class, and who initially scored well on the MAP, but whose final MAP scores fell; when the time taken to finish the test was looked at, it was obvious that the student had just raced through the test and put no effort into it. People who provide the test will tell you that the test doesn't allow students to do that. They are wrong; I've seen it done.

Add to the mix students who have put in little effort throughout the year for whatever reason. I have a notable percentage of students who are frequently absent and/or do little if any of the required work (aka homework). I contact parents, or attempt to contact them, and try to get students to come in for extra help, often to no avail. A school or school system cannot reasonably hold a teacher accountable for things beyond his/her control. The motto that the Superintendent has brought with her, "Everyone Achieving, Everyone Accountable," leans hard on the "Everyone Achieving" part for the student. But apparently, the "Everyone Accountable" part is meant only for staff, not students. There has to be a way to correct or allow for students who are not performing and whose non-performance has been addressed by the teacher.
My final issue with the testing is the aspect of conflict of interest. The school district must have no interest in the provider of the test, and we know that is not the case. The head of the district administration is on the board of the company that provides the MAP. The Superintendent's role is an obvious conflict of interest, regardless of whether she receives any financial benefit. The board seriously underestimates the potential for opening SPS to legal threat if they allow the person they hire to have any interest whatsoever in the company that SPS pays to provide the test. Parents should be outraged that the board of directors is so willing to turn a blind eye to the Superintendent's involvement with NWEA when NWEA's test is being used to assess student achievement. As a parent in this district, I'm happy to work to oust elected officials who disregard their fiduciary responsibility.
So if SPS wants to use test scores as a measure of teacher effectiveness, fine, I'll step up, but it must be a test that is recognized as being fair and accurate, there must be a way to correct for low performance beyond the teacher's control, and it must be a test provided by an organization or company in which the district has no vested interest, financial or otherwise. If these criteria are met, then I'll be glad to say "Bring It!"
Gee Ed. If students don't pass the MSP, they don't graduate. Isn't that a pretty drastic form of accountability for the student? What's your accountability?
Right Ed. If calling the student's parent isn't motivating, you'll have to think of something else. Nobody's saying it's easy, but that's the job.
Secondly, you state that "People who provide the test will tell you that the test doesn't allow students to do that," where "that" is blowing off the test. I have never seen such a statement from NWEA and can't imagine they would make it, since that is a problem with ALL evaluation tools. It is nothing specific to the MAP.
The third point is spot-on. There is a conflict of interest, and it makes SPS look bad. That does not mean the test isn't the best one out there (or that it is); it just means that people should justifiably be suspicious, and that MGJ should either 1) resign from the NWEA board or 2) make sure she has no input into the decision-making around the testing and that the decision is transparent and defensible. Unfortunately she has done neither.
"Gee Ed. If students don't pass the MSP, they don't graduate. Isn't that a pretty drastic form of accountability for the student? What's your accountability?"
Gee, reader. Where did you get this from? Students are in no way held accountable for their scores on the MSP or the MAP. The test they must pass is the state-mandated HSPE (high school proficiency exam). You've just lost a lot of credibility here.
Lucky for us, the WWC has done it HERE.
Note:
WWC Quick Review of the Report "Middle School Mathematics Professional Development Impact Study: Findings After the First Year of Implementation"
Student-level math achievement was measured by a computer-adaptive rational number test developed by the Northwest Evaluation Association. Teacher-level topical knowledge was measured by a rational number test created by the study’s authors. Teachers’ instructional practices were measured by classroom observations.
The study measured the effects of professional development by comparing outcomes at the end of the academic year in schools that were offered professional development provided by the study with outcomes in schools that did not.
================
Findings:
The study found that students in schools where teachers were offered extensive professional development by the study performed no better on a test of math achievement in rational numbers than students in comparison schools at the end of the 2007–08 academic year.
Further, the study found the professional development had no impact on teacher knowledge of rational number topics and on how to teach them.
============
Now try this one.
Thinking Strategically Across ARRA Funds: Teacher Professional Development .....
Note:
While the implementation of well-designed PD interventions does not appear common, effective interventions are being implemented. For example, two interventions were recently evaluated by the U.S. Department of Education (ED) and were shown to be effective. One intervention targeted elementary reading instruction; the other targeted middle school math. ED’s evaluations revealed that the interventions did produce significant gains in teacher awareness and use of targeted instructional techniques but did not produce significant gains in student achievement.
{[effective interventions are being implemented .. they just do NOT produce gains in student achievement ..... holy multiple choices batman ... I hope this won't determine my merit pay]}
Based on the evaluations of these two interventions, it appears that it may be easier to achieve sustained improvement in teacher knowledge and instruction than in student learning.
-----
The obvious question seems to be: if improvement in teacher knowledge and instruction results from ProDev but does NOT improve student learning, are teachers learning the right stuff in the ProDev?
We have a sad situation when the Ed Elites provide the ProDev to influence how to teach; but when the teachers learn it and apply it => No improvement in student learning happens......
Guess we all need another large Kool-Aid....
Seems like the underlying story is that it is impossible to fix irredeemable crap programs like many of those in the SPS. Think about all the Pro Dev that accompanied Everyday Math, and the huge increase in instructional time... that crap just can't be fixed.
It is not the teachers that are sinking public education. It is the upper levels in the SPS that are unable to intelligently analyze much of anything that has to do with instructional materials and practices.
Student fails = no graduation.
Teacher fails = lifetime job.
Who is more accountable?
Is there some reason you deleted my response to Reader and Ebaer? Some of your participants might be interested to engage in the discourse. The response was not off topic, nor abusive. Reader is uninformed as to the various tests and their uses, and others may not know what actually happens in the MAP test as opposed to what NWEA says happens. Moderator and censor are two different roles.
Maybe we should hold back non-MSP passers...there's some accountability, wot?
Strangely, a parent can opt their student out of the HSPE, and the student can continue through HS taking all AP classes, get a 4.0, and "graduate" on credits, yet not have the state Certificate of Mastery.
Odd, eh? What ever happened to local control of schools?
Oh, wait, it's only students and teachers.
So students could do their best on the curriculum, teachers could teach their best on curriculum, yet results (even if we assume the tests are "good"...which I don't....) would not show relevant progress or lack of progress! Oh well! Test 'em all anyway, fail and fire at will. "It's about the kids"...and our ability to arbitrarily number and manipulate them!
Can you please tell me what I am doing wrong in trying to post my response. I post it, it appears, then disappears. I have no idea why my posts are not "sticking."
EdGuerilla
"Reader" is off the mark with his/her comment. I was speaking of MAP, not MSP. So let's limit the discussion to the correct Acronym. The proposal has been made to link teacher evaluations with MAP scores, not HSPE scores. MAP or Measures of Academic Progress, is the computer based test taken three times during the year. The first administration of the test gives a base-line score, and as the year progresses, the presumption is if the scores increase, the student has made academic progress. So if a student starts out with a low MAP score for math, but improves by year's end, s/he has made academic progress. Note that the student could fall in a low percentile and still show progress. For example, if a student starts in the 30th percentile, and goes up to the 40th percentile, the student has shown academic growth, even though s/he is likely not performing at grade level. I might be misremembering, but we were told the average increase in scores over a year was 4 points. So if a student raised his/her score by 4 points or more, s/he made academic progress.
MAP comes from NWEA, Northwest Evaluation Association, a private company, specifically an NPO, or Non-Profit Organization, which doesn't mean free, but they're private. MSP, or Measures of Student Progress, and the HSPE, or High School Proficiency Exam, previously the WASL, or Washington Assessment of Student Learning, are the state's tests. Both come from OSPI, the Office of the Superintendent of Public Instruction. Washington State is PUBLIC, not free, but not PRIVATE. The proposal has NOT been made to link teacher evaluation to the HSPE, and that is another discussion entirely. Please stay on topic.
I think there is a pretty big difference between a group of folks that see the MAP test as a starting point to be used and improved on, a group that sees the MAP test as too flawed a test to use, and a third group that thinks there is no way to have a test that would be effective enough, and so student learning should never be used in teacher evaluation.
What we ought to be doing is the following:
1. Roll out the four-tier evaluation model that the teachers and the district worked on.
2. Buckle down on principals in terms of meaningful evaluations -- and this needs to go both ways -- the staff should be evaluating the principal with respect to the timing, relevance, feedback, and follow up they are getting from evaluations, and the Ed Directors downtown need to be responsible for, and ready to, counsel their principals on how to effectively evaluate, mentor, and support their staff.
THEN -- let's see where that gets us in terms of either improving teacher performance or getting ineffective teachers out of classrooms.
IF, at the same time, they want to run a several-year pilot program that also gives students entry and exit tests, and compare those results with the ones that they are getting from principal observations, parent/student feedback, etc -- great. And -- maybe use 2 or 3 different assessments -- MAP, I don't know what else is out there. It would be very interesting to see where correlations exist and -- if they don't -- what it means. Are the observation/evaluation methods flawed? Are the testing models flawed? Is there just a huge margin of error? No one knows this stuff at present.
“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”
“We can't solve problems by using the same kind of thinking we used when we created them.”
“Intellectual growth should commence at birth and cease only at death”
“Logic will get you from A to B. Imagination will take you everywhere.”
“Imagination is everything. It is the preview of life's coming attractions.”
“The true sign of intelligence is not knowledge but imagination.”
and finally:
“Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods.” ...
So tell me again, why do we put kids through this farce we call 'education', in this particular form, (which kills rather than fosters and rewards imagination), why do we put so much faith in 'testing' (which punishes rather than praises imagination) and why are we giving our money away to already uber rich vulture philanthropists wanting to control our world and our kids - their future workers and consumers - in any way they can?
I think we're all stupid, being led like lambs to the slaughter....
Tests give you only one piece of factual information: How well the student performed on the given test. Using a test score to make inferences about quality of instruction may or may not be valid. A teacher can be a master at his/her art and have students who do not do well on an assessment such as MAP. Similarly, you can have a student who is motivated and does well on such an assessment, but s/he has an ineffective teacher. Both of these scenarios happen more frequently than you might think, not simply anecdotally. If you are going to use a test as one means of teacher evaluation, you must correct for anomalies as much as statistically possible.
If one is to use an assessment such as MAP to measure a teacher's effectiveness, there has to be a way to correct for lack of improvement beyond the teacher's control. If a student is intentionally putting in no effort; that is, s/he simply tears through the test, or purposefully answers incorrectly, then that student's score is not a valid measure of teacher effectiveness. If a student is often absent, or has long-term truancy issues and is not in class to receive instruction, a test score showing no growth would not be a valid measure of teacher effectiveness. The same is true of a student whose low performance is due to lack of proficiency in English. If a student puts in little or no effort throughout the year, and the teacher has addressed the non-performance through various interventions (parent contact, conferencing, etc.), a low test score does not say anything about teacher effectiveness. I see each of the preceding student-types every year. You cannot always attribute low performance to teacher effectiveness.
For any of the participants in this blog who are not teachers, it might seem obvious that some mitigating circumstance would explain a test score showing inadequate growth. Those of us who are teachers know it is not obvious. It takes time to correlate low scores to other data available about the student. Everyone in a school is busy, so most teachers have little confidence that an attempt to correlate low test scores to other data would be made, or if the effort is made, that the person doing it was qualified to understand the correlation. When the proposal is made to make part of our performance evaluation based on student test scores, we know how that can be done incorrectly, misused, or worse, abused.
There are teachers who should be moved out of education because they are not effective, but there are means to do that now. Effective building administrators have a process that is based on student performance and classroom observation. It takes time, however, and well it should. Unless a teacher poses a threat to student safety and needs to be removed quickly, there should be clear, documented reasons for removing him or her.
The final MAJOR problem, one "Ebaer" agreed with is the conflict of interest issue. I cannot fathom why there is not more of an uproar. The Superintendent has major conflicts of interest via her involvement with outside organizations such as Broad and NWEA and the board doesn't seem to get it. This is a public district and ANY potential conflict of interest should be removed, either by the Superintendent severing her ties with the organization, or by the board severing the Superintendent from the district.
Yes, there are lots of reasons why it might not be in a teacher's control how much students learn, but those are the students, and if you can't teach them, get out of the way and let someone else have a shot at it. I have students that are not prepared and don't want to be in class. If I can't inspire them and I can't help them get up to speed, then I shouldn't be teaching them. My desire for a job does not trump the primary reason for my school - to educate students.
You claim there is an effective way of removing ineffective teachers. However, the numbers I have been quoted are very small (fewer than 10 in the district in a year). Do you believe that this number is true, and if so is that the actual number of ineffective teachers in SPS? I would like to hear the data that supports your assertion since I am not as informed about this part of the SEA contract.
BTW, I agree with you on the ethics - it seems like a conflict of interest that would be in violation of policy and the law. I believe all public sector employees need to avoid both actual conflicts of interest and the appearance of a conflict of interest.