Thoughts about using student tests to judge teachers
Let me begin by saying that I would absolutely love to be able to use some objective measure of student academic growth as a measure of teacher effectiveness. I sincerely would. I am open to hearing about such a measure that can isolate the growth of student learning that is attributable to the teacher. I'm dying to hear about it. I just haven't heard it yet.
I won't pretend that the proposed measure is the student's cumulative achievement. I will grant folks who want to use student test results as a measure of teacher effectiveness the sophistication of using a measure of change. I'll even grant them an additional level of sophistication that acknowledges the smaller potential for growth among students who are already near the top of the measure: the five-point change from 45 to 50 is easier to coach than the five-point change from 85 to 90. Just the same, the data and the conclusions have grave faults.
I review data for a living, so I have some familiarity - if not expertise - with how it can be misused. I have very little confidence in the data that I have seen that links student outcomes to teacher effectiveness. It makes a number of unsupportable assumptions. The primary one is to attribute all of the change in student outcomes to the teacher. That's crazy. The second one is to presume that all classes are statistically equivalent. That's also crazy. The third is to base the conclusions on a sample too small to support statistically significant conclusions (such as 30 students). The data is, suspiciously, reported without a margin of error. I find that odd and hubristically certain. Where is the standard deviation of outcomes? I haven't seen any reported. I'm always suspicious that unreported data contradicts the conclusion of the author. In this case I wonder if the range of outcomes isn't so great as to make the conclusion unsupportable - that the range of outcomes dwarfs the differences in outcomes.
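To make the scale of the problem concrete, here is a minimal sketch (the numbers are invented for illustration, not drawn from any district report) of how the spread of individual student outcomes can swamp a genuine teacher effect when each teacher is judged on a class of 30:

```python
# Invented numbers, for illustration only: how classroom-level noise can
# swamp a real "teacher effect" when a teacher is judged on ~30 students.
import random
import statistics

random.seed(1)

TEACHER_EFFECT = 2.0   # assumed extra growth a strong teacher adds (points)
STUDENT_SD = 10.0      # assumed spread of individual student growth (points)
CLASS_SIZE = 30

def class_mean_gain(teacher_effect):
    """Average measured growth for one class of CLASS_SIZE students."""
    return statistics.mean(
        random.gauss(teacher_effect, STUDENT_SD) for _ in range(CLASS_SIZE)
    )

strong = [class_mean_gain(TEACHER_EFFECT) for _ in range(1000)]
average = [class_mean_gain(0.0) for _ in range(1000)]

reversals = sum(a > s for a, s in zip(average, strong)) / len(strong)
print(f"spread of the strong teacher's class means: SD {statistics.stdev(strong):.2f}")
print(f"years in which the average teacher out-gains the strong one: {reversals:.0%}")
```

Under these assumptions the standard error of a 30-student class mean is about 1.8 points - nearly as large as the assumed teacher effect itself - and the "average" teacher posts the better number in roughly one year in five. That is exactly the range-of-outcomes problem: without a reported margin of error, nobody can tell those two teachers apart.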
Let's add some questions about the student assessment used. In Seattle the proposed measure is the MAP. The MAP was not intended for this purpose and is a poor choice. First is the primary function of the MAP as a formative assessment: the MAP was designed to spark questions, not to provide answers. Second is the general unreliability of the MAP; we see student scores hop about, in part, I suppose, because students can manipulate the experience. Finally, we have the constraint that the MAP measures students along one dimension only: grade level progress.
When my eldest daughter first tested into Spectrum I had to ask about twelve people before I got a cogent description of the program. The one I finally got - and it is one that was confirmed by the program manager - was that Spectrum was about going beyond the standard curriculum in three ways: farther (to the next grade level expectations), deeper (a deeper understanding of the concepts), and broader (an understanding of the concepts in a wider variety of contexts). MAP only measures one of those dimensions, the one that I think I care least about: farther. Also, given the District's focus on vertical articulation and curricular alignment, teachers are actively discouraged from teaching students along this dimension. The District leadership actively discourages teachers from providing the students with the support that would really boost their MAP scores - instruction in the next grade level's curriculum.
Unfortunately the District has no other norm-referenced assessment of student academic achievement. The MSP and HSPE (the former WASL) are criterion-referenced and therefore absolutely inappropriate for this purpose - unless the assessment is exclusively about getting under-performing students to Standard. Even then, it is inappropriate to use MSP scores for ranking, so the only change that can be recorded is a change in Levels (1, 2, 3, or 4) for students. This presumes, inappropriately, no progress for a student who did not change levels and a loss for students who went down a level. Still, I would be more comfortable with the District using this gross tool as a measure, since this test was actually designed for this purpose. It would still present the problem of an inadequate pool from which to form a statistically significant sample, and it would still require the reporting of a margin of error. On top of that, there are all of the students who opt out. Of course, the real problem with the MSP and the HSPE is the slow turnaround in the results and the unknown impact of summer learning (or loss).
So, in short, I'm not saying that I wouldn't love an objective measure of student growth as a measure of teacher effectiveness, but it would have to be better than the suggestions I have seen to date.
The critiques of using student test data to assess teacher effectiveness are well known. So where is the response from the proponents of this idea to these legitimate concerns and objections? I don't see it.
Comments
Is this true?
I agree with Charlie's critique, except he doesn't mention the financial interest of NWEA and Darth Goodloe in opening a new and lucrative market for the MAP tests.
That's all a lot of us see.
A naked grab for public education dollars.
Prove I'm wrong, Dr. Goodloe.
Sign a lifetime no-financial-interest statement.
Does the SEA proposal still use A test - just not MAP? Did they say they might use MSP or HSPE? I hope the union isn't giving in on this...
Children grow up: what they learn, how they learn, when they learn, and the value they themselves give to that learning are singularly unique. We too often delude ourselves into thinking that any particular factor of growth is a simple cause-and-effect.
Do we want to offer a rich opportunity for learning at school? Yes. Do we want to assess changes in growth over time? Yes.
Do we want to have prescriptive programs and quantifiable measurements to evaluate teacher effectiveness and predict adult success? I hope not.
Read Yong Zhao's Catching Up or Leading the Way. The perennial heavyweights on Int'l Science & Math Assessments - China, Singapore, Hong Kong, et al. - are disassociating themselves from the prescriptive and quantifiable. Why? Because their factories producing testing wonks haven't translated into successful careers in the global marketplace.
ken berry
Finally, the MAP scores say nothing about grade level progress. One of the strengths of the MAP test is that it is independent of grade level, so it can be used to track students from year to year.
I disagree with the use of the MSP as a more effective measurement for teacher evaluation, for many of the reasons that you indicate - it ignores all students who are above "standard" and so would reward teachers for ignoring all but the below-standard students. It also does not give feedback in a timely manner.
Given this, I am still waiting to hear about a way of evaluating student learning that is more effective than the MAP. If it doesn't exist, then one has to decide either 1) that we will not evaluate teachers based on student learning at all or 2) that we will use the best (albeit highly flawed) MAP test until something else comes along. Which one would you like? I am honestly torn.
"...disassociating themselves from the prescriptive and quantifiable. Why? Because their factories producing testing wonks haven't translated into successful careers in the global marketplace." - ken berry
Yes, and Professor Zhao (of Michigan State, I believe) also believes not only that the "perennial heavyweights on Int'l Science & Math Assessments" are moving away from quantitative drill-and-kill because students schooled that way aren't competitive, but that they're not competitive because the creativity in them is squelched, squashed, run over by "data." These powerhouses are seeing that places like, oh, the USA? were (and still are, one hopes) turning out innovative and unique students (and therefore products), and they are actively seeking to create pedagogy that supports creativity.
Let's say you have the perfect objective test. It measures the effectiveness of teachers and it is flawless. Central headquarters is happy because they have the objective measure. Teachers are happy because the more effective teachers are paid more.
Now, you have three second grade teachers in a building: a good one, an OK teacher, and a newcomer.
Which students get the good one? Which settle for the OK teacher? Who must take the risk on the newcomer?
We all know who the best teacher is. We even have the objective data to back things up.
What's the answer?
It's called accountability and it's brought about by observation and transparency. When each of these "players" (should we add parents/guardians and students? oh what the heck, let's!) is doing their job, they are held accountable by publishing, demonstrating, modeling, and actual pursuance of their duties.
THIS is accountability and evaluation. How does this look? There are many ways to structure it, as have been discussed here and elsewhere. Of course it requires that each do their job of holding others accountable to doing THEIR jobs, or at least trusting that someone, somewhere, is holding others accountable.
The one hitch? Students will learn a variety of things in a variety of ways. There will be variance, as each student NEEDS variance. Each educator (choose from the list above) IS variable. The learning that results IS variable. THANK GOD!
So the learning will sometimes be confusing to outsiders (heck, to the students themselves... they're such bundles of ever-changing energy and hormones that they themselves are often confused by what they've learned). They're also confused by their world, and the world ahead of them.
Will that world be variable or invariable? Will we model quantity and measurement, or quality and amorphousness, ever-changing and ever-changeable reality?
Humans love to measure. Think about it. EVERYTHING is measured. But we can't measure what a person knows, what they'll know in ten minutes, or what a person may do the very next instant.
(Word Verifier says that the above theory is but one example of an evoli.)
Thus, my bottom-line assessment is that the most pure-hearted supporters (and here, I'm excluding all venal motives) believe that a statistical effect will improve learning: that small benefits will accrue from practices that might get rid of 20% of the bad teachers even at the expense of 80% of the good ones. They think that will result in a net "teacher effectiveness" gain of small amounts (1-2%), assume that it will have no other effects, and think the experiment worthwhile. I'd be willing to consider the experiment if I believed that they would actually listen to the results. There's a clear prediction that there should be an improvement in overall student performance (by their own measures). If it failed, would they give up the methodology?
The evidence from the experiments with vouchers/charters suggests not. And that's why I'm so vehemently opposed to doing the experiment here. If we tried it for a year, and there was no performance increase, could we revert? Come up with a serious and measurable performance-based measure of the success of the performance evaluation system, and an automatic procedure for reverting away from it if it does not produce positive gains, and you might get me to consider it.
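For what it's worth, here is the back-of-envelope version of that arithmetic. Every figure below is invented purely to show how a 1-2% number could arise; none of it comes from any study:

```python
# Invented figures, only to show how a small net gain could be computed.
share_bad, share_good = 0.20, 0.80   # assumed mix of teachers
eff_bad, eff_good = 0.90, 1.00       # assumed relative effectiveness

avg_before = share_bad * eff_bad + share_good * eff_good   # 0.98

# Suppose the policy removes the bad 20% and replaces them with
# teachers of merely average (pre-policy) quality.
avg_after = share_bad * avg_before + share_good * eff_good  # 0.996

print(f"net 'teacher effectiveness' gain: {(avg_after - avg_before) / avg_before:.1%}")
```

That prints a gain of about 1.6% - squarely in the small range the supporters hope for, which is why the demand for a measurable success criterion and a reversion procedure is not unreasonable.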
I see what you mean, but isn't it actually the case that the proposed system uses data from those 30 students as a sample to give us an expected value for how well that teacher will teach their next 30, 60, 90... students? Their data is being used to evaluate the ongoing quality of the teacher, not really to quantify how well that teacher taught those particular students. Otherwise what's the point? The teachers who did a "bad" job with those 30 students will be given a "bad" evaluation though we won't really know how well they will do with the next 30, 60, 90 students until after they take the test.
An excellent question. Who gets the good teacher?
Here in Seattle, where families fight for perceived incremental differences in quality, this could become a contentious issue.
Not so everywhere. Tennessee, which has been using value-added data to evaluate teachers for some time now, was curious about that very question. Their answer was to only take care that students didn't get a "bad" teacher two or three years in a row. Seriously. I read the study after being referred to it by the LEV blog.
How can anyone countenance such a thing? Probably by knowing that the differences in "teacher quality" account for only incremental differences in cumulative student achievement.
By the way, Pete, it looks like the all-time career hit leader will have the all-time career strike out leader and the all-time career home run leader keeping him company outside Cooperstown, looking in.
In Greek-influenced Western culture we know things by counting and measuring. If we don't measure student learning on some linear scale then we can't believe that it is happening. And we are desperate for assurances that it is happening.
Other cultures may accept other proofs - proofs that we would likely reject.
I will be delighted to see them.
In fact, random assignment to classrooms is one of the "assumptions" of the various VAM models that I've had the time to try to read through. Even Schochet and Chiang, who reported Type I and Type II error rates over 25% when 3 years of teacher data are used to assess "effectiveness," assumed that children would be randomly assigned to teachers within their schools. They state in their paper that their error rates are likely an underestimate given that children are not randomly assigned to teachers.
The insanity of this proposal just grows and grows every time I think about it.
Baseball is meticulously measured. What happened? Rampant cheating.
Will it be any different with the teachers and administrators when the tests become so important?
I especially don't understand your argument that "[if we] choose to not use student learning in evaluating teaching... we then can't be surprised when students come out of SPS with less learning even though there might be great instruction."
You'll need to clarify. If we have "good teachers," (whatever qualities we expect and observe for) then hopefully students will learn. Why would we think there would be LESS learning?! I don't get it.
For example, their test scores are only estimates of knowledge, not an exact measure of it, unlike the scale in your example, which can provide an exact measure of your weight. Each MAP assessment uses questions pulled from a large database of questions, and each score is going to be affected by which particular questions came up that day. Sometimes, just by bad luck, a child will get some really weird questions; had they gotten different questions, it might have appeared that they know more. Maybe a change in score just represents the sample of questions that came up that day rather than an increase or decrease in knowledge.
When scores are averaged, some of this variability is accounted for. In the Schochet and Chiang paper that I mentioned, they analyzed three years' worth of teacher data and there was still too much variability in the VAM results for my liking. They suggested that you would need as many as 10 years of data to get a reliable estimate of a teacher's effectiveness. SPS is proposing using only 2 years' data in its assessments. Even if you believe the VAM is the way to go, the SPS approach will mislabel many effective teachers as ineffective, as well as fail to identify many ineffective teachers, because they won't be using enough data for each individual teacher.
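To see why the number of years matters so much, here is a toy simulation of my own (simple normal noise with invented magnitudes; it is not the Schochet and Chiang model):

```python
# Toy simulation with invented magnitudes; not the Schochet & Chiang model.
import random
import statistics

random.seed(2)

TRUE_EFFECT = 2.0   # assumed true teacher effect, in score points per year
NOISE_SD = 4.0      # assumed noise in a single year's class-mean gain

def rated_at_or_below_average(years, trials=10_000):
    """Share of trials in which a truly effective teacher's multi-year
    average gain still comes out at or below zero."""
    misses = sum(
        statistics.mean(random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(years)) <= 0
        for _ in range(trials)
    )
    return misses / trials

for years in (1, 2, 3, 10):
    print(f"{years:>2} year(s) of data: {rated_at_or_below_average(years):.1%} misclassified")
```

Even in this oversimplified setup, two years of data leaves a genuinely above-average teacher looking at-or-below average roughly a quarter of the time; it takes something like ten years before the error rate falls to single digits.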
I finally read that LA Times article, though I'm not sure I got to every piece of it. I was struck by how one teacher who seemed engaging and earnest and all that nonetheless did so poorly on the VAM, surprising her and her principal.
But... my son has had two awful teachers whom others I have met uniformly agree were bad. He also had some teachers that other parents really liked, but whom I felt were mediocre and definitely *not* facilitating progress in learning. Teachers that I thought were facilitating learning in good ways? Most other parents I spoke with agreed, but not always. Sometimes that is hard to judge, and with anecdotal evidence, people will disagree.
When I looked at my own child's MAP scores this past year, because I am a geeky data type, I looked not only at the RIT score itself but whether the ranges overlapped from one test to the next. Because her spring score range and fall score range did not overlap, I concluded that she had demonstrated a statistically significant improvement in score (now, the clinical or "real world" significance of that conclusion remains to be determined, but that's another issue).
However, now that I know that that is only a 68% CI, I am less certain. Maybe after dinner, I'll see if I can back-calculate to determine a more useful range, like a 95% CI. Fascinating!
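For anyone else who wants to try it, the back-calculation is straightforward if (and this is an assumption) the reported range really is the RIT score plus or minus one standard error of measurement. The scores below are made up for illustration:

```python
# Assumes the reported range is RIT +/- 1 standard error of measurement
# (SEM), i.e., roughly a 68% interval. The scores below are made up.
rit = 205
low, high = 202, 208      # hypothetical reported "score range"

sem = (high - low) / 2    # one SEM, under the assumption above
z95 = 1.96                # normal multiplier for a ~95% interval

print(f"~95% interval: {rit - z95 * sem:.1f} to {rit + z95 * sem:.1f}")
```

One caution: for deciding whether fall and spring scores really differ, the quantity to test is the difference itself, whose standard error is sqrt(sem_fall^2 + sem_spring^2); two non-overlapping one-SEM ranges do not by themselves establish a 95%-level difference.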
Good assessments of student learning ultimately require wisdom, experience and expertise. The same is true of the evaluations of supervisors.
I personally don't have a problem with the stakeholder survey in SERVE - that is, on one condition. If part of my evaluation is to be based on the feedback of parents, students and peers, then the superintendent's evaluation should be partly based on the feedback of teachers, administrators and parents. Everyone accountable should mean everyone.
A sample size of 30 does begin to approximate the normal curve (per the central limit theorem). For samples smaller than 30, there are adjustments made to help account for the small sample size. (Remember your "degrees of freedom" from your stats class in college?)
With that in mind, we can get pretty accurate information about a teacher after 30 students. And we'd definitely be able to get a sense of a teacher over time, as they would then have 60, 90, 120 students.
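A quick sketch of how fast that sharpening happens, assuming (purely hypothetically) that individual student growth has a standard deviation of 10 points:

```python
# Illustrative only: standard error of a teacher's average student gain
# as the number of tested students grows (assumes student-growth SD = 10).
import math

STUDENT_SD = 10.0

for n in (30, 60, 90, 120):
    se = STUDENT_SD / math.sqrt(n)
    print(f"n={n:>3}: standard error {se:.2f} points, 95% margin ~ +/-{1.96 * se:.2f}")
```

Note that even at 120 students the 95% margin is close to +/-1.8 points, which matters if a typical year's gain is itself only a few points.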
The issue for me is deciding what to use as a norm by which to compare Teacher X. Within the school? Within a geographic area? Within the District?
It'd be helpful to know what (if they were measured now) the differences are within a school as compared to between schools. That measure would help narrow down how best to find a standard norm. And with that, there could be several sample norms per district.
There are also some arguments about children who can't do well because of outside factors. I call B-to-the-S on that one. Edtrust is a great organization - with plenty of data to back it up - that has debunked this conjecture. It is possible that a student might not do well because of family circumstances, but it is unlikely that an entire class would.
We need to stop blaming families for children not performing well in school. There are oodles of examples across the country (see Edtrust) of 90/90/90 schools.
We need to find the bad teachers and get rid of them. We need to find the bad principals and get rid of them.
/rant
you "thought [I] was advocating for evaluation of teaching because evaluating student learning was too difficult."
Um, no, not difficult, but impossible. We can GUESS at what a student learned because of a particular instruction, but we'll never know a) ALL a student learned given that instruction; b) whether the evidence of learning was "tainted" by some problem with the testing (question design, student distraction, etc) or c) what, exactly, produced the learning that a student acquired.
Say a teacher teaches, um, paragraphing, and immediately provides the best test ever created to test knowledge of paragraphing. "Best test ever" might account for variability in b) above, test conditions, but it doesn't account for a) what else was learned besides what was tested for, and c) where the knowledge demonstrated on the test was learned. It might APPEAR that the knowledge was directly acquired, given the lesson right before the test, but what if the student already knew how to paragraph but didn't answer correctly on the pre-test because they just didn't feel like it?
Additionally, in a real-world application of a test of, say, Reading over a year, a student is exposed to many sources of knowledge besides their regular teacher. As the scenario I posted elsewhere indicates, a student's Reading score could be the result of the LA teacher, Reading teacher, History teacher, a mentor, self-teaching... who knows?
http://www.edtrust.org/dc/about/funders
http://www.edtrust.org/dc/about/board-of-directors
See their agenda:
http://www.edtrust.org/issues/our-advocacy-agenda - straight out of the 'deform' handbook...
"It is a possible that a student might not do well because of family circumstances, but it is unlikely that an entire class would."
To say that one variable is not a factor in the scores of all students is not to say that it isn't a significant factor in the scores of an undetermined number of students. As students tend to be grouped in honors and non-honors classes in high schools, the probability of compounding factors affecting students in non-honors classes increases. Or at least that's my experience.
On the instructor side of it, there are compounding factors. Some teachers have three classes to prepare for; some have one. Some teachers have more support and resources to draw on than other teachers.
You are also talking about using a computerized test, which is itself a skill. My kindergartners took this test three times this year; the results showed me who could use a computer, but not what they learned throughout the year (three kids took the test in the SPRING without sound). There is a plethora of things that I want to teach my kids, and how to point and click is nowhere near the top of the list. I can tell that you love data, and so do I, but I don't want to waste time poring over data that isn't giving me valuable information about my students. There are some things the MAP assessment can do for some teachers (our 8th grade teacher found the information helpful), but judging how well I teach is not one of them! The NWEA is clear on its website that this is not a test that will assess kids on whether they are meeting grade level expectations. They also say it is not a test that will show teacher effectiveness. If our own superintendent is on the executive board of the company, why is the SPS district trying to use it in the presented ways? I would be happy to have my evaluation based on my teaching and how I help my students progress, but the MAP test is NOT going to do that.
http://www.seattleschools.org/area/laborrelations/index.dxml
Who's blaming parents? There are all sorts of parents, many with their own issues and problems...Those that are "absent" or ineffective I personally don't "blame" - could be any number of factors.
What many here are saying is that there are factors outside the control of the many educators involved in the Reading Score of a given child. No blame, just reality.
Not incidentally, it is a common tactic of "reform" rhetoric to accuse people who don't agree with the reformers of "blaming the parents" or "blaming the student." This is simply not true, and is an insult. I think it would be safe to say that many, if not most, educators, good or bad, care very much about their students, and also understand the various external circumstances. They just can't do anything about those circumstances (or little: many teachers, in their own time and taking on the unpaid role of social worker, DO reach out to parents/guardians, DO work with community resources to address the needs of students).
Please refrain from using that term, it's offensive.
http://seattletimes.nwsource.com/html/opinion/2012704319_guest24rice.html
Please forgive me, but I am compelled to point out a couple of things:
Mr. Mas: Nolan Ryan is the career leader in strikeouts, with 5,714, and is in the Hall of Fame. Roger Clemens is 3rd with 4,672. Randy Johnson is 2nd, by the way.
Now on to more important things. All of you are using "confidence interval" wrong. A confidence interval expresses how confident you are that the true population value is included in your interval, which is computed from sample data. If you report a 68% confidence interval, that means you are 68% confident that the true population mean is included in the interval. 68% is a very, very low confidence level. In all my experience (undergrad education, grad education, teaching AP Stats), the lowest confidence interval I have seen is a 90% confidence interval. That would mean that you are 90% confident that the true population mean is included in the interval. The most common interval is a 95% confidence interval. Now, the more confident you are, the wider your interval has to be. That is the trade-off. It drives my students crazy.
The 68% you are talking about is part of what is known as the Empirical Rule for an approximately Normal distribution. It states that 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
So, while your child's MAP score may be within one standard deviation of the mean, that is NOT a 68% confidence interval.
Now I know you are taking this off of some report you got from the district. If someone would be willing to point to where that report is, I can read it and then help everyone understand what is trying to be said.
It's enough for me that the federal DOE published a paper (links previously posted on other threads) that found that value-added teacher performance methods had an error rate of at least 25%...
And another study (again, links previously posted on another thread), comparing the various testing products on the market, found that the NWEA/MAP product was useless in giving teachers any data about individual children... so if you can't use the data to inform your instruction for the children in your class, what's the point???
And then you're going to have your performance evaluated on the results of a flawed testing methodology and be paid based on those flawed results, fired maybe based on those flawed results?
Who in their right mind would agree to that?
Who in their right mind would think that was fair and appropriate and seek to impose that on highly educated and credentialed professionals?
Well, let's see what they're really agreeing to. Oh yeah. A voluntary potential reward... equivalent in value to 1 latte per day. Who cares?
I know this sort of link has been bandied about, perhaps multiple times. But I found this WSJ blog bit from their Numbers Guy a great summary. Look at the comments as well.
Someone (ebaer?) was wondering whether or not we should use flawed measures as better than nothing. One person quoted in the WSJ piece:
“Even with multiple years of data, there are a whole lot of false positives and negatives,” said Barnett Berry, president and chief executive of the Center for Teaching Quality. “If we start using these value-added metrics inappropriately, what could have been a very powerful tool could become the baby that gets thrown out with the bathwater.”
And Dorothy - I wasn't criticizing the time spent on discussing the statistical foundations of testing etc... I was saying I personally don't need to spend time on it because it's completely apparent to me that the whole idea is flawed in its basic assumptions and doesn't warrant any attempt to justify it...
http://www.youtube.com/user/auntyBROAD#p/u/14/RuB_3au6q5M
http://www.youtube.com/watch?v=l4w6o52vqys&feature=channel
The report that comes home gives a RIT score and its range. The explanation reads "The middle number is the RIT score your child received. The numbers on either side of the RIT score define the score range. If retested, your child would score within the range most of the time."
You can see how one would think that this is therefore some sort of confidence interval. We are not talking about the average RIT score for the class, where we might be interested in the standard deviation and trying to define the range in which most other children score.
So, how are folks supposed to interpret the range that is being reported for any given child? What am I missing? I really am curious since no one at our school could tell us much about how to interpret the information they gave us.
Pick a freakin' name and your comments will not be deleted.
When the WASL was used to determine if schools met AYP, the OSPI calculated a margin of error. Sometimes these margins of error were quite big - as much as 50 percentage points. Then the school was given full credit for the entire margin of error before its pass rate was compared to the target required for AYP.
Will the teachers get full credit for the margin of error on their students' tests? I think not.
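For anyone who wants to see where margins that wide come from, here is the standard normal-approximation calculation for a pass rate; the counts are hypothetical:

```python
# Hypothetical counts: margin of error on a small group's pass rate,
# using the usual normal approximation for a proportion.
import math

passed, n = 18, 30   # e.g., 18 of 30 students met standard
p = passed / n
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"pass rate {p:.0%}, 95% margin of error: +/-{moe:.0%}")
```

With a class of 30, the margin on a 60% pass rate is about +/-18 percentage points, and for the much smaller subgroups AYP slices schools into, it gets far wider - which is how OSPI's margins could reach 50 points.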
Rhee open to releasing value-added scores
Unlike Los Angeles, the District has started to use value-added data, making it 50 percent of the annual IMPACT evaluation for some teachers. Last month, 76 educators were dismissed for poor overall evaluation scores, 26 of them in grades where value-added data was used. It is not known whether the value-added piece was decisive in any of the cases.
......Rhee told The Times that releasing value-added statistics could also confuse parents and create logistical problems for administrators. What she didn't say is that there might also be privacy issues, since the data is part of a personnel evaluation. Despite the potential obstacles, Rhee said disclosure of the scores could empower parents to demand better instruction for their children.
===========
"empower parents"
OK folks, now I am confused. You mean in WA DC parents can actually demand something and it makes a difference?
Think about Seattle and math programs ... parent demands do not mean anything. In a whole variety of other things SPS parent wishes are routinely ignored by the Board and the Superintendent.
Perhaps it is the "empowerment of parents" that is the value Seattle needs to add.
Guess it is OK to demand better instruction but ..... do not demand better instructional materials or better instructional practices .... WA DC uses Everyday Math.
First, for math: is the test as statistically reliable as other tests that have been available for years? There is, or has been, an ACT Algebra 1 and Algebra 2 test that is norm-referenced and reliable; it assesses specific skills. I don't know that to be true of the MAP test. Too many tests try to test skills and creativity together. The test should test the student's skills, and I've seen multiple-choice items written to test pretty complex mathematics skills.
Point Two: reasonable effort. On the most recent MAP administration, I had good students who simply blew off the test. These were students who typically did well on skills and problem-solving tests in class, and who initially scored well on the MAP, but whose final MAP scores fell; when the time taken to finish the test was looked at, it was obvious that the student had just raced through the test and put no effort into it. People who provide the test will tell you that the test doesn't allow students to do that. They are wrong; I've seen it done.

Add to the mix students who have put in little effort throughout the year for whatever reason. I have a notable percentage of students who are frequently absent and/or do little if any of the required work (aka homework). I contact parents, or attempt to contact them, and try to get students to come in for extra help, often to no avail. A school or school system cannot reasonably hold a teacher accountable for things beyond his/her control. The motto that the Superintendent has brought with her, "Everyone Achieving, Everyone Accountable," leans hard on the "Everyone Achieving" part for the student. But apparently, the "Everyone Accountable" part is meant only for staff, not students. There has to be a way to correct or allow for students who are not performing and whose non-performance has been addressed by the teacher.
My final issue with the testing is the aspect of conflict of interest. The school district must have no interest in the provider of the test, and we know that is not the case. The head of the district administration is on the board of the company that provides the MAP. The Superintendent's role is an obvious conflict of interest, regardless of whether she receives any financial benefit. The board seriously underestimates the potential for opening SPS to legal threat if they allow the person they hire to have any interest whatsoever in the company that SPS pays to provide the test. Parents should be outraged that the board of directors is so willing to turn a blind eye to the Superintendent's involvement with NWEA when NWEA's test is being used to assess student achievement. As a parent in this district, I'm happy to work to oust elected officials who disregard their fiduciary responsibility.
So if SPS wants to use test scores as a measure of teacher effectiveness, fine, I'll step up, but it must be a test that is recognized as being fair and accurate, there must be a way to correct for low performance beyond the teacher's control, and it must be a test provided by an organization or company in which the district has no vested interest, financial or otherwise. If these criteria are met, then I'll be glad to say "Bring It!"
Gee Ed. If students don't pass the MSP, they don't graduate. Isn't that a pretty drastic form of accountability for the student? What's your accountability?
Right Ed. If calling the student's parent isn't motivating, you'll have to think of something else. Nobody's saying it's easy, but that's the job.
Secondly, you state that "People who provide the test will tell you that the test doesn't allow students to do that," where "that" is blowing off the test. I have never seen such a statement from NWEA and can't imagine they would make it, since that is a problem with ALL evaluation tools. It is nothing specific to the MAP.
The third point is spot-on. There is a conflict of interest, and it makes SPS look bad. That does not mean the test isn't the best one out there (or that it is); it just means that people should justifiably be suspicious, and that MGJ should either 1) resign from the NWEA board or 2) make sure she has no input into the decision-making around the testing and that the decision is transparent and defensible. Unfortunately she has done neither.
"Gee Ed. If students don't pass the MSP, they don't graduate. Isn't that a pretty drastic form of accountability for the student? What's your accountability?"
Gee, reader. Where did you get this from? Students are in no way held accountable for their scores on the MSP or the MAP. The test they must pass is the state-mandated HSPE (high school proficiency exam). You've just lost a lot of credibility here.
Lucky for us, the WWC has done it HERE.
Note:
WWC Quick Review of the Report "Middle School Mathematics Professional Development Impact Study: Findings After the First Year of Implementation"
Student-level math achievement was measured by a computer-adaptive rational number test developed by the Northwest Evaluation Association. Teacher-level topical knowledge was measured by a rational number test created by the study’s authors. Teachers’ instructional practices were measured by classroom observations.
The study measured the effects of professional development by comparing outcomes at the end of the academic year in schools that were offered professional development provided by the study with outcomes in schools that did not.
================
Findings:
The study found that students in schools where teachers were offered extensive professional development by the study performed no better on a test of math achievement in rational numbers than students in comparison schools at the end of the 2007–08 academic year.
Further, the study found the professional development had no impact on teacher knowledge of rational number topics and on how to teach them.
============
Now try this one.
Thinking Strategically Across ARRA Funds: Teacher Professional Development .....
Note:
While the implementation of well-designed PD interventions does not appear common, effective interventions are being implemented. For example, two interventions were recently evaluated by the U.S. Department of Education (ED) and were shown to be effective. One intervention targeted elementary reading instruction; the other targeted middle school math. ED’s evaluations revealed that the interventions did produce significant gains in teacher awareness and use of targeted instructional techniques but did not produce significant gains in student achievement.
{[effective interventions are being implemented .. they just do NOT produce gains in student achievement ..... holy multiple choices batman ... I hope this won't determine my merit pay]}
Based on the evaluations of these two interventions, it appears that it may be easier to achieve sustained improvement in teacher knowledge and instruction than in student learning.
-----
The obvious question seems to be: if improvement in teacher knowledge and instruction results from ProDev but does NOT improve student learning, are teachers learning the right stuff in the ProDev?
We have a sad situation when the Ed Elites provide the ProDev to influence how to teach; but when the teachers learn it and apply it => No improvement in student learning happens......
Guess we all need another large Kool-Aid....
Seems like the underlying story is that it is impossible to fix irredeemable crap programs like many of those in the SPS. Think about all the Pro Dev that accompanied Everyday Math, and the huge increase in instructional time... that crap just can't be fixed.
It is not the teachers that are sinking public education. It is the upper levels in the SPS that are unable to intelligently analyze much of anything that has to do with instructional materials and practices.
Student fails = no graduation.
Teacher fails = lifetime job.
Who is more accountable?
Is there some reason you deleted my response to Reader and Ebaer? Some of your participants might be interested to engage in the discourse. The response was not off topic, nor abusive. Reader is uninformed as to the various tests and their uses, and others may not know what actually happens in the MAP test as opposed to what NWEA says happens. Moderator and censor are two different roles.
Maybe we should hold back non-MSP passers...there's some accountability, wot?
Strangely, a parent can opt their student out of the HSPE, and the student can continue through HS taking all AP classes, get a 4.0, and "graduate" on credits, yet not have the state Certificate of Mastery.
Odd, eh? What ever happened to local control of schools?
Oh, wait, it's only students and teachers.
So students could do their best on the curriculum, teachers could teach their best on curriculum, yet results (even if we assume the tests are "good"...which I don't....) would not show relevant progress or lack of progress! Oh well! Test 'em all anyway, fail and fire at will. "It's about the kids"...and our ability to arbitrarily number and manipulate them!
Can you please tell me what I am doing wrong in trying to post my response. I post it, it appears, then disappears. I have no idea why my posts are not "sticking."
EdGuerilla
"Reader" is off the mark with his/her comment. I was speaking of MAP, not MSP. So let's limit the discussion to the correct Acronym. The proposal has been made to link teacher evaluations with MAP scores, not HSPE scores. MAP or Measures of Academic Progress, is the computer based test taken three times during the year. The first administration of the test gives a base-line score, and as the year progresses, the presumption is if the scores increase, the student has made academic progress. So if a student starts out with a low MAP score for math, but improves by year's end, s/he has made academic progress. Note that the student could fall in a low percentile and still show progress. For example, if a student starts in the 30th percentile, and goes up to the 40th percentile, the student has shown academic growth, even though s/he is likely not performing at grade level. I might be misremembering, but we were told the average increase in scores over a year was 4 points. So if a student raised his/her score by 4 points or more, s/he made academic progress.
MAP comes from NWEA, Northwest Evaluation Association, a private company, specifically an NPO, or Non-Profit Organization, which doesn't mean free, but they're private. MSP, or Measures of Student Progress, and the HSPE, or High School Proficiency Exam, previously the WASL, or Washington Assessment of Student Learning, are the state's tests. Both come from OSPI, the Office of the Superintendent of Public Instruction. Washington State is PUBLIC, not free, but not PRIVATE. The proposal has NOT been made to link teacher evaluation to the HSPE, and that is another discussion entirely. Please stay on topic.
I think there is a pretty big difference between a group of folks that see the MAP test as a starting point to be used and improved on, a group that sees the MAP test as too flawed a test to use, and a third group that thinks there is no way to have a test that would be effective enough, and so student learning should never be used in teacher evaluation.
What we ought to be doing is the following:
1. Roll out the four-tier evaluation model that the teachers and the district worked on.
2. Buckle down on principals in terms of meaningful evaluations -- and this needs to go both ways -- the staff should be evaluating the principal with respect to the timing, relevance, feedback, and follow up they are getting from evaluations, and the Ed Directors downtown need to be responsible for, and ready to, counsel their principals on how to effectively evaluate, mentor, and support their staff.
THEN -- let's see where that gets us in terms of either improving teacher performance or getting ineffective teachers out of classrooms.
IF, at the same time, they want to run a several-year pilot program that also gives students entry and exit tests, and compare those results with the ones that they are getting from principal observations, parent/student feedback, etc -- great. And -- maybe use 2 or 3 different assessments -- MAP, I don't know what else is out there. It would be very interesting to see where correlations exist and -- if they don't -- what it means. Are the observation/evaluation methods flawed? Are the testing models flawed? Is there just a huge margin of error? No one knows this stuff at present.
“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”
“We can't solve problems by using the same kind of thinking we used when we created them.”
“Intellectual growth should commence at birth and cease only at death”
“Logic will get you from A to B. Imagination will take you everywhere.”
“Imagination is everything. It is the preview of life's coming attractions.”
“The true sign of intelligence is not knowledge but imagination.”
and finally:
“Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods.” ...
So tell me again, why do we put kids through this farce we call 'education', in this particular form, (which kills rather than fosters and rewards imagination), why do we put so much faith in 'testing' (which punishes rather than praises imagination) and why are we giving our money away to already uber rich vulture philanthropists wanting to control our world and our kids - their future workers and consumers - in any way they can?
I think we're all stupid, being led like lambs to the slaughter....
Tests give you only one piece of factual information: How well the student performed on the given test. Using a test score to make inferences about quality of instruction may or may not be valid. A teacher can be a master at his/her art and have students who do not do well on an assessment such as MAP. Similarly, you can have a student who is motivated and does well on such an assessment, but s/he has an ineffective teacher. Both of these scenarios happen more frequently than you might think, not simply anecdotally. If you are going to use a test as one means of teacher evaluation, you must correct for anomalies as much as statistically possible.
If one is to use an assessment such as MAP to measure a teacher's effectiveness, there has to be a way to correct for lack of improvement beyond the teacher's control. If a student is intentionally putting in no effort; that is, s/he simply tears through the test, or purposefully answers incorrectly, then that student's score is not a valid measure of teacher effectiveness. If a student is often absent, or has long-term truancy issues and is not in class to receive instruction, a test score showing no growth would not be a valid measure of teacher effectiveness. The same is true of a student whose low performance is due to lack of proficiency in English. If a student puts in little or no effort throughout the year, and the teacher has addressed the non-performance through various interventions (parent contact, conferencing, etc.), a low test score does not say anything about teacher effectiveness. I see each of the preceding student-types every year. You cannot always attribute low performance to teacher effectiveness.
For any of the participants in this blog who are not teachers, it might seem obvious that some mitigating circumstance would explain a test score showing inadequate growth. Those of us who are teachers know it is not obvious. It takes time to correlate low scores to other data available about the student. Everyone in a school is busy, so most teachers have little confidence that an attempt to correlate low test scores to other data would be made, or if the effort is made, that the person doing it was qualified to understand the correlation. When the proposal is made to make part of our performance evaluation based on student test scores, we know how that can be done incorrectly, misused, or worse, abused.
There are teachers who should be moved out of education because they are not effective, but there are means to do that now. Effective building administrators have a process that is based on student performance and classroom observation. It takes time, however, and well it should. Unless a teacher poses a threat to student safety and needs to be removed quickly, there should be clear, documented reasons for removing him or her.
The final MAJOR problem, one "Ebaer" agreed with is the conflict of interest issue. I cannot fathom why there is not more of an uproar. The Superintendent has major conflicts of interest via her involvement with outside organizations such as Broad and NWEA and the board doesn't seem to get it. This is a public district and ANY potential conflict of interest should be removed, either by the Superintendent severing her ties with the organization, or by the board severing the Superintendent from the district.
Yes, there are lots of reasons why it might not be in a teacher's control how much students learn, but those are the students, and if you can't teach them, get out of the way and let someone else have a shot at it. I have students that are not prepared and don't want to be in class. If I can't inspire them and I can't help them get up to speed, then I shouldn't be teaching them. My desire for a job does not trump the primary reason for my school - to educate students.
You claim there is an effective way of removing ineffective teachers. However, the numbers I have been quoted are very small (fewer than 10 in the district in a year). Do you believe that this number is true, and if so is that the actual number of ineffective teachers in SPS? I would like to hear the data that supports your assertion since I am not as informed about this part of the SEA contract.
BTW, I agree with you on the ethics - it seems like a conflict of interest that would be in violation of policy and the law. I believe all public sector employees need to avoid both actual conflicts of interest and the appearance of a conflict of interest.