The fatal flaw of educational assessment

Long have I recognized the problem with New York's testing regimen: it seeks to use the tests in an invalid manner. Test validity is a term from our educational and psychological assessment classes that means a test measures what it purports to measure. If I want to know your blood pressure, I reach for a cuff; I do not pull out a thermometer, because it would not give me the information I need. The same goes for educational tests. If I want to measure how much progress a student has made, I use one sort of test, and if I want to compare him to students across the region, state, country or world, I use a different one.

In New York, our tests are designed to measure where students stand in comparison to other students. We know this because the state uses complex analysis to tell us how an individual compares to others like him across the state. We also compare him to all others within a school. Cut scores are determined after the test is given, based on the idea that a certain percentage of students should fall at each level. The idea that all students will score above a given cut is crazy, because the results are going to be reported in a bell-curve type of distribution.
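To make the point about after-the-fact cut scores concrete, here is a minimal sketch in Python. It is not New York's actual procedure: the level shares in `LEVEL_SHARES`, the simulated scores, and the function names `cut_scores` and `level_counts` are all assumptions invented for illustration. It only shows what happens when cuts are chosen so that a fixed percentage of students lands at each level.

```python
import random

# Hypothetical shares of students at levels 1-4 (illustrative only,
# not New York's actual figures).
LEVEL_SHARES = [0.25, 0.35, 0.30, 0.10]


def cut_scores(scores, shares=LEVEL_SHARES):
    """Pick the minimum score for each level above the lowest, after the
    fact, so that roughly the given share of students lands at each level."""
    ranked = sorted(scores)
    cuts = []
    cumulative = 0.0
    for share in shares[:-1]:
        cumulative += share
        index = round(cumulative * len(ranked))
        cuts.append(ranked[index])
    return cuts


def level_counts(scores, cuts):
    """Count how many students fall at each level for the given cuts."""
    counts = [0] * (len(cuts) + 1)
    for score in scores:
        level = sum(score >= cut for cut in cuts)  # 0 is the lowest level
        counts[level] += 1
    return counts


random.seed(0)
# Simulated bell-curve scores for 1000 students (made-up scale).
year_one = [random.gauss(300, 30) for _ in range(1000)]
# Assume every single student improves by 20 points the next year.
year_two = [score + 20 for score in year_one]

counts_one = level_counts(year_one, cut_scores(year_one))
counts_two = level_counts(year_two, cut_scores(year_two))
print(counts_one)
print(counts_two)
```

Under these assumptions, both years print the same counts per level even though every simulated student scored higher in the second year. When the cuts move with the distribution, real gains do not show up as more students clearing the bar.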

New York state tests are also used to determine whether a student is in need of support. Students who perform poorly or are at risk of performing poorly are supposed to receive additional educational support. This could take the form of additional small-group instruction, prescriptive educational programs, before- or after-school tutoring, computer-based instruction or any number of other models.

New York state tests are also used to evaluate teachers, schools, and districts. While there is currently a moratorium on using student test scores in teacher evaluations, the information is still being collected and could be reported to the public. Schools that do not improve their results, or that consistently demonstrate low performance, may be taken over by the state or placed in receivership.

Unfortunately, the only thing that our New York state tests measure very well is socioeconomic level. If you are a poor school, your results are likely to be poor. If you are a student in poverty, you are likely to do poorly. Furthermore, since test scores do not come back to the school for four to five months and do not include a detailed analysis of performance, they cannot be used to inform instructional decision-making.

W. James Popham, one of my favorite educational psychologists, wrote a commentary for Ed Week entitled "The Fatal Flaw of Educational Assessment." He talks about the purposes behind assessment and how important it is to match the intended purpose of a test to the uses of its data. We need to decide what we want our tests to measure and use tests that do that. While we need to use data to inform decisions, we need data that is valid for the decisions we are making. Numbers, in and of themselves, are not useful for anything. 30-24-30 could as easily be a geographic location, a woman's measurements, or the last three scores of a hockey team. We need to know what the numbers refer to before we make any decisions.

