One thing about research in education should be obvious: if teachers or administrators don’t care about the outcomes we measure, then no matter how elegantly we design and analyze experiments and present their findings, those findings won’t mean much.
A straightforward—simplistic, perhaps—approach to making an experiment meaningful is to measure whether the program we are testing has an impact on the same test scores to which the educators are held accountable. If the instructional or professional development program won’t help the school move more students into the proficient category, then it is not worth the investment of time and money.
Well, maybe. But what if the high-stakes test is a poor assessment of the state standards for skills like problem-solving or communication? As researchers, we’ve found ourselves in exactly this quandary.
At the other end of the continuum, many experimental studies use outcome measures explicitly designed to measure what the program being studied teaches. One strategy is to test both the program and comparison groups on material that was taught only to the program group. Although this may seem unfairly biased, it can be a reasonable approach for what we would call an “efficacy” study: an experiment that tries to determine whether the program has any effect at all under the best of circumstances (similar to using a placebo in medicine). Still, it is important that consumers of research not mistake the impact measured in such studies for the impact they can expect to see on their high-stakes test.
Growth models are currently being discussed as better ways to measure achievement. It is important to keep in mind that these techniques do not solve the problem of a mismatch between standards and tests. If the test doesn’t measure what is important, then the growth model just becomes a way to measure progress on a scale that educators don’t believe in. Insofar as growth models extend high-stakes testing to measuring the amount of student growth for which each individual teacher is responsible, the disconnect only widens.
One technique that experimental studies can use without waiting for improvements in testing is to measure outcomes that consist of changes in classroom processes. We call these “mediators” because the process changes result from the experimental manipulation, they unfold over time before the final outcome is measured, and in theory they represent a possible mechanism by which the program affects the final outcome. For example, in testing an inquiry-based math program, we can use surveys or observations to measure whether classroom processes such as inquiry and hands-on activities appear more (or less) often among program teachers than among comparison teachers. This works best where teachers (or schools) have been randomly assigned to program or comparison groups. It is also essential to measure a factor that can be observed in both conditions. Differences in a mediator can often be measured long before outcome test results are available, giving school administrators an early indication of the new program’s success. The relationship between the program’s impact on the mediator and its impact on the test outcome can also tell us something about how the test impact came about.
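For readers who want to see the arithmetic, here is a minimal Python sketch of the mediator comparison. It assumes teachers were randomly assigned and that each teacher has one mediator score (say, from a classroom observation rubric for inquiry-based instruction) and one outcome score (a class-average test score). The helper name `impact` and all of the numbers are hypothetical, made up purely for illustration. The ratio at the end is a Wald-style estimate borrowed from instrumental-variables analysis; it is one possible way to relate the two impacts, not a method this post prescribes.

```python
from statistics import mean


def impact(program, comparison):
    """Difference in group means (program minus comparison).

    Under random assignment, this simple difference is an unbiased
    estimate of the program's average effect on whatever was measured.
    """
    return mean(program) - mean(comparison)


# Hypothetical data for illustration only: one value per teacher.
# Mediator: score on an assumed observation rubric for inquiry/hands-on work.
mediator_program = [3.8, 4.1, 3.5, 4.4]
mediator_comparison = [2.9, 3.2, 3.0, 2.7]
# Outcome: class-average score on an assumed end-of-year test.
outcome_program = [251.0, 248.0, 255.0, 250.0]
outcome_comparison = [246.0, 249.0, 244.0, 247.0]

# The mediator impact can be computed early in the year, before test results exist.
mediator_impact = impact(mediator_program, mediator_comparison)
outcome_impact = impact(outcome_program, outcome_comparison)

# Outcome impact per unit of mediator impact (Wald-style ratio). Reading this
# as "the effect that flows through the mediator" requires a strong extra
# assumption: that the program affects the outcome only via this mediator.
ratio = outcome_impact / mediator_impact if mediator_impact != 0 else float("nan")

print(f"mediator impact: {mediator_impact:.2f}")
print(f"outcome impact:  {outcome_impact:.2f}")
print(f"ratio:           {ratio:.2f}")
```

With real data one would also want standard errors, and if schools rather than teachers were randomized, an analysis that accounts for teachers being clustered within schools.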
Testing is far from perfect. Improvements in the content of what is tested, combined with technical improvements that can lower the cost of delivery and speed the turn-around of results to the students and teachers, will benefit both school accountability and research on the effectiveness of instructional and professional development programs. In the meantime, consumers of research have to consider whether an outcome measure is something they care about. —DN