Saturday, February 17, 2018

Better than nothing?

I've been reading various online articles and blog posts about the replication crisis in science (particularly psychology), replete with examples of potentially spurious research results that have become widespread beliefs. Some have even argued that, given the incentives that tend to operate in academia, such results are becoming increasingly unavoidable.

Although there are obvious and non-trivial differences between academic research and test development, this made me reflect on a debate that occurs in language testing (or educational testing in general) -- when is having no test better than having a bad test? In other words, at what point do the ramifications of having a bad test (poor reliability, insufficient construct validity, etc.) outweigh whatever benefits it might have?

Because tests are usually employed to make some sort of decision -- placement, selection, achievement, diagnosis -- the benefit of a test is its utility in providing information to help make that decision (which could be as mechanical as "anything above 80% correct is sufficient for credit in this course"). It is easy to proclaim that one should never use a poor test, but that assertion is of little use to those responsible for whatever decision needs to be made. Your current placement test might not be very good, but that knowledge alone doesn't help you place 1,200 students into appropriate language classes at the start of the semester.

But what are the effects of a bad placement test? If the test results are treated uncritically, weak students who are mistakenly placed into difficult courses will be seen as "lazy," or their instructors will be seen as "incompetent." At what point do those knock-on effects become problematic? Placement testing is a particularly interesting area, because placement is a function of both the student and the curriculum. The extent to which the test "works" depends on the quality of the curriculum (i.e., if there is a mismatch between the available courses and the student population, you will never find the "perfect" placement test).
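One narrow piece of this question can be made concrete with a small simulation. This is a hypothetical sketch, not drawn from any actual placement program: it assumes classical test theory (observed score = true score + independent normal error), normally distributed ability, a single cut score separating two course levels, and a coin flip standing in for "no test." The function names `misplacement_rate` and `random_placement_rate` are made up for this illustration. It counts only students placed on the wrong side of the cut, so it says nothing about the knock-on effects described above, which is where the harder part of the question lies.

```python
import random


def misplacement_rate(reliability, n=100_000, cut=0.0, seed=1):
    """Fraction of simulated students placed on the wrong side of the cut.

    Assumes classical test theory: true ability ~ N(0, 1), and
    observed = true + error, with the error variance chosen so that
    var(true) / var(observed) equals the given reliability.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    rng = random.Random(seed)
    # var(error) = (1 - r) / r when var(true) = 1
    err_sd = ((1 - reliability) / reliability) ** 0.5
    wrong = 0
    for _ in range(n):
        true = rng.gauss(0, 1)
        observed = true + rng.gauss(0, err_sd)
        # Misplaced if the test puts the student on the other side of the cut
        if (true >= cut) != (observed >= cut):
            wrong += 1
    return wrong / n


def random_placement_rate(n=100_000, cut=0.0, seed=1):
    """Hypothetical "no test" baseline: each student is placed by coin flip."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        true = rng.gauss(0, 1)
        placed_high = rng.random() < 0.5
        if (true >= cut) != placed_high:
            wrong += 1
    return wrong / n


if __name__ == "__main__":
    for r in (0.5, 0.7, 0.9):
        print(f"reliability {r}: {misplacement_rate(r):.3f} misplaced")
    print(f"coin flip: {random_placement_rate():.3f} misplaced")
```

Under these assumptions, even a fairly unreliable test misplaces fewer students than a coin flip; whether that gain is worth the cost of the scores being treated uncritically is exactly the part a simulation like this cannot settle.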

As I write this, I realize that the real analogy is not between testing and research, but rather between testing and peer review / tenure decisions / "impact" metrics and the like. It is easy to point out flaws in the system, but it is much harder to determine an appropriate solution. The question is, at what point is nothing better than the something we currently have?
