verification | Štěpán Bahník

Consider a somewhat ridiculous example. You want to study whether political attitudes are stable, or whether they are determined to a large extent by random influences. To study this, you design a simple experiment. You have half of the participants drink sauerkraut juice and the other half orange juice. Then, you measure their political attitudes. If people have stable political attitudes, it should not matter which juice you give them. The political attitudes should stay the same. But if they are determined to a large extent by random influences, it may matter, and you might find an effect of juice flavor.

To express this formally, we will label the stable attitudes theory as T_S and the random influences theory as T_R. Since the juice flavor should probably have no effect (labeled as E₀) if T_S holds, the probability of E₀ conditional on T_S is very high, say P(E₀|T_S) = 0.99. There is still a slight possibility that there will be an experimenter effect or that the effect might operate in some unknown way that is, nevertheless, compatible with the theory. Be that as it may, the probability that there is an effect (E₁) if T_S holds is very low, P(E₁|T_S) = 0.01. While the random influences theory seems to be more in line with the juice flavor effect, it does not really rest on it. It is always possible to say that the random influences are something else, that the juice would have an effect under different circumstances, in different participants, etc. Consequently, P(E₁|T_R) is higher than P(E₁|T_S), but it is still low. Say, P(E₁|T_R) = 0.10, and thus P(E₀|T_R) = 0.90. It is important to note that we are talking here about predictions of theories and not about results of an experiment. For simplicity, we also treat the effect as binary (present or absent), but the reasoning would hold even if we were talking about effect sizes.

Now, what happens if we find an effect of juice flavor? We should update our beliefs by the likelihood ratio, which is P(E₁|T_R) / P(E₁|T_S) = 0.10 / 0.01 = 10. That is, the experiment provides strong evidence for the random influences theory and it makes sense to publish it: the study is informative. What if we find no effect? We should again update our beliefs, but the likelihood ratio in this case is P(E₀|T_S) / P(E₀|T_R) = 0.99 / 0.90 = 1.1, which is hardly informative, and you will have a lot of trouble publishing this study.
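The two updates can be checked with a few lines of Python. This is only a minimal sketch: the probabilities are the ones used in the text, while the 50:50 prior and the helper name `posterior_t_r` are assumptions added for illustration.

```python
# Predictions of each theory, taken from the text above.
p_effect = {"T_S": 0.01, "T_R": 0.10}                   # P(E1 | theory)
p_no_effect = {t: 1 - p for t, p in p_effect.items()}   # P(E0 | theory)


def posterior_t_r(prior_t_r, lik_t_r, lik_t_s):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    Hypothetical helper; assumes T_S and T_R are the only two candidates.
    """
    prior_odds = prior_t_r / (1 - prior_t_r)
    post_odds = prior_odds * (lik_t_r / lik_t_s)
    return post_odds / (1 + post_odds)


# Likelihood ratios, each written in the direction the result favors.
lr_effect = p_effect["T_R"] / p_effect["T_S"]            # effect found: favors T_R
lr_no_effect = p_no_effect["T_S"] / p_no_effect["T_R"]   # no effect: favors T_S

prior = 0.5  # assumed prior probability of T_R, not given in the text
post_if_effect = posterior_t_r(prior, p_effect["T_R"], p_effect["T_S"])
post_if_no_effect = posterior_t_r(prior, p_no_effect["T_R"], p_no_effect["T_S"])

print(f"LR if effect:    {lr_effect:.2f}")
print(f"LR if no effect: {lr_no_effect:.2f}")
print(f"P(T_R) after an effect:  {post_if_effect:.2f}")
print(f"P(T_R) after no effect:  {post_if_no_effect:.2f}")
```

Under the assumed even prior, finding an effect moves the probability of T_R from 0.50 to about 0.91, while finding no effect moves it only to about 0.48. The asymmetry is the whole point: one outcome is informative, the other barely shifts anything.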

The argument does not depend on statistical power. You may have an infinite sample size, and the conclusions will be the same. The problem is in the design of the study. The study was tailored to verify the random influences theory and, by design, it cannot falsify it. There are a lot of studies like this in psychology these days. People are trying to show sexy effects and not to test well-defined theories. Even without ill intent, this leads to publication bias and all the hurly-burly we are currently in.
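The claim about statistical power can also be illustrated numerically. The sketch below is not part of the original argument: it assumes a simple model in which a true effect yields a significant result with probability `power` and a true null yields a false positive with probability `alpha`. The function name `likelihood_ratios` and both parameters are hypothetical.

```python
def likelihood_ratios(power, alpha, p_e1_tr=0.10, p_e1_ts=0.01):
    """Likelihood ratios for an observed significant / non-significant result.

    Assumed model: P(significant | theory) =
        P(E1 | theory) * power + P(E0 | theory) * alpha
    """
    def p_sig(p_e1):
        return p_e1 * power + (1 - p_e1) * alpha

    sig_tr, sig_ts = p_sig(p_e1_tr), p_sig(p_e1_ts)
    lr_sig = sig_tr / sig_ts                # significant result: favors T_R
    lr_null = (1 - sig_ts) / (1 - sig_tr)   # null result: favors T_S
    return lr_sig, lr_null


# A perfect study: every true effect detected, no false positives.
best_lr_sig, best_lr_null = likelihood_ratios(power=1.0, alpha=0.0)

# More realistic studies never do better than the perfect one on the null side.
for power in (0.5, 0.8, 0.95, 1.0):
    lr_sig, lr_null = likelihood_ratios(power=power, alpha=0.05)
    print(f"power={power:.2f}  LR(significant)={lr_sig:.2f}  LR(null)={lr_null:.3f}")
```

In this model, even the perfect study only recovers the ratios from the text (10 for a significant result, 1.1 for a null one), and lowering the power pushes the null-result ratio still closer to 1. Increasing the sample size cannot raise it above 1.1, because that ceiling comes from the theories' predictions and not from the data.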

What can be done about this? Primarily, we should design studies that are able to test theories. We should design studies that are publishable no matter what the result is. An ideal study would therefore test opposing predictions of different theories. “But wait, Štěpán, theories in psychology don’t give clear predictions!”, you might object if you felt brave enough to try to pronounce my name. Unfortunately, you would be right. The problem lies a bit deeper. Theories in psychology are usually very vaguely defined. It would therefore also help if psychological theories actually tried to make some strong predictions.

Note: The idea presented here is related to the difference between conceptual and direct replications. A conceptual replication is often intended to verify a hypothesis. If it finds a null effect, our perception of the original study does not change as much as it would if the replication were direct. Direct replications are usually better suited to falsifying hypotheses. Conceptual replications are important and they may be more valuable than direct replications under certain conditions. However, they are more likely to be associated with publication bias. A null effect found in a conceptual replication is often not really informative, and it is therefore more likely to stay in the file drawer.


The problem of verification