P-Hacking

A fun new term: P-Hacking. It refers to manipulating the "P-value", a statistical measure of significance used in analyzing the results of an experiment. Traditionally, P is interpreted as the probability that a result at least as extreme as the one observed could have arisen from random fluctuation alone, with no real effect present. Publishable results are supposed to have P < 0.05, that is, less than a 5% chance that the outcome was sheer luck.

Toss a coin five times and get all Heads; that should only happen one time in 2⁵ = 32 trials. Is getting five Heads in a row strong evidence that the coin is double-headed? That depends. Was the hypothesis "This coin has two heads" selected for testing before tossing began? Was the number of tosses chosen in advance? Were any sets of tosses thrown out of the data set before (or after) the one that was used?
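To make the arithmetic concrete, here is a small Python sketch (an illustration of mine, not taken from any of the sources cited here) that estimates both situations by simulation: the honest single batch of five tosses, and the same experiment when up to nine earlier failed batches are quietly discarded before the one that gets reported.

    import random

    def five_heads_once(n_trials=100_000):
        """Estimate the chance that a fair coin gives five heads in
        five tosses; the exact value is 1/2**5 = 1/32, about 0.031."""
        hits = sum(
            all(random.random() < 0.5 for _ in range(5))
            for _ in range(n_trials)
        )
        return hits / n_trials

    def five_heads_best_of(batches=10, n_trials=100_000):
        """Estimate the chance of at least one all-heads batch when up
        to `batches` attempts are made and failures go unreported;
        exactly 1 - (31/32)**batches, about 0.27 for ten batches."""
        hits = sum(
            any(all(random.random() < 0.5 for _ in range(5))
                for _ in range(batches))
            for _ in range(n_trials)
        )
        return hits / n_trials

    if __name__ == "__main__":
        random.seed(20140920)
        print("one honest batch of five tosses:", five_heads_once())
        print("best of ten unreported batches: ", five_heads_best_of())

The same "five Heads in a row" is impressive in the first case (P ≈ 0.03) and unremarkable in the second (better than one chance in four), which is why the answers to those questions matter.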

Andrew Gelman and Eric Loken have written a lovely paper, "The garden of forking paths: Why multiple comparisons can be a problem, even when there is no 'fishing expedition' or 'p-hacking'". A colleague at work recommended it, and it's definitely worth reading. Gelman commented on it further earlier this year in his blog, Statistical Modeling, Causal Inference, and Social Science. The key point is stated at the end of the original manuscript:

... We want our scientists to be creative, but we have to watch out for a system that allows any hunch to be ratcheted up to a level of statistical significance that is then taken as scientific proof. And we need to be aware of these concerns in our own research, not just in that of others. ...
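As a rough illustration of why this matters, here is another Python sketch of mine; note that it simulates the cruder "fishing expedition" version of the problem, whereas Gelman and Loken's deeper point is that the same inflation can happen even when only one analysis is actually run, because the choice of that analysis is contingent on the data. In the simulation every "study" is pure noise, but the analyst gets to look at ten outcome measures and report the best-looking one.

    import math
    import random

    def z_test_p(sample):
        """Two-sided p-value for the null hypothesis 'true mean is zero',
        using a normal approximation (samples are drawn from N(0, 1))."""
        n = len(sample)
        z = (sum(sample) / n) * math.sqrt(n)
        return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

    def forking_paths(n_studies=5_000, n_outcomes=10, n_subjects=25):
        """Fraction of pure-noise studies that can report p < 0.05 when
        the analyst picks the smallest of n_outcomes p-values."""
        false_positives = 0
        for _ in range(n_studies):
            best_p = min(
                z_test_p([random.gauss(0, 1) for _ in range(n_subjects)])
                for _ in range(n_outcomes)
            )
            if best_p < 0.05:
                false_positives += 1
        return false_positives / n_studies

    if __name__ == "__main__":
        random.seed(20140920)
        # the nominal 5% error rate climbs to roughly 1 - 0.95**10, about 40%
        print("pure-noise studies reporting p < 0.05:", forking_paths())

With ten shots at the data, the nominal 5% false-positive rate balloons to roughly 40%, even though nothing real is there to find.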

(cf. Medicine and Statistics (2010-11-13), Statistical Hypothesis Inference Testing (2013-12-01), ...) - ^z - 2014-09-20