no subject
Date: 2010-11-13 01:46 am (UTC)

"I purposely waited until I thought there was a critical mass that wasn't a statistical fluke," he says.
That bit there is a red flag to me. The decision about when to stop an experiment should NOT be based on the significance figures for the results to date; otherwise you're at great risk of cherry-picking.

If I run a random number generator and keep recomputing the p-value as the data come in, I will see it drift all over the place. Every so often it'll drift into 'significant' territory; as the trial goes on that drift will slow, but wait long enough and you can meet any significance threshold you like.
If you have a few hundred colleagues running similar experiments, all holding off on publishing until they get a strong finding, you can get some *really* impressive-looking results.
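You can see the effect directly in a quick simulation. This is a minimal sketch of my own (the function names are mine, not from any particular study): generate pure noise, apply a one-sample z-test, and "stop" the moment the two-sided p-value dips below 0.05. A fixed-sample test would reject about 5% of the time under the null; peeking after every observation rejects far more often.

```python
import math
import random

def p_value(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def optional_stopping_hit_rate(n_runs=2000, max_n=500, alpha=0.05, seed=1):
    """Fraction of pure-noise experiments that ever look 'significant'
    when the p-value is checked after every new observation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)       # null data: mean really is 0
            z = total / math.sqrt(n)           # z-statistic for the running mean
            if p_value(z) < alpha:             # 'stop the trial here and publish'
                hits += 1
                break
    return hits / n_runs

# Well above the nominal 5% false-positive rate, despite there being
# no real effect in any of the simulated experiments.
print(optional_stopping_hit_rate())
```

The longer you let the experimenter keep watching, the higher that hit rate climbs, which is exactly why the stopping rule has to be fixed in advance rather than driven by the results so far.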