Thursday, June 28, 2012

Why do we have all these issues with statistics nowadays?

It's not just voodoo correlations in social neuroscience.

There are all the problems with genome-wide association studies, the lack of replication in medicine and other fields, and of course the rest of neuroimaging, among others. I predict that any day now people will start to realize it's time for statistical re-education in old-fashioned fields like single-neuron physiology.

More and more, people are getting uptight about statistics. I see more and more calls for changes in the way statistics are done. I have heard that Nature Neuroscience, for one, has started doing stats reviews independently of the rest of the review, possibly with separate reviewers.

I think most people can agree that the state of statistical practice is pretty rotten and that it's time for improvements.

But I wonder, why now? Has this always been a problem but no one noticed until now?

My hunch is that this really is a new problem. There's always been an underlying low-level problem, sure. But the fact that the problem is enormous nowadays is, I think, new.

And I think the problem reflects a major change in the way data are collected and stored and analyzed. Without exactly realizing it, we are living in the era of Big Data. But we are using the statistical methods invented to handle Small Data.

When you have huge amounts of data, you can search for patterns very quickly, and you can even make your computer search for patterns for you. You can make it search all night, and it will do as much in one night as it used to take computers centuries to do. But with this computing power comes a flood of potential false positives, and with that comes the need for multiple-comparison corrections.
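To make that concrete, here is a minimal sketch (my own illustration, not from the post) of what happens when a computer "searches all night" through pure noise. Every comparison below is between two samples drawn from the same distribution, so any "significant" result is spurious; the sample sizes, test counts, and the use of a permutation test are all illustrative choices.

```python
import random

rng = random.Random(42)

def perm_test_p(a, b, n_perm=200):
    """Two-sided permutation test on the difference of means."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b))
        if d >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # smoothed p-value

# Run many comparisons on pure noise: both groups come from N(0, 1),
# so every null hypothesis is true and every "hit" is a false positive.
n_tests = 500
p_values = []
for _ in range(n_tests):
    a = [rng.gauss(0, 1) for _ in range(20)]
    b = [rng.gauss(0, 1) for _ in range(20)]
    p_values.append(perm_test_p(a, b))

naive = sum(p < 0.05 for p in p_values)                  # uncorrected threshold
bonferroni = sum(p < 0.05 / n_tests for p in p_values)   # Bonferroni-corrected

print(f"'significant' at p < 0.05, uncorrected: {naive} of {n_tests}")
print(f"after Bonferroni correction:            {bonferroni} of {n_tests}")
```

Uncorrected, roughly five percent of the comparisons come out "significant" by construction; dividing the threshold by the number of tests (Bonferroni) is the bluntest of the corrections, and it wipes those spurious hits out.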

And that's never really been something that statisticians have worried too much about before, because it was never an issue.
