In the previous episode, we looked at 9 different statistical pitfalls:
- Sampling bias
- Different types of averages
- Small sample size
- Confidence intervals
- Graphs
- Linking unrelated statistics
- Correlation vs. causation
- Computer models
- Search engine results and social media
That is an awful lot of things to keep track of. Today I am going to simplify it a bit into 5
questions you can ask about any statistic.
Like the previous episode, most of the ideas in this podcast come from
Darrell Huff’s How to Lie with Statistics.
Question 1: Who says so?
Everyone who does research has a motivation, and that motivation will tell you a lot about
what their conscious and unconscious biases may be. For instance, in a study by a drug company, an
obvious conscious bias is the desire to sell more product. That bias should be considered when
interpreting the results. An unconscious bias comes from who gets studied: the majority of studies
are done on educated, fairly prosperous individuals from first-world countries.
Will the results be applicable to other populations? Hard to say until we run the studies in those
populations.
Is the study backed by someone well known or famous? As a general rule, if someone needs a famous
spokesperson, then I am automatically suspicious. However, the nature of the spokesperson can
change that a bit. If the spokesperson
is a Nobel laureate in that field, then their endorsement may have a little
more weight. If they are an athlete, movie star, etc., I have to ask myself: what would make them
an expert in the field they are endorsing? Chances are, not much.
Question 2: How do they know?
What was their sample size?
Sample size makes a huge difference.
Generally, with a couple of caveats we will talk about in a second, the
larger the sample size, the more we can trust the result. Have you ever had anyone who didn’t have kids
try and tell you how to raise yours?
Their sample size is effectively zero.
When I only had one child, I thought I was getting it figured out pretty
well. When child number two came along,
I realized that I still didn’t know anything. In a narrow field, say treatment
for a very rare disease, a sample size of 10 or 20 might be enough. For drawing general conclusions
about a large population, a sample size of 100 or even 1000 might not be enough. Most political
polls have a sample size of ~1000. Is asking 1000 people a question good enough to determine how
the 300+ million people of the US would answer it?
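As a quick aside for the curious, the surprising answer is yes, provided the sample is properly
random. Here is a rough back-of-the-envelope sketch (my own illustration, not how any particular
pollster works) using the standard 95% margin-of-error formula. Notice that the population size
never appears in it; only the sample size matters.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.
    p=0.5 is the worst case (maximum variance); z=1.96 is the
    95% confidence multiplier from the normal distribution."""
    return z * math.sqrt(p * (1 - p) / n)

for n in [100, 1000, 10000]:
    print(f"n={n:>6}: +/- {margin_of_error(n):.1%}")

# n=   100: +/- 9.8%
# n=  1000: +/- 3.1%
# n= 10000: +/- 1.0%
```

So a properly random sample of 1000 people pins the answer down to about plus or minus 3
percentage points, whether the population is 300 million or 3 billion. The catch, as we will see in
a moment, is that "properly random" part.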
Have you ever wondered why so many medications are recalled for serious side effects shortly after
they go through the long and difficult FDA approval process? It has to do with sample size. If a
serious side effect only happens to 1 in 10,000 people, a study involving 5000 people isn’t large
enough to reliably detect it; on average, the whole trial would see only half a case. However, when
the drug is released to the general public, the sample size grows dramatically, and suddenly we
start seeing side effects that were never detected during the trials. Sample size matters.
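To put numbers on that, here is a quick sketch using the figures above (my own back-of-the-envelope
illustration, assuming the side effect strikes each patient independently). A 5000-person trial has
only about a 39% chance of seeing even a single case; the one-million figure below is just an
illustrative stand-in for "the general public."

```python
# Chance that a trial of n people sees at least one case of a
# side effect that strikes 1 in 10,000 patients, assuming
# each patient is affected independently.
rate = 1 / 10_000

for n in [5_000, 1_000_000]:
    expected = rate * n
    p_at_least_one = 1 - (1 - rate) ** n
    print(f"n={n:>9,}: expect {expected:5.1f} cases, "
          f"P(at least one) = {p_at_least_one:.0%}")

# n=    5,000: expect   0.5 cases, P(at least one) = 39%
# n=1,000,000: expect 100.0 cases, P(at least one) = 100%
```

Even if the trial happens to catch one case, a single event out of 5000 is nearly impossible to
attribute to the drug. A million users later, a hundred cases form a pattern no one can miss.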
However, a large sample size can be totally invalid if it
was improperly selected. Someone who is