Sample Size Does NOT Affect the Probability of a Type I Error

The delusion that it does was identified by Rosenthal and Gaito in 1963, and it has persisted across several decades despite many efforts to dispel it.  The effect of sample size is already incorporated into the p value that is used to determine whether or not the researcher has data sufficiently discrepant with the null hypothesis to reject it.  If the effect is small, you will need a large sample to detect it.  If the effect is large, you will be able to detect it even with a small sample.  It is easier to find big things than small things.  Accordingly, finding a "significant" effect with a small sample is actually very impressive, as that is likely only if the effect is large.  Finding a significant result with a large sample, on the other hand, may reflect nothing more than the increased power associated with large sample sizes, even when the effect is so small as to be trivial.  Frankly, IMHO, the "significance" of an effect is a helluva lot less important than estimation of the size of that effect.  This is a problem associated with the uncritical acceptance of NHST (Null Hypothesis Significance Testing), amusingly described by Cohen (1994) as SHIT (Statistical Hypothesis Inference Testing).
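
A quick way to convince yourself of the point in the title is to simulate it.  Here is a minimal sketch (Python, with numpy and scipy; the seed, sample sizes, and number of simulated studies are arbitrary choices) showing that when the null hypothesis really is true, the rejection rate stays at alpha no matter how large the sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
alpha, n_studies = 0.05, 10_000

for n in (10, 50, 200, 1000):
    # The null is true: the population mean really is 0.
    samples = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n))
    _, p = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    print(f"n = {n:4d}: Type I error rate = {np.mean(p < alpha):.3f}")

# Every line prints approximately .05 -- alpha, not sample size,
# determines the probability of a Type I error.
```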

In my stats classes, I try to get this point across this way:

• I walk to the window and state the hypothesis that there are live Ebola viruses on the ledge.  I look at the ledge carefully, see no Ebola viruses, and therefore conclude that there are none there and that I am willing to lick the ledge.  This is analogous to accepting a null hypothesis on the basis of research that did not have sufficient power to detect the effect of interest.  It is a potentially deadly Type II error.
• Now I place my wallet on the ledge and ask my students to imagine that it is a large, venomous spider, Phoneutria nigriventer.  I then stroll to the other end of the classroom, remove my glasses, and exclaim "Do not be putting your arm up on that ledge, there is a big spider on it."  From that distance my eyesight is very poor (20/400 in my good eye, and essentially useless in the one affected by NAION), so this is analogous to obtaining a significant result even though power is low.  Such a result should indicate that the effect found is, in all likelihood, quite large.  Those who would dismiss such a result on the basis of small sample size are hereby invited to play with Phoneutria nigriventer.

I also ask my students to run this simulator, which generates a sample of 10 scores randomly drawn from a normally distributed population in which the size of the effect is Cohen's d = 1 (large, a one standard deviation difference in means).  With a sample of 10 scores, nondirectional hypotheses, and alpha = .05, power would be 29% for a medium-sized effect -- but the effect here is not medium, it is large.  In the long run, 80% of the samples would yield a significant result, reflecting a correct decision, not a Type I error.  Should we dismiss these correct decisions simply because they are based on a small sample size, when a significant result from a small sample actually indicates that the effect detected is likely large?
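
For readers without access to the simulator, the same exercise can be sketched in a few lines of Python (the seed and number of simulated studies are again arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n, alpha, n_studies = 10, 0.05, 10_000

for d in (0.5, 1.0):  # medium vs. large true effect
    # Each row is one "study": 10 scores from a population whose
    # mean is d standard deviations above the null value of 0.
    samples = rng.normal(loc=d, scale=1.0, size=(n_studies, n))
    _, p = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    print(f"d = {d}: power = {np.mean(p < alpha):.2f}")

# Prints roughly .29 for d = 0.5 and .80 for d = 1.0.  The
# significant results are correct decisions, not Type I errors.
```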

Would the research have been better with a larger sample size?  Absolutely, because that would give more precision in estimating the size of the effect.  With the parameters specified here, the standardized confidence interval for the effect size is most often greater than one standard deviation in width -- for example, running from 0.5 to 1.5.  While that still indicates that the effect is likely large, it would be better to have a tighter confidence interval, which would result with larger sample sizes.
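
For those curious how such an interval is obtained, here is a sketch of the standard pivotal method, which inverts the noncentral t distribution (the sample statistics below are hypothetical, chosen to give d = 1 with n = 10):

```python
import numpy as np
from scipy import stats, optimize

n, mean, sd = 10, 1.0, 1.0          # hypothetical sample with d = 1
df = n - 1
t_obs = mean / (sd / np.sqrt(n))    # one-sample t against mu0 = 0

def ncp_at(prob):
    # Noncentrality parameter that places t_obs at the given
    # cumulative probability of the noncentral t distribution.
    return optimize.brentq(
        lambda ncp: stats.nct.cdf(t_obs, df, ncp) - prob, -10, 15)

lo, hi = ncp_at(0.975), ncp_at(0.025)   # 95% limits for the ncp
print(f"95% CI for d: [{lo / np.sqrt(n):.2f}, {hi / np.sqrt(n):.2f}]")

# With n = 10 the interval is well over one standard deviation
# wide; larger samples tighten it considerably.
```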

There is another potential drawback to small sample sizes: they can lead to the overestimation of effect sizes in the published literature.  With a small sample, a study is likely to achieve significance only if it happens to overestimate the effect.  Since journals are reluctant to publish work that is not significant, the studies that get published will tend to be those that overestimated the effect size.  If the meta-analyst is not able to obtain the results of the unpublished studies, then the meta-analysis is likely to overestimate the true effect size.
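
This selection effect is easy to demonstrate by simulation.  The sketch below (parameters again arbitrary) runs many small studies of a true effect of d = 0.5 and "publishes" only the significant ones:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
true_d, n, alpha, n_studies = 0.5, 10, 0.05, 10_000

samples = rng.normal(loc=true_d, scale=1.0, size=(n_studies, n))
d_hat = samples.mean(axis=1) / samples.std(axis=1, ddof=1)  # per-study d
_, p = stats.ttest_1samp(samples, popmean=0.0, axis=1)

published = d_hat[p < alpha]   # the file drawer holds the rest
print(f"true d = {true_d}, mean published d = {published.mean():.2f}")

# The published estimates substantially overestimate the true effect,
# which is what biases a meta-analysis that omits the file drawer.
```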

References

• Rosenthal, R., & Gaito, J.  (1963).  The interpretation of levels of significance by psychological researchers.  The Journal of Psychology, 55, 33-38.  doi: 10.1080/00223980.1963.9916596

• Cohen, J.  (1994).  The earth is round (p < .05).  American Psychologist, 49, 997-1003.

November, 2014.