Sample Size Does NOT Affect the Probability of a Type I Error

    The delusion that it does was identified by Rosenthal and Gaito in 1963.  This delusion has persisted across several decades, despite many efforts to dispel it.  The probability of a Type I error is the alpha criterion chosen by the researcher, and it is the same at any sample size.  The effect of sample size is incorporated into the p value that is used to determine whether or not the researcher has data sufficiently discrepant with the null hypothesis to reject it.  If the effect is small, you will need a large sample to detect it.  If the effect is large, you will be able to detect it even with a small sample size.  It is easier to find big things than small things.  Accordingly, finding a "significant" effect with a small sample size is actually very impressive, as that is likely only if the effect is large.  A significant result with a large sample, by contrast, may reflect nothing more than the increased power associated with large sample sizes, even when the effect is so small as to be trivial.  Frankly, IMHO, the "significance" of an effect is a helluva lot less important than is estimation of the size of that effect.  This is a problem associated with the uncritical acceptance of NHST (Null Hypothesis Significance Testing), amusingly described by Cohen (1994) as SHIT (Statistical Hypothesis Inference Testing).
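
    The claim that sample size does not change the Type I error rate can be checked with a small simulation.  The sketch below is an illustration only, under assumptions that are not in the original text: a one-sample t test of H0: mu = 0, scores drawn from a normal population in which the null is true, and three arbitrary sample sizes (10, 30, 100) with their two-tailed critical t values taken from a standard table.  The function names are hypothetical.

```python
import math
import random

# Two-tailed critical t values for alpha = .05, keyed by sample size n
# (df = n - 1). Values are from a standard t table; the three sample
# sizes are arbitrary choices for illustration.
CRITICAL_T = {10: 2.262, 30: 2.045, 100: 1.984}


def one_sample_t(scores, mu0=0.0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return (mean - mu0) / math.sqrt(variance / n)


def rejection_rate(n, true_mean, reps=5000, seed=1):
    """Proportion of simulated samples of size n in which H0 is rejected."""
    rng = random.Random(seed)
    crit = CRITICAL_T[n]
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if abs(one_sample_t(sample)) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # The null is true here (true_mean = 0), so every rejection is a
    # Type I error.
    for n in sorted(CRITICAL_T):
        print(n, rejection_rate(n, true_mean=0.0))
```

    With the null true, the rejection rate should hover near .05 at every sample size, apart from simulation noise.  What changes with n is power, not alpha.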

    In my stats classes, I try to get this point across this way:

    I also ask my students to run this simulator, which will generate a sample of 10 scores randomly drawn from a normally distributed population in which the size of the effect is Cohen's d = 1 (large, a one standard deviation difference in means).  With a sample of 10 scores, nondirectional hypotheses, alpha .05, and a medium-sized effect (d = 0.5), power would be 29% -- but the effect size is not medium, it is large.  In the long run, about 80% of the samples would yield a significant result, reflecting a correct decision, not a Type I error.  Should we dismiss these correct decisions simply because they are based on a small sample, when significance with a small sample itself indicates that the effect detected is likely large?
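
    The power figures quoted above can be approximated by simulation.  The sketch below is not the simulator referred to in the text; it is a stand-in that assumes a one-sample t test of H0: mu = 0 with unit population standard deviation, so that the population mean equals Cohen's d.  The function names are hypothetical.

```python
import math
import random

N = 10              # scores per sample, as in the text
CRITICAL_T = 2.262  # two-tailed critical t for alpha = .05, df = 9


def is_significant(scores):
    """True if a one-sample t test of H0: mu = 0 rejects at alpha = .05."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return abs(t) > CRITICAL_T


def estimated_power(d, reps=5000, seed=1):
    """Proportion of samples of size N that reach significance when the
    population mean is d standard deviations away from zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(d, 1.0) for _ in range(N)]
        if is_significant(sample):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("d = 1.0:", estimated_power(1.0))  # large effect
    print("d = 0.5:", estimated_power(0.5))  # medium effect
```

    Under these assumptions the two estimates should land close to the 80% and 29% figures given above, within simulation error.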

    Would the research have been better with a larger sample size?  Absolutely, because that would give one more precision with respect to estimation of the size of the effect.  With the parameters specified here, the standardized confidence interval for the effect size is most often greater than one standard deviation in width -- for example, running from 0.5 to 1.5.  While that still indicates that the effect is likely large, it would be better to have a tighter confidence interval, which would result with larger sample sizes.

    There is another potential drawback to small sample sizes that could result in the overestimation of effect sizes.  Since journals are reluctant to publish work that is not significant, and a small study reaches significance only when its observed effect is large, those small studies which are published are more likely to have overestimated the effect size than are those which are not published.  If the meta-analyst is not able to obtain the results of those unpublished studies, then the meta-analysis is likely to overestimate the true effect size.
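
    This selection effect can also be sketched by simulation.  The setup below is hypothetical and chosen only for illustration: many one-sample studies of 10 scores each, a true effect of d = 0.5, and "publication" of a study only when it is significant at alpha .05.  The names and the true effect size are assumptions, not values from the text.

```python
import math
import random

N = 10              # scores per study
CRITICAL_T = 2.262  # two-tailed critical t for alpha = .05, df = 9
TRUE_D = 0.5        # hypothetical true effect size (an assumption)


def run_study(rng):
    """Simulate one study; return (observed d, whether it is significant)."""
    scores = [rng.gauss(TRUE_D, 1.0) for _ in range(N)]
    mean = sum(scores) / N
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (N - 1))
    d = mean / sd             # observed standardized effect
    t = d * math.sqrt(N)      # one-sample t for H0: mu = 0
    return d, abs(t) > CRITICAL_T


def simulate_literature(studies=5000, seed=1):
    """Return (mean observed d over all studies,
               mean observed d over significant studies only)."""
    rng = random.Random(seed)
    all_d, published_d = [], []
    for _ in range(studies):
        d, significant = run_study(rng)
        all_d.append(d)
        if significant:
            published_d.append(d)
    return sum(all_d) / len(all_d), sum(published_d) / len(published_d)


if __name__ == "__main__":
    mean_all, mean_published = simulate_literature()
    print("all studies:      ", round(mean_all, 2))
    print("significant only: ", round(mean_published, 2))
```

    With n = 10, a study is significant only if its observed |d| exceeds 2.262 divided by the square root of 10, about 0.72, so the "published" average necessarily sits well above the true value of 0.5.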



November, 2014.