East Carolina University
Department of Psychology
What Effect Size Should I Use in Power Analysis &
How Much Power Should I Want?
Correspondent Tony Napoli asks some great questions here. My responses are in purple.
Dear Dr. Wuensch,
I’m Tony Napoli, a social scientist and perennial student of statistics, who has wandered into some murky waters regarding estimating effect sizes.
First off, let me express my appreciation for your very informative statistics Web pages. They are indeed an oasis in the vast desert. My students and I have benefitted greatly from them.
My dilemma concerns estimating effect sizes for power analysis (sample size estimation) in the absence of any information on what effect size to expect.
When I was a graduate student in the 1990s, the prevailing view within my department was that, without preliminary information (aka pilot data) or published results to “hang your hat on,” the researcher should “shoot” for the minimum sample size that would produce a statistically significant result under the worst-case scenario, that is, a small effect size.
I tell my students that the ideal situation is to have lots of power (95%) to detect the smallest effect that you would consider not to be trivial. What that smallest nontrivial effect size is depends on situational factors and is rather subjective. Of course, one could always fall back on Cohen's benchmarks for a small effect, such as .2 for a standardized difference between means or .1 for a Pearson r.
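To make those benchmarks concrete, here is a minimal Python sketch of the sample sizes they imply at 95% power with a two-tailed alpha of .05. It relies on the normal approximation (and Fisher's z for the correlation) rather than exact distributions, so the results may differ by a few cases from what a program such as G*Power reports. The function names are made up for this illustration.

```python
from math import atanh, ceil
from statistics import NormalDist


def z(p):
    """Standard normal quantile for cumulative probability p."""
    return NormalDist().inv_cdf(p)


def n_per_group_for_d(d, alpha=0.05, power=0.95):
    """Approximate n per group, two-tailed independent-samples test of means."""
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)


def n_for_r(r, alpha=0.05, power=0.95):
    """Approximate total n, two-tailed test of rho = 0 (Fisher z method)."""
    return ceil(((z(1 - alpha / 2) + z(power)) / atanh(r)) ** 2 + 3)


print(n_per_group_for_d(0.2))  # Cohen's small d
print(n_for_r(0.1))            # Cohen's small r
```

Under these assumptions the answers come to about 650 cases per group for d = .2 and about 1,294 cases in all for r = .1.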
If one has 95% power to detect the smallest effect considered to be nontrivial, then one can make a strong statement regardless of whether the null is rejected. If the null is rejected, great, and the large power will enable you to estimate the size of the effect with good precision. If the null is not rejected, then you can argue that the effect is so small that it might as well be zero. That argument would be strengthened by providing a 95% confidence interval for the effect. For example, -0.05 < rho < +0.02 is pretty convincing evidence that rho is nearly zero. That said, confidence intervals like 0.03 < rho < 0.06 are also convincing that the effect is nearly zero, even though significant.
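A confidence interval like those can be approximated with Fisher's z transformation. The sketch below is one way to do it in Python; the sample correlation, the sample size, and the cutoff of .1 for the smallest nontrivial rho are hypothetical values chosen only for illustration, and the function names are my own.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def ci_for_rho(r, n, confidence=0.95):
    """Approximate confidence interval for rho via Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit / sqrt(n - 3)  # standard error of z is 1/sqrt(n - 3)
    center = atanh(r)
    return tanh(center - half_width), tanh(center + half_width)


def practically_zero(interval, smallest_nontrivial=0.1):
    """True if the whole interval lies inside +/- the smallest nontrivial rho."""
    lower, upper = interval
    return -smallest_nontrivial < lower and upper < smallest_nontrivial


# Hypothetical data: r = 0.01 observed in n = 3000 cases
interval = ci_for_rho(0.01, 3000)
print(interval, practically_zero(interval))
```

If the whole interval falls inside the band of trivial values, the "might as well be zero" argument holds whether or not the interval happens to exclude zero.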
In my review of the literature, I found that aspiring investigators are sometimes advised to select an effect size based on a desired (e.g., clinical) effect (for example, http://www.power-analysis.com/power_analysis.htm). This seems somewhat self-serving: Who wouldn’t want a large effect obtained from a small sample?
Not I, unless I were a pharmaceutical firm trying to establish the bioequivalence of my product.
Folks at the University of Michigan's MEERA website opine that, because effect size can only be calculated after you collect data from program participants, you will have to use an estimate for the power analysis, and that common practice is to use a value of 0.5, as it indicates a moderate to large difference. This suggestion, to assume a moderate/large effect size (ES; presumably Cohen’s d in this case), seems unwarranted and likely to lead to a Type II error. Also, I cannot find an authoritative source for this recommendation/convention.
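The Type II risk in that convention can be put in numbers. The following Python sketch, which assumes a two-tailed, two-group comparison of means and uses the normal approximation, sizes a study for d = 0.5 at 80% power and then asks how much power that sample would have if the true effect were only d = 0.2. The function names and the choice of 0.2 as the true effect are assumptions made for this illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect d with the stated power."""
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return ceil(2 * (z_sum / d) ** 2)


def power_two_groups(d, n, alpha=0.05):
    """Approximate two-tailed power with n cases in each of two groups."""
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n / 2)  # expected value of the test statistic
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


n_planned = n_per_group(0.5)  # plan for the "conventional" d = 0.5
print(n_planned, power_two_groups(0.2, n_planned))  # but the true d is 0.2
```

Under these assumptions the plan calls for about 63 cases per group, and that sample has only about a 20% chance of detecting a true effect of d = 0.2.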
So, unlike other conventions, for example, setting alpha = .05 (The Earth is Round) and power to .80 (Trochim, 2006), there doesn’t appear to be an acceptable convention for estimating ES in the absence of any other information.
If you use G*Power, you will see that the default value for amount of desired power is 95%. So, why do the Germans use 95% and those in the US use 80%? My guess is that it is because we are so shocked when we see how many cases we need to get that much power. Perhaps the Germans are better able to get enough data because their government still supports research. IMHO, setting both alpha and beta should follow a consideration of the relative seriousness of Type I versus Type II errors. If these two sorts of errors are considered equally serious, and .05 is a reasonable value for alpha, then the reasonable value for beta should also be .05; that is, we should have 95% power. Does the convention of alpha = .05 and beta = .20 indicate that the researchers consider Type I errors more serious than Type II errors? I doubt it. I doubt that the typical researcher ever even ponders the relative seriousness of Type I versus Type II errors.
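For a sense of the sticker shock, here is a rough Python sketch of how the required sample grows as beta shrinks, holding alpha at .05 (two-tailed) and assuming a small effect of d = 0.2 in a two-group comparison of means. It uses the normal approximation, so treat the numbers as ballpark figures; the function name is my own.

```python
from math import ceil
from statistics import NormalDist

norm = NormalDist()


def n_per_group(d, alpha, beta):
    """Approximate n per group for a two-tailed comparison of two means."""
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(1 - beta)
    return ceil(2 * (z_sum / d) ** 2)


# Cost of shrinking beta when alpha = .05 and the effect is small (d = 0.2)
for beta in (0.20, 0.10, 0.05):
    print(f"power = {1 - beta:.2f}: n per group = {n_per_group(0.2, 0.05, beta)}")
```

Under these assumptions, moving from 80% to 95% power raises the requirement from roughly 393 to roughly 650 cases per group, about 65% more data.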
Contact Information for the Webmaster,
Dr. Karl L. Wuensch
This page most recently revised on the 7th of February, 2015.