PSYC 6430 Readings Associated with Chapter 8 (Power) in Howell


   You should be prepared to answer, on the final exam in PSYC 6430, questions relating to the readings listed below.  Below the listings I have several comments, which you should review.  If you have any questions about these readings, please ask me, electronically or in class.

Howell Chapter 8, Power

1. American Psychological Association. (2001). Task Force on Statistical Inference Initial Report

2. Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

3. Huberty, C. J. (1987, November). On statistical testing. Educational Researcher, pp. 4-9.

4. Nelson, N., Rosenthal, R., & Rosnow, R. L. (1986). Interpretation of significance by psychological researchers. American Psychologist, 41, 1299-1301.

5. Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241-301.

6. Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.

7. Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115-129.

8. Tachibana, T. (1980). Persistent erroneous interpretation of negative data and assessment of statistical power. Perceptual & Motor Skills, 51, 37-38.

9. Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

10. Wuensch, K. L. (1994). Evaluating the relative seriousness of type I versus type II errors in classical hypothesis testing. In B. Brown (Ed.), Disseminations of the International Statistical Applications Institute: Vol 1 (3rd ed., pp. 76-79). Wichita, KS: ACG Press. Available at http://core.ecu.edu/psyc/wuenschk/StatHelp/Type-I-II-Errors.htm.

11. Wuensch, K. L. (1987). Frequency of type I errors in professional journals. Unpublished manuscript available at http://core.ecu.edu/psyc/wuenschk/StatHelp/Type1.htm.

 

    The hybrid approach advocated by Huberty (1987) is essentially what I have been teaching you:

    Cohen's (1992) power primer provides for common statistical tests the required number of research units necessary to have 80% power for small, medium, or large effects.  Essentially the same information can be found in my document Estimating the Sample Size Necessary to Have Enough Power.
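
    As a rough check on the kind of figures Cohen tabulates, the sketch below approximates the per-group sample size needed for 80% power in a two-tailed comparison of two independent means.  It assumes the normal approximation n = 2(z_crit + z_power)^2 / d^2, so its answers run a participant or two below exact t-based values.  The function name and the d values of 0.2, 0.5, and 0.8 (Cohen's small, medium, and large benchmarks) are choices made for this illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed, two-sample comparison of means.

    Uses the normal approximation (an assumption of this sketch, not an
    exact t-based calculation):  n = 2 * (z_crit + z_power)^2 / d^2
    """
    if d <= 0:
        raise ValueError("d must be positive")
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for a two-tailed test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_crit + z_power) ** 2 / d ** 2)


# Standardized mean differences conventionally labeled small, medium, large.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d = {d}: about {n_per_group(d)} per group")
```

    Under this approximation the required n grows with 1/d^2, which is why detecting a small effect takes so many more research units than detecting a large one.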

    Tachibana (1980) pointed out that most research in psychology has so little power that when attempting to replicate an effect previously reported as statistically significant the expected result is a Type II error.  Psychologists are known to write all sorts of drivel trying to explain the failure to replicate -- maybe the effect is different for the population from which my research participants were drawn, maybe current events have modified the effect, maybe the effect interacts with the phase of the moon, and so on.  Apparently the gatekeepers of the published literature in psychology did not want to hear this -- Perceptual and Motor Skills is not exactly a top-notch journal.

    Nelson, Rosenthal, and Rosnow (1986) replicated and extended the earlier research of Rosenthal (1963).  Those earlier results showed that psychologists are more impressed with significant results from research with large sample sizes than with significant results from research with small sample sizes, even when the obtained p is held constant.  In other words, psychologists really do not understand hypothesis testing very well.  Consider this example:  Researcher A reports a significant effect (p = .047) based on data from 15 subjects.  Researcher B reports a significant effect (p = .047) based on data from 1,500 subjects.  Which result is more impressive?  Most psychologists are more impressed with B, but they should be more impressed with A.  Since A obtained a significant result despite having lower power (due to the small sample size), A has probably found a large effect.  B had so much power that e would be expected to find significant results even if the effect were so small as to be trivial -- in fact, with 1,500 subjects and a p of .047, the effect size would be so small as to be trivial.  The research reported by A should be conducted again, on a larger sample size, to be better able to estimate the size of the effect.  With a small sample size, a confidence interval for the effect will be rather wide.
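
    To put rough numbers on the A versus B comparison, the sketch below converts a two-tailed p and a sample size into an approximate effect size using the large-sample relation r = Z / sqrt(N).  The example above does not say what design A and B used, so treating the effect as a correlation-type r is an assumption of this sketch, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def implied_r(p_two_tailed, n):
    """Approximate effect size r implied by a two-tailed p and sample size n.

    Assumes the large-sample relation r = Z / sqrt(N), where Z is the
    standard normal deviate corresponding to the two-tailed p.
    """
    z = NormalDist().inv_cdf(1 - p_two_tailed / 2)
    return z / sqrt(n)


r_a = implied_r(0.047, 15)      # Researcher A: 15 subjects
r_b = implied_r(0.047, 1500)    # Researcher B: 1,500 subjects
print(f"Researcher A: r is roughly {r_a:.2f}")
print(f"Researcher B: r is roughly {r_b:.2f}")
```

    Because N enters only through sqrt(N), holding p constant while multiplying the sample size by 100 divides the implied effect size by 10.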

    Consider also this next example:  Researcher C reports a nonsignificant effect (p = .47) based on data from 15 subjects.  Researcher D reports a nonsignificant effect (p = .47) based on data from 1,500 subjects.  Which result is more impressive?  Now the large sample research is more impressive.  D has so much power that e would be almost certain to obtain significant results if the effect were nontrivial in magnitude, so e's nonsignificant results allow em to assert that the null hypothesis is true or very close to true.  D's confidence interval will be narrow and will include zero.  C has so little power that the most likely reason for nonsignificance is a Type II error.  C's confidence interval will be wide, ranging from a large effect in one direction to a large effect in the opposite direction.
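
    The contrast between the two confidence intervals can be sketched numerically.  The code below assumes, purely for illustration, that the effect is a correlation tested with the Fisher z approximation and that the observed effect is positive; neither detail is given in the example above, and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def approx_ci_for_r(p_two_tailed, n, confidence=0.95):
    """Approximate confidence interval for a correlation, given p and n.

    Assumptions of this sketch: the test statistic is atanh(r) * sqrt(n - 3),
    treated as standard normal, and the observed r is positive.
    """
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_two_tailed / 2)        # observed test statistic
    z_crit = z.inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = 1 / sqrt(n - 3)                           # standard error on the Fisher z scale
    # Build the interval on the Fisher z scale, then transform back to r.
    return tanh((z_obs - z_crit) * se), tanh((z_obs + z_crit) * se)


for name, n in [("C", 15), ("D", 1500)]:
    low, high = approx_ci_for_r(0.47, n)
    print(f"Researcher {name} (n = {n}): r between {low:.2f} and {high:.2f}")
```

    Under these assumptions C's interval runs from a moderate negative correlation to a large positive one, while D's interval includes zero and is confined to values near it.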

    Note that Nelson et al. suggest that researchers always report an exact p and an estimate of effect size.  Also note that the Rosenthal (1963) article appeared in a journal that gets little respect (the Journal of Psychology).  At the time he was a junior faculty member at the University of North Dakota, and the guardians of the publication gate did not want to hear what he had to say.  He is now at Harvard, and publishes in the most prestigious of journals, such as the American Psychologist.

   Wuensch (1987) attempted to publish a short manuscript that would explain, in language simple enough that the typical psychologist could understand it, that the frequency of Type I errors in the published literature is not equal to the criterion of statistical significance (5%).  American Psychologists did not want to hear that.  The American Psychologist would rather publish a short report that served to perpetuate that delusion.
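
    One simple way to see why the frequency of Type I errors among reported significant results need not equal alpha is sketched below.  It is not taken from the manuscript itself.  It assumes that every test has the same power, that a known proportion of the tested null hypotheses are true, and that only significant results are counted; the function and parameter names are hypothetical.

```python
def type_i_share_of_significant(prop_true_nulls, power, alpha=0.05):
    """Proportion of significant results that are Type I errors.

    Assumes a fraction prop_true_nulls of tested nulls are true (rejected
    with probability alpha) and the rest are false (rejected with
    probability power).
    """
    false_positives = alpha * prop_true_nulls
    true_positives = power * (1 - prop_true_nulls)
    total = false_positives + true_positives
    if total == 0:
        raise ValueError("no significant results are possible with these inputs")
    return false_positives / total


# The answer depends on how often the null is true and on power, not on alpha alone.
for prop_true_nulls in (0.0, 0.5, 0.9):
    share = type_i_share_of_significant(prop_true_nulls, power=0.5)
    print(f"{prop_true_nulls:.0%} of nulls true: Type I share = {share:.3f}")
```

    If no tested null hypothesis is true, the share is zero; if every one is true, the share is one.  In this sketch it equals alpha only for particular combinations of the two inputs.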

    Rosnow and Rosenthal (1989) remind us that p is a continuous variable, and that it makes little sense to discriminate between research for which p = .049 and research for which p = .051.  They also point out that the famous physicist Enrico Fermi thought that p = .10 was a wise operational definition of a "miracle."  I guess psychologists employ a higher standard than do physicists.  Rosnow and Rosenthal also recommend that researchers always report effect size estimates, both for effects that are statistically significant and for effects that are not.

    Frank Schmidt (1996), in his presidential address to the Division of Evaluation, Measurement, and Statistics of the American Psychological Association, argued that NHST (null hypothesis significance testing) has impeded scientific progress in psychology and should be abandoned.  Among the points made by Schmidt are the following:

    Nickerson (2000) also lists false beliefs about hypothesis testing:

    Nickerson also made the following important points:

     After you read the reports from the APA Task Force on Statistical Inference, read my summary of the main points made in those reports.

    And now for a little chuckle, thanks to the late J. Cohen:

    From Cohen, J. (1994).  The earth is round (p < .05).  American Psychologist, 49, 997-1003.
 
     "And we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing (NHST; I resisted the temptation to call it Statistical Hypothesis Inference Testing) ..."
 




Contact Information for the Webmaster,
Dr. Karl L. Wuensch
 


This page most recently revised on 13. November 2004.