========================================================================
Date: Sun, 25 Aug 96 18:27:28 EDT
From: "Karl L. Wuensch"
Subject: Statistical Hypothesis Inference Testing
To: ecupsy-l@ECUVM1,
estat-l@ECUVM1,
"David C. Howell"
The second issue of Psychological Methods (the new journal which has
taken the place of the quantitative section in Psychological Bulletin) has
a most interesting article by Frank L. Schmidt. Entitled "Statistical
significance testing and cumulative knowledge in Psychology: Implications
for training of researchers," the article was earlier presented as the
presidential address to Division 5 (Evaluation, Measurement, and Statistics)
of the American Psychological Association at the 1994 convention in Los
Angeles. I think this article is potentially important to all researchers in
the behavioral sciences and especially important to those who train
researchers. Accordingly, I have given copies of the article to those in
Psychology who are involved in the teaching of graduate statistics. Others
who would like to see the article are invited to drop by my office, which
may be closer than the library. Here are some quotes and comments to give the
flavor of the article:
"My conclusion is that we must abandon the statistical significance test.
In our graduate programs we must teach that for analysis of data from
individual studies, the appropriate statistics are point estimates of effect
sizes and confidence intervals around these point estimates."
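To make the recommendation concrete, here is a small sketch of my own (not from the article) of what reporting a point estimate of effect size with a confidence interval looks like for a two-group comparison. The data and the large-sample standard-error formula for Cohen's d are illustrative assumptions, not anything Schmidt presents:

```python
# Sketch (my own example, not from Schmidt's article): Cohen's d with an
# approximate 95% confidence interval for a two-group study.
import math
import statistics

# Hypothetical scores for two independent groups
group1 = [23, 25, 28, 30, 31, 27, 26, 29, 24, 32]
group2 = [20, 22, 25, 24, 21, 23, 26, 22, 25, 23]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)

# Pooled standard deviation and standardized mean difference (Cohen's d)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

# Common large-sample approximation to the standard error of d
se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

ci = (d - 1.96 * se_d, d + 1.96 * se_d)
print(f"d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The point is that the interval, unlike a bare p value, carries both the size of the effect and the precision with which it was estimated.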
Schmidt gives an example of how traditional significance testing leads
us to conclude that the research literature is full of conflicting results,
while the proposed "new" method (effect size estimates with confidence
intervals for individual studies followed by meta-analysis once several
replication attempts have been conducted) does not. He shows how the
traditional method leads to distorted estimates of effect size (given typical
power, only studies with effect size estimates greater than the true effect
size in the population will be statistically significant) and to a much greater
than commonly recognized error rate (null hypotheses are never true, so the
actual error rate is beta, one minus power -- given typical power one would
have a smaller error rate by just flipping a coin to decide whether or not
to reject the null).
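Both halves of that argument are easy to demonstrate by simulation. The sketch below is mine, not Schmidt's; the particular numbers (true d = .4, n = 48 per group, sigma treated as known) are assumptions chosen so that power is near .50, which is in the range often described as typical:

```python
# Simulation sketch (mine, not from the article): with power near .50,
# (a) roughly half of all studies miss the real effect (error rate = beta),
# and (b) the studies that do reach p < .05 overestimate the effect,
# because only estimates larger than the true value clear the criterion.
import random
import statistics

random.seed(1)
true_d = 0.4      # true standardized effect in the population
n = 48            # per-group sample size; gives power near .50 here
crit = 1.96       # two-tailed z criterion, sigma treated as known
se = (2 / n) ** 0.5   # standard error of the mean difference

all_estimates, sig_estimates = [], []
for _ in range(10_000):
    g1 = [random.gauss(true_d, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    est = statistics.mean(g1) - statistics.mean(g2)
    all_estimates.append(est)
    if abs(est / se) > crit:          # "statistically significant"
        sig_estimates.append(est)

power = len(sig_estimates) / len(all_estimates)
print(f"power ~= {power:.2f}, so beta (miss rate) ~= {1 - power:.2f}")
print(f"mean estimate, all studies:      {statistics.mean(all_estimates):.2f}")
print(f"mean estimate, significant only: {statistics.mean(sig_estimates):.2f}")
```

All 10,000 studies estimate the same true effect, yet a literature filtered on significance would report only about half of them, and those with inflated effect sizes.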
Schmidt also reveals some of the "false beliefs" of those who have faith
in traditional significance testing. For example, many are deluded in
thinking that a significant result indicates that were you to repeat the
experiment you would likely get the same significant results (not so given
typical levels of power). These false beliefs are discussed in more detail
in the chapter "Eight false objections to the discontinuation of
significance testing in the analysis of research data," which is to appear
in Harlow & Mulaik's book "What If There Were No Significance Tests?" to
be published by Lawrence Erlbaum Associates.
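The replication false belief can also be checked directly by simulation. This sketch is my own illustration of the point, reusing the same assumed design (true d = .4, n = 48 per group, sigma known, power near .50), not an example from the chapter:

```python
# Sketch (my own, not from the chapter): the chance that an exact
# replication of a significant study is itself significant equals the
# power of the design -- about .50 here, nowhere near the certainty
# the false belief suggests.
import random

random.seed(2)
true_d, n, crit = 0.4, 48, 1.96
se = (2 / n) ** 0.5   # SE of the mean difference, sigma known

def study_is_significant():
    # Run one two-group study; True if the z test gives p < .05.
    est = (sum(random.gauss(true_d, 1) for _ in range(n)) / n
           - sum(random.gauss(0, 1) for _ in range(n)) / n)
    return abs(est / se) > crit

first_sig = both_sig = 0
for _ in range(10_000):
    original, replication = study_is_significant(), study_is_significant()
    if original:
        first_sig += 1
        both_sig += replication   # True counts as 1

print(f"P(significant replication | significant original) "
      f"~= {both_sig / first_sig:.2f}")
```

Because the two studies are independent, conditioning on the original's significance buys nothing: the replication succeeds at the rate set by power alone.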
Schmidt goes on to note that despite many prominent statisticians having
made much the same argument as does he, things seem to just be getting worse.
For example, typical power levels have dropped in recent years, as we
become more and more paranoid about rejecting true null hypotheses (which
don't exist) and employ alpha-adjusting procedures such as Student-Newman-
Keuls, Tukey, Bonferroni, Sidak, etc.
Schmidt strikes an optimistic note, however, opining that this time the
reform movement has a better chance to succeed. Why? Well, for one thing,
meta-analysis techniques have evolved to the point (and have gotten enough
attention) that they can be used to illustrate the folly of the traditional
method. And second, the APA has gotten involved, possibly providing a
"top-down" pressure for reform. See the memo I've attached at the bottom.
Speaking to us teachers, Schmidt says:
"quantitative psychologists and teachers of statistics and other
methodological courses have the responsibility to teach researchers not only
the high costs of significance testing but also the fact that the benefits
typically ascribed to them are illusory." ... "Yet the content of our
basic graduate statistics courses has not changed." ... "[W]e are training our
young researchers in the discredited practices and methods of the past."
So, what can the instructor of graduate statistics expect if he or she
joins the reform movement and teaches the "new" method? Well, as you might
expect, that instructor can look forward to "protests from significance
testing traditionalists among the faculty." In the chapter mentioned above,
Schmidt notes that even when those faculty admit that the old method is
flawed and the new method superior, they argue that we must stick to the
old methods if we expect to publish, as those are the methods the "expert"
reviewers of our manuscripts will insist we use. I have tried to show my
students the weaknesses of significance testing (I advise them to report effect
size estimates and confidence intervals even when they have submitted to
reporting the obligatory test of statistical significance), but I continue to
teach the traditional method, as I know damn well that is what my colleagues
expect and what is expected in research manuscripts submitted to journals.
In closing this note, let me add that Schmidt's article does not really
contain much if anything new. So why has it excited me? Well, it is the
way Schmidt presents his arguments. There have been many articles making
similar points, but these articles have been difficult reading for the
typical behavioral scientist (just ask my long-suffering graduate students,
and I don't even assign the more difficult articles to them). Schmidt's
article strikes me as being written in such a way that the typical behavioral
scientist will be able to understand the argument, even if she or he does not
like the message.
************************************************************
Board of Scientific Affairs Action on Significance Testing
By Frank Schmidt
In an article last year in the American Psychologist,
Jacob Cohen (1994) urged that psychologists completely
discontinue the use of statistical significance testing in
analyzing research data and instead employ point estimates
of population parameters and confidence intervals. In his
Division 5 Presidential Address at the 1994 APA convention,
Schmidt (in press) reached the same conclusion.
Albert Bartz, a psychologist who has authored several
statistical texts, brought this issue to the attention of
the APA Board of Scientific Affairs in March of 1995. He
proposed that the Board appoint a Task Force to make
recommendations as to how to implement the phasing out of
statistical significance testing in course texts, journal
articles, etc. The Board was provided with copies of the
Cohen article, the Schmidt paper, and other materials.
At its November 3-5 meeting the Board took up Dr.
Bartz' proposal. Board member Duncan Luce took the lead in
laying out this issue for the Board. The Board approved
in principle the notion of a Task Force to study this
question and make recommendations. The Board also felt that
the question was larger than APA; they felt that APS,
Division 5, the Society for Mathematical Psychology, and
other organizations should be given the opportunity to be
involved. They also felt they should at least check out the
potential involvement of other disciplines, such as
statistics (through the American Statistical Association).
The Board plans to bring this question up at a meeting of
the Federation of Behavioral and Social Sciences (which
includes Anthropology, Sociology, Economics, and other
social sciences).
The Board appointed a committee of its members to study
this question and make recommendations to be acted on at its
March 1996 meeting, specifically:
1. What the plan for the Task Force should be.
2. What the budget for the Task Force should be.
3. Who should be on the Task Force.
The subcommittee will talk to a variety of people outside
and inside APA before making its recommendations. The chair
of the subcommittee is Duncan Luce.
Suzanne Wandersman, APA staffer to the Board, reported
that the Board appeared to be very favorable to the idea of
doing away with statistical significance testing. She
thinks this effort will go forward and that there will be
quite a bit of activity on it in 1996.
Cohen, J. (1994). The earth is round (p < .05). American
Psychologist, 49, 997-1003.
Schmidt, F.L. (in press). Statistical significance testing
and cumulative knowledge in psychology: Implications for
the training of researchers. Psychological Methods.