========================================================================
Date: Sun, 25 Aug 96 18:27:28 EDT
From: "Karl L. Wuensch"
Subject: Statistical Hypothesis Inference Testing
To: ecupsy-l@ECUVM1,
estat-l@ECUVM1,
"David C. Howell"
The second issue of Psychological Methods (the new journal which has
taken the place of the quantitative section in Psychological Bulletin) has
a most interesting article by Frank L. Schmidt. Entitled "Statistical
significance testing and cumulative knowledge in Psychology: Implications
for training of researchers," the article was earlier presented as the
presidential address to Division 5 (Evaluation, Measurement, and Statistics)
of the American Psychological Association at the 1994 convention in Los
Angeles. I think this article is potentially important to all researchers in
the behavioral sciences and especially important to those who train
researchers. Accordingly, I have given copies of the article to those in
Psychology who are involved in the teaching of graduate statistics. Others
who would like to see the article are invited to drop by my office, which
may be closer than the library. Here are some quotes and comments to give the
flavor of the article:
"My conclusion is that we must abandon the statistical significance test.
In our graduate programs we must teach that for analysis of data from
individual studies, the appropriate statistics are point estimates of effect
sizes and confidence intervals around these point estimates."
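To make the recommendation concrete, here is a small sketch of my own (not from the article) of what reporting a point estimate of effect size with a confidence interval looks like for a two-group comparison. The data and the large-sample standard-error formula for Cohen's d are illustrative assumptions, not anything Schmidt presents:

```python
# Sketch (my own example, not from Schmidt's article): Cohen's d with an
# approximate 95% confidence interval for a two-group study.
import math
import statistics

# Hypothetical scores for two independent groups
group1 = [23, 25, 28, 30, 31, 27, 26, 29, 24, 32]
group2 = [20, 22, 25, 24, 21, 23, 26, 22, 25, 23]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)

# Pooled standard deviation and standardized mean difference (Cohen's d)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

# Common large-sample approximation to the standard error of d
se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

ci = (d - 1.96 * se_d, d + 1.96 * se_d)
print(f"d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The point is that the interval, unlike a bare p value, carries both the size of the effect and the precision with which it was estimated.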
Schmidt gives an example of how traditional significance testing leads
us to conclude that the research literature is full of conflicting results,
while the proposed "new" method (effect size estimates with confidence
intervals for individual studies followed by meta-analysis once several
replication attempts have been conducted) does not. He shows how the
traditional method leads to distorted estimates of effect size (given typical
power, only studies with effect size estimates greater than the true effect
size in the population will be statistically significant) and to a much greater
than commonly recognized error rate (null hypotheses are never true, so the
actual error rate is beta, one minus power -- given typical power one would
have a smaller error rate by just flipping a coin to decide whether or not
to reject the null).
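Both halves of that argument are easy to demonstrate by simulation. The sketch below is mine, not Schmidt's; the particular numbers (true d = .4, n = 48 per group, sigma treated as known) are assumptions chosen so that power is near .50, which is in the range often described as typical:

```python
# Simulation sketch (mine, not from the article): with power near .50,
# (a) roughly half of all studies miss the real effect (error rate = beta),
# and (b) the studies that do reach p < .05 overestimate the effect,
# because only estimates larger than the true value clear the criterion.
import random
import statistics

random.seed(1)
true_d = 0.4      # true standardized effect in the population
n = 48            # per-group sample size; gives power near .50 here
crit = 1.96       # two-tailed z criterion, sigma treated as known
se = (2 / n) ** 0.5   # standard error of the mean difference

all_estimates, sig_estimates = [], []
for _ in range(10_000):
    g1 = [random.gauss(true_d, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    est = statistics.mean(g1) - statistics.mean(g2)
    all_estimates.append(est)
    if abs(est / se) > crit:          # "statistically significant"
        sig_estimates.append(est)

power = len(sig_estimates) / len(all_estimates)
print(f"power ~= {power:.2f}, so beta (miss rate) ~= {1 - power:.2f}")
print(f"mean estimate, all studies:      {statistics.mean(all_estimates):.2f}")
print(f"mean estimate, significant only: {statistics.mean(sig_estimates):.2f}")
```

All 10,000 studies estimate the same true effect, yet a literature filtered on significance would report only about half of them, and those with inflated effect sizes.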
Schmidt also reveals some of the "false beliefs" of those who have faith
in traditional significance testing. For example, many are deluded in
thinking that a significant result indicates that were you to repeat the
experiment you would likely get the same significant results (not so given
typical levels of power). These false beliefs are discussed in more detail
in the chapter "Eight false objections to the discontinuation of
significance testing in the analysis of research data," which is to appear
in Harlow & Mulaik's book "What If There Were No Significance Tests?" to
be published by Lawrence Erlbaum Associates.
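The replication false belief can also be checked directly by simulation. This sketch is my own illustration of the point, reusing the same assumed design (true d = .4, n = 48 per group, sigma known, power near .50), not an example from the chapter:

```python
# Sketch (my own, not from the chapter): the chance that an exact
# replication of a significant study is itself significant equals the
# power of the design -- about .50 here, nowhere near the certainty
# the false belief suggests.
import random

random.seed(2)
true_d, n, crit = 0.4, 48, 1.96
se = (2 / n) ** 0.5   # SE of the mean difference, sigma known

def study_is_significant():
    # Run one two-group study; True if the z test gives p < .05.
    est = (sum(random.gauss(true_d, 1) for _ in range(n)) / n
           - sum(random.gauss(0, 1) for _ in range(n)) / n)
    return abs(est / se) > crit

first_sig = both_sig = 0
for _ in range(10_000):
    original, replication = study_is_significant(), study_is_significant()
    if original:
        first_sig += 1
        both_sig += replication   # True counts as 1

print(f"P(significant replication | significant original) "
      f"~= {both_sig / first_sig:.2f}")
```

Because the two studies are independent, conditioning on the original's significance buys nothing: the replication succeeds at the rate set by power alone.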
Schmidt goes on to note that despite many prominent statisticians having
made much the same argument as does he, things seem to just be getting worse.
For example, typical power levels have dropped in recent years, as we
become more and more paranoid about rejecting true null hypotheses (which
don't exist) and employ alpha-adjusting procedures such as Student-Newman-
Keuls, Tukey, Bonferroni, Sidak, etc.
Schmidt strikes an optimistic note, however, opining that this time the
reform movement has a better chance to succeed. Why? Well, for one thing,
meta-analysis techniques have evolved to the point (and have gotten enough
attention) that they can be used to illustrate the folly of the traditional
method. And second, the APA has gotten involved, possibly providing a
"top-down" pressure for reform. See the memo I've attached at the bottom.
Speaking to us teachers, Schmidt says:
"quantitative psychologists and teachers of statistics and other
methodological courses have the responsibility to teach researchers not only
the high costs of significance testing but also the fact that the benefits
typically ascribed to them are illusory." ... "Yet the content of our
basic graduate statistics courses has not changed." ... "[W]e are training our
young researchers in the discredited practices and methods of the past."
So, what can the instructor of graduate statistics expect if he or she
joins the reform movement and teaches the "new" method? Well, as you might
expect, that instructor can look forward to "protests from significance
testing traditionalists among the faculty." In the chapter mentioned above,
Schmidt notes that even when those faculty admit that the old method is
flawed and the new method superior, they argue that we must stick to the
old methods if we expect to publish, as those are the methods the "expert"
reviewers of our manuscripts will insist we use. I have tried to show my
students the weaknesses of significance testing (I advise them to report effect
size estimates and confidence intervals even when they have submitted to
reporting the obligatory test of statistical significance), but I continue to
teach the traditional method, as I know damn well that is what my colleagues
expect and what is expected in research manuscripts submitted to journals.
In closing this note, let me add that Schmidt's article does not really
contain much if anything new. So why has it excited me? Well, it is the
way Schmidt presents his arguments. There have been many articles making
similar points, but these articles have been difficult reading for the
typical behavioral scientist (just ask my long-suffering graduate students,
and I don't even assign the more difficult articles to them). Schmidt's
article strikes me as being written in such a way that the typical behavioral
scientist will be able to understand the argument, even if she or he does not
like the message.
************************************************************
Board of Scientific Affairs Action on Significance Testing
By Frank Schmidt
In an article last year in the American Psychologist,
Jacob Cohen (1994) urged that psychologists completely
discontinue the use of statistical significance testing in
analyzing research data and instead employ point estimates
of population parameters and confidence intervals. In his
Division 5 Presidential Address at the 1994 APA convention,
Schmidt (in press) reached the same conclusion.
Albert Bartz, a psychologist who has authored several
statistical texts, brought this issue to the attention of
the APA Board of Scientific Affairs in March of 1995. He
proposed that the Board appoint a Task Force to make
recommendations as to how to implement the phasing out of
statistical significance testing in course texts, journal
articles, etc. The Board was provided with copies of the
Cohen article, the Schmidt paper, and other materials.
At its November 3-5 meeting the Board took up Dr.
Bartz' proposal. Board member Duncan Luce took the lead in
laying out this issue for the Board. The Board approved
in principle the notion of a Task Force to study this
question and make recommendations. The Board also felt that
the question was larger than APA; they felt that APS,
Division 5, the Society for Mathematical Psychology, and
other organizations should be given the opportunity to be
involved. They also felt they should at least check out the
potential involvement of other disciplines, such as
statistics (through the American Statistical Association).
The Board plans to bring this question up at a meeting of
the Federation of Behavioral and Social Sciences (which
includes Anthropology, Sociology, Economics, and other
social sciences).
The Board appointed a committee of its members to study
this question and make recommendations to be acted on at its
March 1996 meeting, specifically:
1. What the plan for the Task Force should be.
2. What the budget for the Task Force should be.
3. Who should be on the Task Force.
The subcommittee will talk to a variety of people outside
and inside APA before making its recommendations. The chair
of the subcommittee is Duncan Luce.
Suzanne Wandersman, APA staffer to the Board, reported
that the Board appeared to be very favorable to the idea of
doing away with statistical significance testing. She
thinks this effort will go forward and that there will be
quite a bit of activity on it in 1996.
Cohen, J. (1994). The earth is round (p < .05). American
Psychologist, 49, 997-1003.
Schmidt, F.L. (in press). Statistical significance testing
and cumulative knowledge in psychology: Implications for
the training of researchers. Psychological Methods.