PSYC 6430 Readings Associated with Chapter 7 (Inferences about Means) in Howell
You should be prepared to answer, on the final exam in PSYC 6430, questions relating to the readings listed below. Below the listings I have a few comments which you should review. If you have any questions about these readings, please ask me, electronically or in class.
Robustness and the Psychology of Publication
1. Bradley, J. V. (1982). The insidious L-shaped distribution. Bulletin of the Psychonomic Society, 20, 85-88.
2. Bradley, J. V. (1984). The complexity of nonrobustness effects. Bulletin of the Psychonomic Society, 22, 250-253.
3. Bradley, J. V. (1981). Overconfidence in ignorant experts. Bulletin of the Psychonomic Society, 17, 82-84.
4. Bradley, J. V. (1981). Pernicious publication practices. Bulletin of the Psychonomic Society, 18, 31-34.
5. Bradley, J. V. (1982). Editorial overkill. Bulletin of the Psychonomic Society, 19, 271-274.
6. Bradley, J. V. (1984). Antinonrobustness: A case study in the sociology of science. Bulletin of the Psychonomic Society, 22, 463-466.
7. Black, S., & Wuensch, K. L. (2003). Two Case Studies in the Ethics of Scientific Publication.
Meta Analysis and Effect Size
1. Eagly, A. H. (1987). Reporting sex differences. American Psychologist, 42, 756-757.
2. Hyde, J. S. (1981). How large are cognitive gender differences? American Psychologist, 36, 892-901.
3. Rosenthal, R. (1990). How are we doing in soft psychology? American Psychologist, 45, 775-777.
4. Rosenthal, R., & Rubin, D. B. (1982). Further meta-analytic procedures for assessing cognitive gender differences. Journal of Educational Psychology, 74, 708-712.
Bradley's articles detail the results of his Monte Carlo work on the robustness of t and F with respect to their normality assumption, and the difficulties he had getting this work published. It seems that psychologists just do not want to know that they have been using t and F in situations where they should not. One such situation is when the data are badly skewed, as latency and reaction time data often are. Having discovered how poorly our editorial system sometimes works in academe, Bradley published a series of articles attacking that system. The Bulletin of the Psychonomic Society, by the way, is not refereed. It publishes anything a member submits, with a check, as long as it meets style and format requirements.
John Garcia had a similar experience when he tried to publish his early work on taste aversion learning. He could not get it published in a mainstream journal and had to resort to publishing in Psychonomic Science (now known as the Bulletin of the Psychonomic Society). His research cast serious doubt on the validity of the learning theories popular at the time, and psychologists just did not want to see results that did not match their expectations. In fact, one critic, referring to Garcia's work, said "These results are as likely as birdshit in a cuckoo clock" (see Seligman, M. E. P., & Hager, J. L. (1972). Biological boundaries of learning. New York: Appleton-Century-Crofts). Garcia's work has since been replicated hundreds of times and has caused a major revision in our theories about learning.
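Bradley's own simulations are not reproduced here. The Python below is a minimal sketch of the kind of Monte Carlo robustness study he describes. It draws many samples with the null hypothesis true, once from a normal population and once from a strongly skewed, roughly L-shaped (exponential) population. It then counts how often a nominal .05 two-tailed one-sample t test rejects in each tail. The sample size of 10, the exponential population, the seed, the number of replications, and the hard-coded table value 2.262 (df = 9) are all illustrative assumptions, not Bradley's settings.

```python
import math
import random
import statistics

N = 10           # sample size per simulated study (illustrative assumption)
T_CRIT = 2.262   # two-tailed .05 critical value of t for df = N - 1 = 9, from a t table


def one_sample_t(sample, mu0):
    """One-sample t statistic for H0: population mean == mu0."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # uses the n - 1 denominator
    return (mean - mu0) / (sd / math.sqrt(len(sample)))


def tail_rejection_rates(draw, mu0, reps=10_000, seed=6430):
    """Monte Carlo estimate of how often a nominal .05 two-tailed t test
    rejects in each tail when H0 is true (the population mean really is mu0)."""
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(reps):
        t = one_sample_t([draw(rng) for _ in range(N)], mu0)
        if t < -T_CRIT:
            lower += 1
        elif t > T_CRIT:
            upper += 1
    return lower / reps, upper / reps


# Normal population with mean 0 versus exponential (skewed) population with mean 1.
normal_lo, normal_hi = tail_rejection_rates(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0)
skew_lo, skew_hi = tail_rejection_rates(lambda rng: rng.expovariate(1.0), mu0=1.0)
print(f"normal:      lower {normal_lo:.3f}, upper {normal_hi:.3f}")
print(f"exponential: lower {skew_lo:.3f}, upper {skew_hi:.3f}")
```

With normal data each tail should reject close to the nominal .025. With the skewed population the two tails typically come out badly unbalanced, so the test is not doing what the nominal alpha claims, even when the two-tailed total looks tolerable.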
The contributions of Black and Wuensch show further how human an enterprise scholarly publishing is.
In a meta-analysis the researcher uses the results of previous empirical studies to estimate the size of an effect of interest. Janet Hyde's classic meta-analysis of the size of gender differences in cognitive abilities is a good example of a meta-analysis. It is also a good example of how one's philosophical stance can influence one's interpretation of empirical results. Hyde reported the following results. Women score about 1/4 standard deviation higher than men on tests of verbal ability. Men score about 2/5 standard deviation above women on tests of quantitative ability and visual-spatial ability. Men also score about 1/2 standard deviation above women on tests of field articulation (a visual-analytic spatial ability). In her article, Hyde notes Cohen's guidelines regarding standardized differences between means: 1/5 is small but not trivial, 1/2 is medium-sized, and 4/5 is large. So the differences she found ranged from small (but not trivial) to medium in size. She also points out that the studies included in the meta-analysis often used groups in which gender differences in cognitive abilities might be expected to be smaller than in the general population. Examples are Project Talent data (from 7th graders who scored above 700 on the verbal or math SAT, that is, very gifted persons) and college students (Penn State, Radcliffe, Harvard). She also explains how even small differences between the means of two populations can produce large differences in their tails. For example, if you look at who is taking remedial reading classes in our schools, you will find three boys for every girl. If you look at who scores 700 or more on the Math SAT given to Talent Search seventh graders, you will find 17 boys for every girl.
These are certainly not small differences -- but in the abstract of her article she describes the differences that she found as "very small" -- and the majority of persons who cite her work describe the differences as very small.
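The tail effect Hyde describes can be illustrated with a short calculation. The Python below is a sketch under hypothetical assumptions: two normal populations with equal standard deviations (SD = 1), means differing by d, and a cutoff expressed in z units of the lower-scoring group. It compares the proportions of the two groups scoring above that cutoff. It shows only the mechanism. It does not reproduce the empirical 3:1 and 17:1 ratios quoted above, which come from real data rather than from this model.

```python
from statistics import NormalDist


def tail_ratio(d, cutoff_z):
    """Ratio of the proportion of group A to group B scoring above a cutoff.

    Hypothetical model: both groups are normal with SD = 1, group B has
    mean 0, group A has mean d, and cutoff_z is in group-B z units.
    """
    above_a = 1 - NormalDist(mu=d, sigma=1).cdf(cutoff_z)
    above_b = 1 - NormalDist(mu=0, sigma=1).cdf(cutoff_z)
    return above_a / above_b


# A difference of about 2/5 SD, like the quantitative difference Hyde reported.
for cutoff in (0, 1, 2, 3):
    print(f"d = 0.4, cutoff = {cutoff} SD: ratio = {tail_ratio(0.4, cutoff):.2f}")
```

The ratio grows steadily as the cutoff moves further into the tail, even though d stays fixed at a "small to medium" value.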
Rosenthal's articles show how the Binomial Effect Size Display (BESD) can illustrate the magnitude of differences between two groups. One of his examples was the Physicians' Aspirin Study. That study was stopped early because the first results showed the benefits of aspirin to be so great that it would have been unethical to keep withholding aspirin from the placebo group. So, how large was this effect? In terms of percentage of variance explained, it was about one tenth of one percent. I prefer odds ratios to the BESD for illustrating the size of such an effect. The odds of having a heart attack were 189/10,845 among those taking the placebo and 104/10,933 among those taking aspirin, yielding an odds ratio of about 1.83. That is, the odds of having a heart attack were about 1.83 times higher among those taking the placebo than among those taking aspirin.
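Both effect-size descriptions of the aspirin data can be computed from the same 2 x 2 table. The Python below sketches that calculation. It uses the counts given above, with the aspirin group's no-heart-attack count taken as 10,933, so that the row totals match the trial's arms of 11,034 (placebo) and 11,037 (aspirin). It computes the odds ratio and the phi coefficient, and phi squared as the proportion of variance explained. The function name and table layout are illustrative choices.

```python
import math


def two_by_two_summary(a, b, c, d):
    """Summarize a 2 x 2 table laid out as

                     heart attack   no heart attack
        placebo           a               b
        aspirin           c               d

    Returns the odds ratio (placebo odds / aspirin odds), the phi
    coefficient, and phi squared (proportion of variance explained).
    """
    odds_ratio = (a / b) / (c / d)
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, phi, phi ** 2


odds_ratio, phi, r_squared = two_by_two_summary(189, 10_845, 104, 10_933)
print(f"odds ratio = {odds_ratio:.2f}, phi = {phi:.3f}, r^2 = {r_squared:.4f}")
```

The same table gives a phi squared of roughly .001, about one tenth of one percent. Yet the odds of a heart attack are nearly doubled in the placebo group.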
Here is another example of the use of Rosenthal's Binomial Effect Size Display. An ETS study by Linn (American Psychologist, 37, 283) reported a multiple R of .41 for predicting college grades from verbal and math SAT scores. The BESD "treatment success" proportion (high-SAT students who earn high college grades) is .5 + .41/2 = .705, or about 70%. The "control success rate" is .5 - .41/2 = .295, or about 30%. The BESD is:
| SAT | Low College Grades | High College Grades |
| High | 30% | 70% |
| Low | 70% | 30% |
The R of .41 is equivalent to having a success rate (above average college grades) of 70% among students who scored above average on the SAT, but only 30% among students who scored below average on the SAT.
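The BESD arithmetic above reduces to two lines. The Python below is a minimal sketch: given a correlation r, it returns the "treatment" and "control" success rates .5 + r/2 and .5 - r/2. The function name `besd` is a hypothetical label, not a standard library call.

```python
def besd(r):
    """Binomial Effect Size Display success rates for a correlation r.

    Returns (treatment_rate, control_rate) = (.5 + r/2, .5 - r/2).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 0.5 + r / 2, 0.5 - r / 2


treatment, control = besd(0.41)
print(f"High SAT: {treatment:.1%} high grades; Low SAT: {control:.1%} high grades")
```

The two rates always sum to 1, and their difference equals r.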
Eagly's article points out that percentage-of-variance-accounted-for statistics can be misleading because they are influenced by the extent to which the researcher has been able to eliminate extraneous variables. When extraneous variables have been eliminated (held constant), SStotal is reduced while SSeffect is not. This increases r-squared or eta-squared, that is, SSeffect / SStotal. Hyde recommends using the standardized difference between means instead, but fails to recognize that the same problem exists with such measures. When extraneous variables have been eliminated, the group standard deviations are reduced. That too inflates the estimate of effect size, by reducing the denominator of (M1 - M2) / s.
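A small numeric sketch makes Eagly's point concrete. The model is hypothetical. There are two equal-sized groups, and the population mean difference is held fixed. Within-group variability comes from one extraneous variable plus independent error. The values 5, 8, and 6 are arbitrary, and the calculation uses population values with no sampling. For equal group sizes, eta-squared = SSeffect / SStotal reduces to d² / (d² + 4), so removing the extraneous variance inflates both measures together.

```python
import math


def population_effect_sizes(mean_diff, sd_extraneous, sd_error):
    """Return (d, eta_squared) for two equal-sized groups.

    Hypothetical model: the within-group SD combines variance from an
    extraneous variable and from independent error.
    """
    within_sd = math.sqrt(sd_extraneous ** 2 + sd_error ** 2)
    d = mean_diff / within_sd
    # With equal group sizes the between-groups variance is (mean_diff / 2) ** 2,
    # so SS_effect / SS_total simplifies to d^2 / (d^2 + 4).
    eta_squared = d ** 2 / (d ** 2 + 4)
    return d, eta_squared


uncontrolled = population_effect_sizes(5.0, sd_extraneous=8.0, sd_error=6.0)
controlled = population_effect_sizes(5.0, sd_extraneous=0.0, sd_error=6.0)
print(f"extraneous variable free to vary: d = {uncontrolled[0]:.2f}, eta^2 = {uncontrolled[1]:.3f}")
print(f"extraneous variable held constant: d = {controlled[0]:.2f}, eta^2 = {controlled[1]:.3f}")
```

The mean difference is identical in both runs. Only the denominator changes, which is exactly the inflation described above for both kinds of effect-size estimate.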
