East Carolina University
Department of Psychology
Familywise Alpha = The Conditional Probability that One or More of the Boogey Men Sleeping Under Your Bed Will Get You
Familywise alpha is the conditional probability of rejecting one or more absolutely true null hypotheses in a family of several absolutely true null hypotheses. Rejecting an absolutely true null hypothesis is known as a "Type I error." It is important to keep in mind that one cannot make a Type I error unless one tests an absolutely true null hypothesis. Accordingly, if absolutely true null hypotheses are unlikely to be encountered, then the unconditional probability of making a Type I error will be quite small.
Psychologists and some others act as if they think they will burn in hell for an eternity if they ever make even a single Type I error -- that is, if they ever reject a null hypothesis when, in fact, that hypothesis is absolutely true. I and many others are of the opinion that the unconditional probability of making a Type I error is close to zero, since it is highly unlikely that one will ever test a null hypothesis that is absolutely true. Why worry so much about making an error that is almost impossible to make?
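The argument above can be put into arithmetic. The sketch below multiplies the probability that the tested null hypothesis is absolutely true by alpha, the conditional probability of rejecting it when it is true. The prior probabilities used are hypothetical values chosen only for illustration, and the function name unconditional_type1 is my own.

```python
def unconditional_type1(p_null_true, alpha=0.05):
    """Unconditional probability of a Type I error for a single test.

    p_null_true is the (hypothetical) probability that the null hypothesis
    being tested is absolutely true; alpha is the probability of rejecting
    it given that it is true.
    """
    return p_null_true * alpha


if __name__ == "__main__":
    # Hypothetical priors: the null is certainly true, a coin flip, or rare.
    for p in (1.0, 0.5, 0.01):
        print(f"P(null true) = {p}: P(Type I error) = {unconditional_type1(p):.4f}")
```

If absolutely true null hypotheses are encountered only one time in a hundred, the unconditional probability of a Type I error on a single test at alpha = .05 is .0005.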
There exists a variety of techniques for capping familywise alpha at some value, usually .05. Why .05? Maybe .05 is, sometimes, a reasonable criterion for statistical significance when making a single comparison, but is it really reasonable to cap familywise alpha at .05? Even if it is, what reasonably constitutes the family for which one should cap familywise alpha at .05? Is it the family of hypotheses that
Many times I have asked this question about what reasonably constitutes a family of comparisons for which alpha should be capped at .05. I have never been satisfied with any answer I have received.
Controlling Familywise Alpha When Making Multiple Comparisons Among Means
The context in which the term "familywise alpha" is most likely to arise is when making multiple comparisons among means or groups of means. Suppose one has four means and wishes to compare each mean with each other mean. That is six comparisons. If all four means were absolutely equal in the populations of interest, that would be six absolutely true null hypotheses being tested. Those obsessed with familywise alpha are likely to use a technique like Tukey or Bonferroni or Scheffe to cap the familywise alpha when making those comparisons. Curiously, those same people do not apply any such correction when conducting seven F tests in a three-way factorial ANOVA. Why? Again, I have asked this question many times and never received a decent answer.
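The counts in the paragraph above are easy to verify. The sketch below counts the pairwise comparisons among a set of means, counts the F tests (main effects plus interactions) in a full factorial ANOVA, and shows the per-comparison alpha that a Bonferroni adjustment would impose. The function names are hypothetical, and the Bonferroni rule shown (familywise alpha divided by the number of tests) is the standard textbook form rather than anything specific to this document.

```python
from math import comb


def n_pairwise(n_means):
    """Number of distinct pairs among n_means means."""
    return comb(n_means, 2)


def n_factorial_effects(n_factors):
    """Number of main effects and interactions in a full factorial design.

    Every non-empty subset of the factors defines one effect, so there
    are 2^n - 1 F tests in the omnibus analysis.
    """
    return 2 ** n_factors - 1


def bonferroni_alpha(familywise, k):
    """Per-comparison alpha that caps familywise alpha across k tests."""
    return familywise / k


if __name__ == "__main__":
    pairs = n_pairwise(4)
    print("Pairwise comparisons among four means:", pairs)
    print("F tests in a three-way factorial ANOVA:", n_factorial_effects(3))
    print("Bonferroni per-comparison alpha:", round(bonferroni_alpha(0.05, pairs), 4))
```

Four means yield six comparisons and a Bonferroni per-comparison alpha of about .0083. The three-way factorial yields seven F tests, to which the same people typically apply no adjustment at all.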
Sedlmeier and Gigerenzer (1989) studied the statistical power of the research reported in articles published in the Journal of Abnormal Psychology in 1984. Cohen (1962) did the same for articles published in 1960 in the Journal of Abnormal and Social Psychology. Cohen found that the median power was .17 for small effects, .46 for medium effects, and .89 for large effects. If we can generalize this to psychological research in general, this means that a psychologist looking for an effect that exists and is medium in size is more likely to make a Type II error than a correct decision. Sedlmeier and Gigerenzer found that power twenty-four years later was even lower: .12 for small effects, .37 for medium effects, and .86 for large effects. Why did things actually get worse? Sedlmeier and Gigerenzer showed that the primary reason for the drop in power was the use of alpha-adjustment procedures (such as Tukey, Newman-Keuls, Bonferroni, and so on). They also pointed out that the ratio of beta to alpha indicates that psychological researchers seem to think that making a Type I error is 11 to 14 times more serious than making a Type II error. Of course, that assumes that psychological researchers actually think about the relative seriousness of Type I and Type II errors and choose their alpha and their sample size with that in mind, which is, in my experience, very rarely the case.
Controlling Familywise Alpha When Conducting Factorial ANOVA and/or Tests of Simple Effects
Suppose you are conducting a 2 x 2 factorial ANOVA. There will be three F tests in the omnibus analysis. If the interaction is significant, you are likely to conduct two tests of simple effects (or, if you want to look at the interaction from both possible perspectives, four tests of simple effects). That is, you will be conducting from three to seven tests of null hypotheses. Should you apply an adjustment of alpha to cap familywise error across this family of tests, as many people do when making pairwise comparisons between means? Whether you should or not, the plain truth is that nobody does so and very few people ever even talk about it. If you look at any good stats text that covers factorial ANOVA (might as well look at the best, David Howell's Statistics for Psychology), you will see that no alpha-adjustment is made in this circumstance.
MANOVA and Familywise Error
The term "familywise alpha" sometimes comes up when discussing MANOVA. In MANOVA one has two or more continuous outcome variables and one or more categorical predictor variables. One could just do an ANOVA with each outcome variable, but would not that inflate the familywise alpha across outcome variables? Some believe that it is wise to conduct a MANOVA first and then, if and only if the MANOVA is significant, to conduct the univariate ANOVAs. Some believe that this procedure somehow protects one from making Type I errors. Others think this is idiocy and the only good reason to do a MANOVA is to find the weighted linear combination(s) of the outcome variables that maximize the effects of the categorical predictor variables.
I (tongue-in-cheek) and others have suggested that those obsessive about Type I errors would be better protected (compared to the MANOVA protected test described above) if they were just to dispense with the MANOVA and go right to the univariate ANOVAs after adjusting the criterion of statistical significance downwards with a Bonferroni adjustment. Doing so, however, has a great cost -- reduced power.
All of the devices for reducing familywise alpha (excepting Fisher's procedure) do so by adjusting downward the per comparison alpha -- which, of course, reduces power and makes a Type II error more likely, sometimes much more likely. The usual null hypothesis is that two variables are absolutely unrelated to each other. Usually we are looking for evidence that they are related. Think of the test of the null hypothesis as an attempt to find something (the relationship between two variables) that almost certainly is there, but might be small. The more power you have, the better your chances of finding the thing that is there. A Type I error is finding something that is not there. A Type II error is failing to find something that is there.
Now think of me looking for something on my desk. It is a screw that fell out of my eyeglasses. My vision is 20/400 without my glasses on, so I really need to find that screw -- it is small but very important. My wife comes in to help me look for it. She is wearing her eyeglasses and her vision without them is similar to mine. I ask her to remove her eyeglasses before looking for the screw because I am afraid that with her eyeglasses on she might detect something that is not a screw after all, but rather is just a smudge on a lens -- that is, I am afraid that she will make a Type I error. What, you think I am silly, you say there is almost no chance that she will find the screw without her glasses -- that is, she will have little power and will almost certainly make a Type II error? Well, applying a Bonferroni or similar correction when testing hypotheses is not much different than taking off your eyeglasses when you are looking for something that is almost certainly there but might not be very large.
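The cost of taking off the eyeglasses can be sketched numerically. The example below uses a normal approximation to the power of a two-sided, two-group comparison and contrasts an unadjusted alpha of .05 with a Bonferroni-adjusted alpha of .05/6. The effect size (d = 0.5, a medium effect), the sample size (30 per group), and the family of six comparisons are hypothetical values picked for illustration. The normal approximation is a simplification of the exact t-based calculation, and the function name approx_power is my own.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Approximate power of a two-sided, two-group test (normal approximation).

    d is the standardized difference between the two population means,
    n_per_group is the number of cases in each group, and alpha is the
    per-comparison criterion of statistical significance.
    """
    z = NormalDist()                   # standard normal distribution
    crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    delta = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    return z.cdf(delta - crit) + z.cdf(-delta - crit)


if __name__ == "__main__":
    unadjusted = approx_power(0.5, 30, 0.05)
    adjusted = approx_power(0.5, 30, 0.05 / 6)  # Bonferroni, six comparisons
    print(f"Power at alpha = .05:   {unadjusted:.2f}")
    print(f"Power at alpha = .05/6: {adjusted:.2f}")
```

With these hypothetical values, the adjustment cuts the approximate power roughly in half, from a little under .50 to about .24, so a Type II error goes from about as likely as not to about three times as likely as a correct decision.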
I wish to acknowledge a graduate student at the Ontario Institute for Studies in Education, who pointed out to me that a previous edition of my document on one-way MANOVA gave the impression that I think it good practice to apply a Bonferroni correction when conducting multiple univariate analyses of variance. Norman reminded me that such a correction will greatly reduce power and he also asked the critical question, "exactly what is the family of comparisons for which one should cap familywise alpha at .05 or some other desirable value?"
Norman also listed several articles in which others have argued against the practice of using MANOVA as a device to protect against making Type I errors when conducting multiple univariate ANOVAs (the first of which has been on the reading list for my classes for quite some time now):
Huberty, C. J., & Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105, 302-308.
Huberty, C. J., & Petoskey, M. D. (2000). Multivariate analysis of variance and covariance. In H. E. A. Tinsley & S. D. Brown (Eds.) Handbook of applied multivariate statistics and mathematical modeling. Academic Press, pages 183-208.
Jaccard, J., & Guilamo-Ramos, V. (2002). Analysis of variance frameworks in clinical child and adolescent psychology: Issues and recommendations. Journal of Clinical Child and Adolescent Psychology, 31, 130-146.
Weinfurt, K. P. (1995). Multivariate analysis of variance. In L. G. Grimm and P. R. Yarnold (Eds.) Reading and understanding multivariate statistics. American Psychological Association, Washington, DC, pages 245-276.
References
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
fMRI Gets Slap in the Face with a Dead Fish -- OK, sometimes familywise error may be a serious problem, but the solution is still poor in that it will create many Type II errors.
Contact Information for the Webmaster,
Dr. Karl L. Wuensch
This page most recently revised on the 18th of August, 2014.