East Carolina University
Department of Psychology
Review of Article on Use of Exploratory Factor Analysis
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
I recommend this article to those who are just learning about exploratory factor analysis as well as to those who have used it in their research for many years. The authors discuss several decisions that the factor analyst will need to make when doing a factor analysis and present the results of reanalysis of three data sets from the literature, illustrating the pitfalls associated with making incorrect decisions.
Selecting Variables/Items for the Analysis. Ideally the researcher will select items which are reliable and will have good communalities. Include enough variables so that each common factor will be represented by at least three or four variables. See the work of Velicer and Fava (1998) on this topic, which is summarized near the end of my document Factor Analysis.
Selecting Subjects for the Analysis. See the earlier work of MacCallum, Widaman, Zhang, & Hong (1999) regarding the recommended number of subjects (a summary is available near the end of my document Factor Analysis). Don't make the mistake of sampling from a population of subjects for which there is little variance in the factors you wish to estimate. You might even want to sample in such a way that your subjects vary greatly with respect to the factors you wish to estimate but little on other attributes.
Principal Components Analysis or Factor Analysis? If your purpose is to reduce the information in many variables into a set of weighted linear combinations of those variables, use Principal Components Analysis (PCA), which does not differentiate between common and unique variance. If your purpose is to identify the latent variables which are contributing to the common variance in a set of measured variables, use Factor Analysis (FA), which will attempt to exclude unique variance from the analysis.
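The difference is easy to see by fitting both procedures to the same data. Here is a minimal sketch, assuming scikit-learn is available; the data are hypothetical, simulated with two latent factors and a known unique (noise) variance of 0.25 per variable. PCA summarizes total variance in weighted linear combinations, while FactorAnalysis models the common variance and estimates the unique variances separately.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Hypothetical data: two latent factors, six measured variables,
# unique (noise) variance of 0.25 per variable.
rng = np.random.default_rng(0)
n = 1000
F = rng.normal(size=(n, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
X = F @ loadings.T + 0.5 * rng.normal(size=(n, 6))

# PCA: components are weighted linear combinations reproducing TOTAL variance
pca = PCA(n_components=2).fit(X)

# FA: models only the common variance; unique variance is estimated per variable
fa = FactorAnalysis(n_components=2).fit(X)

print("PCA explained variance:", np.round(pca.explained_variance_, 2))
print("FA estimated unique variances:", np.round(fa.noise_variance_, 2))
```

With well-behaved data like these, the FA uniqueness estimates land near the true value of 0.25, something PCA has no way to express.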
Exploratory or Confirmatory Factor Analysis? If you wish to restrict the number of factors extracted to a particular number and specify particular patterns of relationship between measured variables and common factors, and this is done a priori (before seeing the data), then the confirmatory procedure is for you. If you have no such well specified a priori restrictions, then use the exploratory procedure.
Which Factor Extraction Procedure? Maximum Likelihood (ML) extraction allows computation of assorted indices of goodness-of-fit (of data to the model) and the testing of the significance of loadings and correlations between factors, but requires the assumption of multivariate normality. Principal Factors (PF) methods have no distributional assumptions. The authors favor ML extraction. They suggest that one first examine the distributions of the measured variables for normality. Unless there are severe problems ( |skew| > 2, kurtosis > 7), they say go with ML. If there are severe problems, consider trying to correct the problems (by transforming variables, for example) rather than using PF methods.
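Screening the variables against those cutoffs is straightforward. A minimal sketch in Python (SciPy assumed; the data are hypothetical): SciPy's default kurtosis is excess kurtosis, for which a normal distribution scores 0, which is how the cutoff of 7 is usually interpreted.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical scores: 300 subjects on 6 measured variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))

ml_ok = []
for j in range(X.shape[1]):
    s = skew(X[:, j])
    k = kurtosis(X[:, j])  # excess kurtosis: 0 for a normal distribution
    ok = abs(s) <= 2 and k <= 7
    ml_ok.append(ok)
    print(f"variable {j}: skew={s:+.2f}, excess kurtosis={k:+.2f}, ML ok={ok}")
```

If any variable fails the screen, transform it and re-check before falling back on PF extraction.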
How Many Factors to Extract? Prefer overfactoring (too many factors) to underfactoring (too few factors). Overfactoring is likely to lead to a solution where the major factors are well estimated by the obtained loadings but where there are also additional poorly defined factors (with few, if any, variables loading well on them). Underfactoring is likely to lead to factors that are poorly estimated (poor correspondence between the structure of the true factors and that of the estimated factors), a more serious problem.
The authors are not very fond of the Kaiser "eigenvalue greater than 1" rule or of Cattell's scree test. With respect to the former, they note that it was intended to be applied to the eigenvalues of the full correlation matrix (that with 1's in the main diagonal), not to the eigenvalues of the reduced correlation matrix (that with estimates of communalities in that diagonal).
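The authors' point can be demonstrated numerically. Below is a sketch (NumPy assumed; the data are hypothetical, simulated with two latent factors) that builds the full correlation matrix, forms the reduced matrix by replacing the diagonal 1's with squared multiple correlations as communality estimates, and compares the two sets of eigenvalues. The reduced matrix's eigenvalues are smaller, so applying the greater-than-1 rule to them is not what Kaiser intended.

```python
import numpy as np

# Hypothetical data: two latent factors, six measured variables.
rng = np.random.default_rng(1)
n = 500
F = rng.normal(size=(n, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
X = F @ loadings.T + 0.5 * rng.normal(size=(n, 6))

R = np.corrcoef(X, rowvar=False)           # full correlation matrix (1's on diagonal)
smc = 1 - 1 / np.diag(np.linalg.inv(R))    # squared multiple correlations
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)           # reduced matrix (communality estimates)

eig_full = np.sort(np.linalg.eigvalsh(R))[::-1]
eig_reduced = np.sort(np.linalg.eigvalsh(R_reduced))[::-1]
print("full:   ", np.round(eig_full, 2))
print("reduced:", np.round(eig_reduced, 2))
```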
The authors spoke kindly of "parallel analysis," in which the obtained eigenvalues are compared to those one would expect to obtain from random data. If the first m eigenvalues have values greater than what would be expected from random data, then one adopts a solution with m factors. Regrettably, this method is not available in the major statistical programs.
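The method is, however, easy to implement yourself. A minimal sketch in Python (NumPy assumed; the simulated two-factor data are hypothetical): eigenvalues from the observed correlation matrix are compared against the mean eigenvalues from repeated draws of random normal data of the same shape, and factors are retained until an observed eigenvalue fails to beat its random counterpart.

```python
import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    """Retain factors whose observed correlation-matrix eigenvalues exceed
    the mean eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand_mean = np.zeros(p)
    for _ in range(n_sims):
        Z = rng.normal(size=(n, p))
        rand_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    rand_mean /= n_sims
    m = 0
    while m < p and obs[m] > rand_mean[m]:
        m += 1
    return m

# Hypothetical data with two latent factors and six measured variables
rng = np.random.default_rng(1)
n = 500
F = rng.normal(size=(n, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
X = F @ loadings.T + 0.5 * rng.normal(size=(n, 6))

m = parallel_analysis(X)
print("number of factors suggested:", m)
```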
The goodness-of-fit statistics available from ML factor analysis may be helpful in determining the number of factors to retain. The analyst first decides on the maximum number of factors they would be willing to retain, then fits models with 0, 1, 2, 3, ... up to that number of factors and compares them with respect to goodness-of-fit.
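As a sketch of this model-comparison strategy (assuming scikit-learn, whose FactorAnalysis fits by maximum likelihood via EM, is available): fit models with increasing numbers of factors to hypothetical two-factor data and compare them with BIC, one common likelihood-based fit summary. The parameter count here is a rough illustration, not the exact ML degrees of freedom.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: two latent factors, six measured variables.
rng = np.random.default_rng(2)
n = 500
F = rng.normal(size=(n, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
X = F @ loadings.T + 0.5 * rng.normal(size=(n, 6))

p = X.shape[1]
bics = {}
for m in range(1, 4):
    fa = FactorAnalysis(n_components=m).fit(X)
    ll = fa.score(X) * n          # total log-likelihood of the fitted model
    n_params = p * m + p          # loadings + uniquenesses (rough count)
    bics[m] = -2 * ll + n_params * np.log(n)
    print(f"{m} factors: BIC = {bics[m]:.1f}")

best = min(bics, key=bics.get)
print("retain", best, "factors")
```

With a clear two-factor structure, adding a third factor improves the likelihood only slightly, so the penalty term turns the comparison in favor of the two-factor model.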
The authors also note that "a model that fails to produce a rotated solution that is interpretable and theoretically sensible has little value." This sounds like what I call the "meaningfulness criterion." I typically examine, in addition to the solution with what seems at first to have the correct number of factors, solutions with one or two more or fewer factors. I then adopt the solution which makes the most sense to me.
What Type of Rotation? The authors make a strong argument in favor of oblique rotations rather than orthogonal solutions. They note that dimensions of interest to psychologists are not often dimensions we would expect to be orthogonal. If the latent variables are, in fact, correlated, then an oblique rotation will produce a better estimate of the true factors and a better simple structure than will an orthogonal rotation -- and if the oblique rotation indicates that the factors have correlations close to zero with one another, then the analyst can go ahead and conduct an orthogonal rotation (which should then give about the same solution as the oblique rotation).
What Do Researchers Actually Do? Based on articles published between 1991 and 1995 in the Journal of Personality and Social Psychology and the Journal of Applied Psychology, about half use a PCA, despite the fact that the primary goal was to identify latent variables, in which case FA should have been employed. They do often report the reliabilities of their variables, but not the communalities (which are more informative). Frequently they do not explain the method they used to decide how many factors to retain, and when they do report the method it is most likely to be the eigenvalue-greater-than-one method. They use varimax rotation. When asked to provide a copy of their data so that Fabrigar et al. could determine whether a better solution would be obtained by making decisions other than those made by the researchers, most researchers failed to provide the data. For those who did provide the data, Fabrigar et al. found that an oblique rotation often produced a slightly better simple structure than did a varimax rotation, but the pattern of loadings was almost always the same with varimax as with oblique rotation.
Why do Researchers Make These Decisions? That is, why do they elect to do a PCA, retain as many factors as have eigenvalues greater than 1, and use varimax rotation? Well, maybe it is just because these are the defaults for factor analysis in SPSS. You know, one does not have to understand anything about factor analysis to be able to point and click.
How Can We Prevent Researchers From Making These Bad Decisions? The authors suggest that methodologists, in addition to publishing highly technical papers in journals seen only by other methodologists, need to publish less technical papers in the journals that researchers read, and editors must be willing to publish those articles. Regrettably, the editors of the journals that nonmethodologists read have not, in my experience, been very receptive to publishing such articles -- see Frequency of Type I Errors in Professional Journals for one example.
Contact Information for the Webmaster,
Dr. Karl L. Wuensch
This page most recently revised on 21 June 2006