Correlation and Causation

East Carolina University
Department of Psychology

When Does Correlation Imply Causation?

Here are selected contributions from a discussion, on the EDSTAT list, about the phrase "correlation does not imply causation":

From: Wuensch, Karl L
Sent: Wednesday, December 05, 2001 7:36 AM
To: edstat-l@jse.stat.ncsu.edu
Subject: When does correlation imply causation?

I opined "correlation is necessary but not sufficient for establishing a causal relationship." Jim opined "depending on precisely what Karl means by 'correlation is necessary,' I'd have to disagree strongly."

More nearly precisely what I mean follows, but is long.

First, let me give a short answer to the question "When does correlation imply causation?" The short answer is: When the data from which the correlation was computed were obtained by experimental means with appropriate care to avoid confounding and other threats to the internal validity of the experiment.

My long answer will start with a distinction between correlation as a statistical technique and "correlational" (nonexperimental) designs as a way to gather data.

It is not rare for researchers and students to confuse (1) correlation as a statistical technique with (2) nonexperimental data collection methods, which are also often described as "correlational." For example, a doctoral candidate at Florida State University hired me to assist him with the statistical analysis of data collected for his dissertation. No variables were manipulated in his research. I used multiple regression (a path analysis) to test his causal model. When he presented this analysis to his dissertation committee the chair asked him to reanalyze the data with an ANOVA, explaining that results obtained with ANOVA would allow them to infer causality, but results obtained with multiple regression would not because "correlation does not imply causation." I cannot politely tell you what my initial response to this was. After I cooled down, realizing that it would be fruitless to try to explain to this chair that ANOVA is simply a multiple regression with dummy coded predictors, I suggested that the student present the same analysis but describe it as a "hierarchical least squares ANOVA." The analysis was accepted under this name and the chair felt that she then had the appropriate tool with which to make causal inferences from the data.

I have frequently encountered this delusion, the belief that it is the type of statistical analysis done, not the method by which the data are collected, which determines whether or not one can make causal inferences with confidence. Several times I have had to explain to my colleagues that two-group t tests and ANOVA are just special cases of correlation/regression analysis. One was a senior colleague who taught statistics, research methods, and experimental psychology in a graduate program. When I demonstrated to him that a test of the null hypothesis that a point biserial correlation coefficient is zero is absolutely equivalent to an independent samples (pooled variances) two-groups t test, he was amazed.
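The equivalence described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the group sizes, means, and random seed are made-up illustration values, not data from the discussion. It computes the point biserial r (a Pearson r with a 0/1 group code), converts it to t with t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares that with the pooled-variances t computed directly from the two groups.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing that a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def pooled_t(g1, g0):
    """Independent-samples t (pooled variances) for mean(g1) - mean(g0)."""
    n1, n0 = len(g1), len(g0)
    m1, m0 = mean(g1), mean(g0)
    ss1 = sum((v - m1) ** 2 for v in g1)
    ss0 = sum((v - m0) ** 2 for v in g0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


# Hypothetical data: 12 cases coded 0 and 15 cases coded 1.
rng = random.Random(1)
group = [0] * 12 + [1] * 15
score = [rng.gauss(50 + 5 * g, 10) for g in group]

r_pb = pearson_r(group, score)
t_r = t_from_r(r_pb, len(score))
t_groups = pooled_t(
    [s for g, s in zip(group, score) if g == 1],
    [s for g, s in zip(group, score) if g == 0],
)
print(f"point biserial r = {r_pb:.4f}")
print(f"t from r = {t_r:.6f}, pooled t = {t_groups:.6f}")
```

The two t values agree to floating-point precision, and both are referred to the same t distribution on n - 2 degrees of freedom, so the p values are identical as well.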

The hypothetical example that I give my students is this: Imagine that we go downtown and ask people to take a reaction time test and to blow into a device that measures alcohol in the breath. We correlate these two measures and reject the hypothesis of independence. Can we conclude, from this evidence, that drinking alcohol causes increased reaction time? Of course not. There are all sorts of potential noncausal explanations of the observed correlation. Perhaps some constellation of "third variables" is causing variance in both reaction time and alcohol consumption -- for example, perhaps certain brain defects both (1) slow reaction time and (2) dispose people to consume alcohol. Suppose we take these same data and dichotomize the alcohol measure. We employ an independent samples t test to compare the mean reaction time of those who have consumed alcohol with that of those who have not. We find the mean of those who have consumed alcohol to be significantly higher. Does that now allow us to infer that drinking alcohol causes increases in reaction time? Of course not -- the same potential noncausal explanations that prevented such inference with the correlational analysis also prevent such inference with the two-group t test conducted on data collected by nonexperimental means.
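A small simulation may make the third-variable problem concrete. This is only a sketch under invented assumptions: a single unmeasured variable z (standing in for the "brain defects" above) raises both the alcohol reading and reaction time, while alcohol itself is given no effect at all on reaction time. All coefficients, the sample size, and the seed are hypothetical.

```python
import math
import random
from statistics import mean, median


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2001)
n = 2000

# z is the unmeasured third variable; it drives BOTH observed measures.
z = [rng.gauss(0, 1) for _ in range(n)]
alcohol = [zi + rng.gauss(0, 1) for zi in z]
# Note: alcohol does not appear in this formula, so its causal effect is zero.
reaction_ms = [300 + 20 * zi + rng.gauss(0, 20) for zi in z]

# Analysis 1: correlate the two continuous measures.
r_obs = pearson_r(alcohol, reaction_ms)

# Analysis 2: dichotomize alcohol at its median and compare group means.
cut = median(alcohol)
high = [rt for a, rt in zip(alcohol, reaction_ms) if a > cut]
low = [rt for a, rt in zip(alcohol, reaction_ms) if a <= cut]
mean_diff = mean(high) - mean(low)

print(f"r(alcohol, reaction time) = {r_obs:.3f}")
print(f"mean difference, high vs low alcohol = {mean_diff:.1f} ms")
```

Both analyses show a clear association even though, by construction, alcohol does nothing to reaction time here. Switching from a correlation to a comparison of group means changes nothing about what the data can support.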

Now consider that we bring our research into the lab. We employ experimental means -- we randomly assign some folks to an alcohol consumption group, others to a placebo group, taking care to avoid any procedural or other confounds. When a two-groups t test shows that those in the alcohol group have significantly higher reaction time than those in the placebo group, we are confident that we have results that allow us to infer that drinking alcohol causes slowed reaction time. If we had conducted the analysis by computing the point biserial correlation coefficient and testing its deviation from zero, we should be no less confident of our causal inference, and, of course, the value of t (or F) and p obtained by these two seemingly different analyses would be identical.
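The effect of random assignment can be sketched the same way. In this hypothetical simulation an unmeasured variable z still influences reaction time, but who gets alcohol is decided by a coin flip, and alcohol is given an assumed true effect of 30 ms. The effect size, sample size, and seed are illustration values only.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(29)
n = 2000
TRUE_EFFECT_MS = 30.0  # assumed causal effect of alcohol, for illustration

# z still affects reaction time, but it cannot influence the coin flip.
z = [rng.gauss(0, 1) for _ in range(n)]
alcohol = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
reaction_ms = [
    300 + 20 * zi + TRUE_EFFECT_MS * a + rng.gauss(0, 20)
    for zi, a in zip(z, alcohol)
]

# Random assignment leaves treatment (nearly) uncorrelated with z ...
r_assign_z = pearson_r(alcohol, z)

# ... so the group difference estimates the causal effect.
treated = [rt for a, rt in zip(alcohol, reaction_ms) if a == 1]
placebo = [rt for a, rt in zip(alcohol, reaction_ms) if a == 0]
mean_diff = mean(treated) - mean(placebo)
r_pb = pearson_r(alcohol, reaction_ms)

print(f"r(assignment, z) = {r_assign_z:.3f}")
print(f"mean difference = {mean_diff:.1f} ms, point biserial r = {r_pb:.3f}")
```

Whether the result is summarized as a mean difference or as a point biserial r, the causal reading rests on the coin flip, not on the statistic chosen.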

Accordingly, I argue that correlation is a necessary but not a sufficient condition to make causal inferences with reasonable confidence. Also necessary is an appropriate method of data collection. To make such causal inferences one must gather the data by experimental means, controlling extraneous variables which might confound the results. Having gathered the data in this fashion, if one can establish that the experimentally manipulated variable is correlated with the dependent variable (and that correlation does not need to be linear), then one should be (somewhat) comfortable in making a causal inference. That is, when the data have been gathered by experimental means and confounds have been eliminated, correlation does imply causation.

So why is it that many persons believe that one can make causal inferences with confidence from the results of two-group t tests and ANOVA but not with the results of correlation/regression techniques? I believe that this delusion stems from the fact that experimental research typically involves a small number of experimental treatments and that data from such research are conveniently evaluated with two-group t tests and ANOVA. Accordingly, t tests and ANOVA are covered when students are learning about experimental research. Students then confuse the statistical technique with the experimental method. I also feel that the use of the term "correlational design" contributes to the problem. When students are taught to use the term "correlational design" to describe nonexperimental methods of collecting data, and cautioned regarding the problems associated with inferring causality from such data, the students confuse correlational statistical techniques with "correlational" data collection methods. I refuse to use the word "correlational" when describing a design. I much prefer "nonexperimental" or "observational."

In closing, let me be a bit picky about the meaning of the word "imply." Today this word is used most often to mean "to hint" or "to suggest" rather than "to have as a necessary part." Accordingly, I argue that correlation does imply (hint at) causation, even when the correlation is observed in data not collected by experimental means. Of course, with nonexperimental research, the potential causal explanations of the observed correlation between X and Y must include models that involve additional variables and which differ with respect to which events are causes and which effects.

Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353

From: Art Kendall [Arthur.Kendall@verizon.net]

I concur. Another way to put it is:

The results of statistical analyses are parts of principled arguments about causality.

If correlation (in the broad sense) remains after taking into account (controlling, rendering unlikely) plausible rival hypotheses, it does imply (support, suggest, indicate, make plausible) causation.

In experimental studies, active manipulation of independent variables, and random assignment to conditions, go a long way toward minimizing the plausibility of rival hypotheses. If there is a correlation between treatment and score on a dependent variable (i.e., if there is a difference among treatment groups) after rejecting consistency with merely random process, a causal relation hypothesis is supported.

In quasi-experimental designs, case selection, partialling, controlling, etc., are needed to support causal argumentation.

From: Michael M. Granaas

I have noticed the same tendency to confuse the correlation coefficient with observational data collection methods. I explain to my students that the confusion comes from exactly the types of things that Karl describes. Historically regression has been taught in the context of observational research and ANOVA in the context of experimental research. (Kirk's book about ANOVA designs is even entitled "Experimental Design".) This simplifies things for the learner, but can lead them down exactly the wrong path when it comes to analyzing/interpreting their data later in life.

We really need to emphasize over and over that it is the manner in which you collect the data and not the statistical technique that allows one to make causal inferences.

Michael M. Granaas
Associate Professor mgranaas@usd.edu
Department of Psychology
University of South Dakota
Vermillion, SD 57069

From: Dennis Roberts [dmr@psu.edu]

correlation NEVER implies causation ...

the problem with this is ... does higher correlation mean MORE cause? lower r mean LESS cause?

From: Karl W.

Dennis is not going to like this, since he has already expressed a disdain of r², omega-square, and eta-square like measures of the strength of effect of one variable on another, but here is my brief reply:

R² tells us to what extent we have been able to eliminate, in our data collection procedures, the contribution of other factors which influence the dependent variable.

Excellent essay. I agree completely about the confusion of method (anova vs correlation) with the nature of the data gathering process.

Neil W. Henry, Department of Sociology and Anthropology
Department of Statistical Sciences and Operations Research, Box 843083
Virginia Commonwealth University, Richmond VA 23284-3083

I appreciated your comments on correlation/causation. I teach stats & research design in the College of Ed at the U of Ky, and I hammer my students all the time with research scenarios, asking, "What kind of research is this? What kinds of conclusions can you draw about this kind of research?"

Last summer I created a webpage for my students, and I continue to add to it. May I post your comments on my page and put a link to your website there? Here's a link to the page: http://www.uky.edu/~ldesh2/stats.htm

Cheers, Lise Deshea

From: Karl W.

My experimental units are 100 classrooms on campus. As I walk into each room I flip a perfectly fair coin in a perfectly fair way to determine whether I turn the room lights on (X = 1) or off (X = 0). I then determine whether or not I can read the fine print on my bottle of smart pills (Y = 0 for no, Y = 1 for yes). From the resulting pairs of scores (one for each classroom), I compute the phi coefficient (which is a Pearson r computed with dichotomous data). Phi = .5. I test and reject the null hypothesis that phi is zero in the population (using chi-square as the test statistic). Does correlation (phi is not equal to zero) imply causation in this case? That is, can I conclude that turning the lights on affects my ability to read fine print?
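The relation between phi and chi-square in this example can be sketched as follows. The cell counts are hypothetical: they assume the coin happened to give 50 rooms with lights on and 50 with lights off, and they were chosen so that phi comes out at .5. The code computes phi as an ordinary Pearson r on the 0/1 scores and computes the Pearson chi-square from observed and expected counts.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts keyed by (X, Y): X = lights on, Y = could read print.
table = {(1, 1): 30, (1, 0): 20, (0, 1): 6, (0, 0): 44}

# Expand the table into one (X, Y) pair per classroom.
x, y = [], []
for (xi, yi), count in table.items():
    x += [xi] * count
    y += [yi] * count
n = len(x)

# Phi is simply Pearson r computed on the dichotomous scores.
phi = pearson_r(x, y)

# Pearson chi-square from observed and expected cell counts.
row = {i: sum(c for (xi, _), c in table.items() if xi == i) for i in (0, 1)}
col = {j: sum(c for (_, yj), c in table.items() if yj == j) for j in (0, 1)}
chi_square = sum(
    (c - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for (i, j), c in table.items()
)

print(f"phi = {phi:.3f}")
print(f"chi-square = {chi_square:.3f}, n * phi^2 = {n * phi ** 2:.3f}")
```

For a 2 x 2 table, chi-square equals n times phi squared, so with 100 classrooms and phi = .5 the test statistic is about 25 on 1 degree of freedom. Testing phi and testing the table are the same test.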

I modify my experiment such that Y is now the reading on an instrument that measures the intensity of light in the classroom. I correlate X with Y (point biserial r, a Pearson r between a dichotomous and a continuous variable) and obtain r = .5. I test and reject the null that this r is zero in the population (using t or F as the test statistic). Does correlation (point biserial r is not zero) imply causation in this case? That is, can I conclude that one of the things I can do to increase the intensity of light in the room is to turn on the lights?

I modify this second experiment by creating three experimental groups, with classrooms randomly assigned to groups. In one group I turn off the lights and close the blinds. In a second group I raise the blinds but turn off the lights. In a third group I raise the blinds and turn on the lights. I compute eta, the nonlinear correlation coefficient relating group membership to brightness of light in the room. Alternatively I dummy code group membership and conduct a multiple regression predicting brightness from my dummy variables. R = eta = .5. I test and reject the null hypothesis that R and eta are zero in the population (using F as my test statistic). Does correlation (R or eta are not equal to zero) imply causation in this case?
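The claim that R from a dummy-coded regression equals eta from the ANOVA can be checked with a short sketch. The brightness values below are simulated under made-up group means, and the small `solve` helper is a hypothetical stand-in for whatever least-squares routine one would normally use. The code computes eta and F from the usual ANOVA sums of squares, then fits a regression with an intercept and two dummy variables via the normal equations and computes R and F from that fit.

```python
import math
import random
from statistics import mean


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(size):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [v - factor * p for v, p in zip(m[r], m[c])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Hypothetical data: 0 = blinds closed, lights off; 1 = blinds up, lights
# off; 2 = blinds up, lights on. Means and spread are invented.
rng = random.Random(94)
groups = [0] * 30 + [1] * 30 + [2] * 30
assumed_mean = {0: 5.0, 1: 40.0, 2: 60.0}
y = [rng.gauss(assumed_mean[g], 15.0) for g in groups]
n, k = len(y), 3

# One-way ANOVA: eta and F from sums of squares.
grand = mean(y)
group_mean = {
    g: mean([v for gg, v in zip(groups, y) if gg == g]) for g in assumed_mean
}
ss_total = sum((v - grand) ** 2 for v in y)
ss_between = sum((group_mean[g] - grand) ** 2 for g in groups)
ss_within = ss_total - ss_between
eta = math.sqrt(ss_between / ss_total)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Multiple regression: intercept plus two dummy variables (group 0 is the
# reference group), fitted by solving the normal equations X'X b = X'y.
X = [[1.0, float(g == 1), float(g == 2)] for g in groups]
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(3)]
beta = solve(xtx, xty)
fitted = [sum(b * c for b, c in zip(beta, row)) for row in X]
ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
r_squared = 1 - ss_resid / ss_total
multiple_r = math.sqrt(r_squared)
f_reg = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

print(f"eta = {eta:.6f}, multiple R = {multiple_r:.6f}")
print(f"ANOVA F = {f_anova:.4f}, regression F = {f_reg:.4f}")
```

The regression's fitted values are just the group means, which is why R and eta, and the two F statistics, coincide up to rounding error.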

I could continue on with other correlations appropriate for various experimental designs, but I would hope that you have gotten the point by now.

From: Stephen Levine [mailto:szlevine@netvision.net.il]
Sent: Tuesday, December 11, 2001 3:47 AM
To: Karl L. Wuensch
Subject: Re: When does correlation imply causation?

Hi
You wrote
"Several times I have had to explain to my colleagues that two-group t tests and ANOVA are just special cases of correlation/regression analysis."
I can see what you mean - could you please prove it - I read, in a pretty good text, that the results are not necessarily the same!

Reply from: Wuensch, Karl L
Subject: ANOVA = Regression

For a demonstration of the equivalence of regression and traditional ANOVA, just point your browser to "T Tests, ANOVA, and Regression Analysis."

I found a nicely written article on this topic in the Teaching of Psychology. Here is the reference:

Hatfield, J., Faunce, G. J., & Soames Job, R. F. (2006). Avoiding confusion surrounding the phrase "correlation does not imply causation." Teaching of Psychology, 33, 49-51.

Does Lack of Correlation Imply Lack of Causation?

If we can convincingly show that X and Y have zero correlation, does that rule out the possibility that X has a causal effect on Y? Well, perhaps for practical purposes, but it might be that the effect of X on Y is moderated by Z such that the effect is positive in some cases, negative in others, balancing out to zero in the aggregated data. Also, it is possible that the effect of X on Y is mediated by variables M1 and M2, with the effect of M1 being positive and M2 being negative, balancing out to zero for the zero-order correlation.
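The moderation case can be illustrated with a deliberately artificial sketch. In the made-up data below, each unit increase in X raises Y by 2 when Z = 1 and lowers Y by 2 when Z = 0, with equal numbers of cases at each level of Z and no noise. The slopes and values are assumptions chosen so that the cancellation is exact.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same ten X values occur at each level of Z.
x = list(range(1, 11)) * 2
z = [1] * 10 + [0] * 10
# The effect of X on Y is +2 per unit when Z = 1 and -2 per unit when Z = 0.
y = [2 * xi if zi == 1 else -2 * xi for xi, zi in zip(x, z)]

r_all = pearson_r(x, y)          # aggregated over both levels of Z
r_z1 = pearson_r(x[:10], y[:10])  # cases with Z = 1 only
r_z0 = pearson_r(x[10:], y[10:])  # cases with Z = 0 only

print(f"overall r = {r_all:.3f}")
print(f"r within Z = 1: {r_z1:.3f}, r within Z = 0: {r_z0:.3f}")
```

The zero-order correlation is zero even though X completely determines Y within each level of Z. Real data would not cancel so neatly, but the same pattern can push an aggregated correlation toward zero.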

Contact Information for the Webmaster,
Dr. Karl L. Wuensch

This page most recently revised on the 5th of January, 2015.