Checking Data for Compliance with a Normality Assumption

Some researchers use statistical tests of normality (such as the Kolmogorov-Smirnov test). The null hypothesis is that the population is normally distributed. If that hypothesis is not rejected, the researcher concludes that it is OK to use the sample data with procedures that assume normality, most often tests of means where the test statistic is Student's t or F. This is very poor practice, IMHO. When sample sizes are small, the t or F statistics will not be very robust to violation of the normality assumption, but at the same time the small sample sizes will leave the test of normality with so little power that it is likely to miss serious deviations from normality. Likewise, when the sample size is large, the t or F statistics will be more robust to violation of the normality assumption, but at the same time the test of normality will have so much power that it will flag as "significant" deviations from normality that are too small to be of any concern. The same criticisms apply to using tests of homogeneity of variance, such as Levene's test, to decide whether to use a test of means that assumes homogeneity of variance or one that does not.
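The argument above lends itself to a quick simulation. The sketch below is illustrative, not a method from this page: it swaps in the Jarque-Bera test (which combines g1 and g2, and whose p value needs only the standard library) for the Kolmogorov-Smirnov test, and it samples from a gamma population whose skewness of about 0.28 is well below the levels discussed later as worth worrying about. The shape parameter, sample sizes, and replication count are arbitrary assumptions.

```python
import math
import random

def g1_g2(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def jarque_bera_p(xs):
    """Jarque-Bera normality test. The statistic is referred to chi-square
    with 2 df, whose upper-tail probability is exactly exp(-JB / 2)."""
    n = len(xs)
    g1, g2 = g1_g2(xs)
    jb = n / 6.0 * (g1 ** 2 + g2 ** 2 / 4.0)
    return math.exp(-jb / 2.0)

def rejection_rate(n, reps=300, shape=50.0, alpha=0.05, seed=1):
    """Share of samples of size n, drawn from a gamma population with
    skewness 2 / sqrt(shape) (about 0.28 here), flagged as non-normal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        if jarque_bera_p(xs) < alpha:
            hits += 1
    return hits / reps

small = rejection_rate(20)
large = rejection_rate(2000)
print(f"n = 20:   flagged {small:.0%} of samples")
print(f"n = 2000: flagged {large:.0%} of samples")
```

With these settings the test should rarely flag the small samples, where an undetected problem would matter more, and nearly always flag the large ones, where the same mild skew is harmless. The Jarque-Bera chi-square approximation is itself poor at n = 20, which only reinforces the point.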

In April of 2007 I posed the following question of members of the EDSTAT-L community:

For those of you who use procedures which assume that a variable is normally distributed, at what point do you decide that the shape of the data in your sample is so skewed that you consider transforming the data or using an alternative procedure which has no assumption of normality? Is the g1 statistic of much use in making such judgments? If so, how far does g1 need be from zero before you become uncomfortable with the normality assumption? Are there any "rules of thumb" here that can be well defended?

Dale Berger responded:

One can use measures of skew and kurtosis as 'red flags' that invite a closer look at the distributions. A rule of thumb that I've seen is to be concerned if skew is farther from zero than 1 in either direction or kurtosis greater than +1. However, there is no substitute for taking a close look at the distribution in any case and trying to understand what skew and kurtosis are telling you. It may be that there is simply an error such as a miscoded number. Also, one can have skew of zero for pretty strange distributions where trimming or some other transformation may be useful, depending on the analysis.

A serious problem with using the tests of statistical significance for skew and kurtosis is that these tests are more sensitive with larger samples where modest departures from normality are less influential. With a small sample an outlier can be very influential even though the departure of skew and kurtosis from zero may not attain statistical significance.

Best advice: Look at a plot of your data. Don't rely on summary statistics alone.
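Dale's red-flag rule, and the trap he warns about, can be shown in a few lines. This is a hypothetical sketch: the function names, the default cut-offs, and the toy data are illustrative; "kurtosis" is read as excess kurtosis g2 (zero for a normal distribution); and g1 and g2 are the simple moment-based versions. Statistical packages often report small-sample-adjusted versions whose values differ somewhat.

```python
def g1_g2(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def red_flags(xs, skew_cut=1.0, kurt_cut=1.0):
    """Rule of thumb quoted above: look closer if |g1| > 1 or g2 > +1.
    A flag only invites a look at the plot; it is not a verdict."""
    g1, g2 = g1_g2(xs)
    flags = []
    if abs(g1) > skew_cut:
        flags.append(f"skew g1 = {g1:.2f}")
    if g2 > kurt_cut:
        flags.append(f"kurtosis g2 = {g2:.2f}")
    return flags

# Hypothetical 1-7 ratings with one miscoded entry (44 keyed instead of 4).
ratings = [3, 4, 5, 4, 3, 5, 4, 44]
print(red_flags(ratings))

# Hypothetical symmetric, two-humped data: g1 is 0 and g2 is -2,
# so neither flag fires even though the shape is far from normal.
bimodal = [1, 1, 1, 1, 9, 9, 9, 9]
print(red_flags(bimodal), g1_g2(bimodal))
```

The miscoded rating trips both flags, while the two-humped distribution trips neither, which is why a plot is still needed.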

What I have advised my students, in a nutshell, is:

1. For each variable ask your stat software to show you the minimum score, the maximum score, g1 (skewness), g2 (kurtosis), a stem and leaf plot, and a box and whiskers plot. High values of g2 and high absolute values of g1 may signal outliers that deserve your attention.

2. Check for any out-of-range values. If there are any, investigate them. Of course, you will need to have had the foresight to tag data records with identification numbers. Correct errors or set out-of-range values to missing.

3. Also investigate outliers and, if justified, recode them or discard them.

4. Once you feel you have corrected any data errors, recompute g1 and g2 if you expect to be including the variable in an analysis that assumes the sample came from a normally distributed population. As Dale noted, do pay special attention to plots too -- g1 and g2 do not tell the whole story.

5. If the absolute value of g1 is close to or greater than 1, consider transforming the variable to reduce skewness. Also consider analyzing the skewed variable both with and without the transformation, and be uneasy if the transformation distinctly alters the results of that analysis. No, changing the p value from .049 to .051 or vice versa is not a big change, unless you or your audience cannot detect shades of gray.

6. Also consider using a procedure which does not assume that population distributions have any particular shape.

7. Do not use tests of significance to determine whether or not you need to do something about the skewed variable, for the reasons Dale pointed out.  In a similar fashion, do not use tests of significance to determine whether or not you need to do something about that heterogeneity of variance you just discovered.
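Steps 1, 2, and 5 can be partly automated. The sketch below covers only the numeric part (the stem-and-leaf and box plots still have to be looked at), and everything in it is illustrative: the (ID, value) record layout, the function names, the valid range, the planted keying error, and the simulated scores are hypothetical, and the log transformation is just one common choice for reducing positive skew.

```python
import math
import random

def g1_g2(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def screen(records, lo, hi):
    """Step 1 (numeric part) and step 2: min, max, g1, g2, plus the IDs of
    records whose value lies outside the valid range [lo, hi]."""
    values = [v for _, v in records]
    g1, g2 = g1_g2(values)
    bad_ids = [rid for rid, v in records if not lo <= v <= hi]
    return {"min": min(values), "max": max(values),
            "g1": g1, "g2": g2, "out_of_range": bad_ids}

def set_missing(records, lo, hi):
    """Step 2 fallback: treat out-of-range values as missing. In this
    one-variable sketch, 'missing' simply means the record is dropped."""
    return [(rid, v) for rid, v in records if lo <= v <= hi]

def g1_before_after_log(values):
    """Step 5: skewness of the raw scores and of their logs (all values > 0)."""
    return g1_g2(values)[0], g1_g2([math.log(v) for v in values])[0]

# Hypothetical positively skewed scores (e.g. response times in ms),
# each tagged with an ID, plus one keying error planted for illustration.
rng = random.Random(42)
data = [(i, round(rng.lognormvariate(6.0, 0.6))) for i in range(1, 101)]
data[17] = (18, -400)

report = screen(data, lo=0, hi=60_000)
print(report)
cleaned = set_missing(data, lo=0, hi=60_000)
raw_g1, log_g1 = g1_before_after_log([v for _, v in cleaned])
print(f"g1 raw = {raw_g1:.2f}, g1 after log = {log_g1:.2f}")
```

Comparing an analysis on the raw and logged scores, as step 5 suggests, would follow the same pattern: run it both ways and see whether the conclusion changes in any way that matters.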

Dennis Roberts responded:

One problem here is n ... if a sample is rather small ... then rather wild aberrations in the sample data could still have come from a normal distribution in the population ... that is much less likely to be the case if n is large.
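Dennis's point can be illustrated by drawing samples from a population that is exactly normal and counting how often the sample skewness alone would look alarming. The sample sizes, the |g1| > 1 cut-off, the seed, and the replication count are illustrative assumptions.

```python
import random

def g1(xs):
    """Moment-based sample skewness."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def share_looking_skewed(n, reps=2000, cut=1.0, seed=7):
    """Share of samples of size n, drawn from an exactly normal population,
    whose |g1| nevertheless exceeds `cut`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(g1(xs)) > cut:
            hits += 1
    return hits / reps

small_n = share_looking_skewed(10)
large_n = share_looking_skewed(200)
print(f"n = 10:  {small_n:.1%} of normal samples have |g1| > 1")
print(f"n = 200: {large_n:.1%} of normal samples have |g1| > 1")
```

Small normal samples should cross the cut-off noticeably often; samples of 200 essentially never should.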

Michael Granaas responded:

Here's my rule of thumb on normality:

1. Plot a histogram of the residuals with or without a normal distribution overlay.

2. Hold the plot at arm's length and blur your eyes just a bit (if nearsighted, you can take off your glasses).

3. If the histogram is obviously non-normal under these circumstances then you should be cautious.

This is consistent with most texts, which report that parametric statistics are fairly robust if the distribution of residuals is unimodal and symmetric, with moderate to small variance. (I confess that I have never checked the primary sources myself, but the claim seems sensible to me so I've never felt moved to challenge it.)

I hesitate to use statistical measures of non-normality because the few that I have tried (skewness and kurtosis) seem to be overly sensitive. However, I would be delighted if there were a viable statistic with a defensible cut-off score...it would simplify my teaching considerably.
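Michael's first step needs only residuals and a histogram. A plotting package would normally draw it; the sketch below assumes a one-way design and prints a rough text version instead, showing next to each bar the count that a normal curve with the residuals' own mean and standard deviation would predict. The group names, bin count, and simulated scores are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def residuals(groups):
    """One-way design: each score minus the mean of its own group."""
    res = []
    for scores in groups.values():
        m = statistics.fmean(scores)
        res.extend(x - m for x in scores)
    return res

def histogram_rows(res, n_bins=10):
    """Observed count per bin, paired with the count expected under a normal
    curve having the residuals' own mean and standard deviation."""
    lo, hi = min(res), max(res)
    width = (hi - lo) / n_bins  # assumes the residuals are not all equal
    counts = [0] * n_bins
    for r in res:
        counts[min(int((r - lo) / width), n_bins - 1)] += 1
    curve = NormalDist(statistics.fmean(res), statistics.stdev(res))
    rows = []
    for i, obs in enumerate(counts):
        a, b = lo + i * width, lo + (i + 1) * width
        rows.append((a, b, obs, len(res) * (curve.cdf(b) - curve.cdf(a))))
    return rows

# Hypothetical two-group data, simulated for illustration only.
rng = random.Random(3)
groups = {"control": [rng.gauss(50, 10) for _ in range(40)],
          "treated": [rng.gauss(55, 10) for _ in range(40)]}
for a, b, obs, exp in histogram_rows(residuals(groups)):
    print(f"{a:7.1f} to {b:7.1f} | {'#' * obs:<25} normal curve: {exp:4.1f}")
```

The squinting in step 2 is still up to the reader; the expected counts just make the comparison in step 3 a little easier.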

Dan Schafer responded:

A test or other statistical procedure is robust against a particular assumption if the conclusion from it is valid even if that assumption isn't met. Most of what we know about robustness comes from higher order asymptotics and simulation. My favorite reference on this is the book "Beyond ANOVA" by Rupert Miller. (It's a classic.) Fred Ramsey and I have tried to supply practically useful advice about robustness and dealing with assumptions in our book "The Statistical Sleuth--a Second Course in Data Analysis" (in Ch. 3 and Ch. 5).

Not all tests based on normality are equally robust, so one can't make a single statement to cover all situations. As others have mentioned, the two-sample t-test *is* generally quite robust against departures from normality, even for fairly small sample sizes. The F-test for equality of two variances is notoriously non-robust.

In answering Karl's question--about the point at which the normality-based test should be abandoned--it's important to also be aware of "robustness of efficiency." If, for example, we wish to compare two skewed distributions and we have determined that the two-sample t-test is robust (in validity) enough for this comparison, there may still be another procedure that makes more efficient use of the (non-normal) data, such as the rank-sum test or the two-sample t-test after log transformation. So we might abandon the normality-based test in this situation, even though it would provide a valid conclusion.

[What Dan is saying in this last paragraph is that even if a normality-assuming procedure is robust to nonnormality (in terms of keeping alpha near its nominal level), an alternative procedure may well have more power.]
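Dan's two kinds of robustness, validity and efficiency, can both be seen in one small simulation. It is a sketch under stated assumptions: two groups of 10; exponential populations with equal means for the Type I error check; lognormal populations differing by a constant multiple for the power check; critical values hard-coded from standard t, F, and normal tables; and the rank-sum test using its large-sample normal approximation rather than exact tables.

```python
import math
import random
import statistics

# Critical values hard-coded from standard tables; they assume n1 = n2 = 10.
T_CRIT = 2.101  # two-sided .05, t with 18 df
F_CRIT = 4.026  # upper .025, F with 9 and 9 df (two-sided .05 variance test)
Z_CRIT = 1.960  # two-sided .05, standard normal

def pooled_t(x, y):
    """Student's two-sample t with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def variance_ratio(x, y):
    """F for equality of two variances: larger sample variance over smaller."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)

def rank_sum_z(x, y):
    """Wilcoxon rank-sum test, normal approximation (continuous data, no ties)."""
    n1, n2 = len(x), len(y)
    combined = x + y
    order = sorted(range(n1 + n2), key=combined.__getitem__)
    w = sum(rank for rank, idx in enumerate(order, start=1) if idx < n1)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

def simulate(reps=3000, n=10, seed=11):
    """Rejection rates; n must stay 10 to match the hard-coded critical values."""
    rng = random.Random(seed)
    hits = dict.fromkeys(
        ["t_null", "F_null", "t_raw_power", "t_log_power", "ranksum_power"], 0)
    for _ in range(reps):
        # Validity: two exponential populations with identical means and variances.
        x = [rng.expovariate(1.0) for _ in range(n)]
        y = [rng.expovariate(1.0) for _ in range(n)]
        hits["t_null"] += abs(pooled_t(x, y)) > T_CRIT
        hits["F_null"] += variance_ratio(x, y) > F_CRIT
        # Efficiency: lognormal populations; the second is the first times e**0.8.
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        y = [rng.lognormvariate(0.8, 1.0) for _ in range(n)]
        hits["t_raw_power"] += abs(pooled_t(x, y)) > T_CRIT
        logs_x, logs_y = [math.log(v) for v in x], [math.log(v) for v in y]
        hits["t_log_power"] += abs(pooled_t(logs_x, logs_y)) > T_CRIT
        hits["ranksum_power"] += abs(rank_sum_z(x, y)) > Z_CRIT
    return {k: v / reps for k, v in hits.items()}

results = simulate()
for name, rate in results.items():
    print(f"{name:14s} {rate:.3f}")
```

If the simulation behaves as the discussion predicts, the t test's Type I error rate stays near .05 while the variance-ratio F test rejects a true null far too often, and both the rank-sum test and the t test on logs show more power than the t test on raw scores.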

ANOVA and t Tests

“In general, if the populations can be assumed to be symmetric, or at least similar in shape (e.g., all negatively skewed), and if the largest variance is no more than four times the smallest, the analysis of variance is most likely to be valid. It is important to note, however, that heterogeneity of variance and unequal sample sizes do not mix. If you have reason to anticipate unequal variances, make every effort to keep sample sizes as [nearly] equal as possible. This is a serious issue and people tend to forget that noticeably unequal sample sizes make the test appreciably less robust to heterogeneity of variance.” (Howell, D. C. (2013). Statistical methods for psychology, 8th Ed. Belmont, CA: Wadsworth).
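Howell's warning about unequal variances combined with unequal sample sizes can be checked by simulating the pooled-variance t test, which is the two-group case of the analysis of variance. The sketch assumes normal populations with equal means, a variance ratio of 4 (Howell's limit), and a total N of 24 in both designs so that a single hard-coded critical t for 22 df serves both; the specific group sizes, seed, and replication count are arbitrary.

```python
import math
import random
import statistics

T_CRIT = 2.074  # two-sided .05 critical t for 22 df (both designs have N = 24)

def pooled_t(x, y):
    """Student's two-sample t with pooled variance (ANOVA's two-group case)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def type1_rate(n_a, n_b, sd_a=2.0, sd_b=1.0, reps=4000, seed=5):
    """Rejection rate when both population means are 0 (H0 true) but group A's
    variance is sd_a**2 / sd_b**2 = 4 times group B's.
    T_CRIT is only correct when n_a + n_b = 24."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0.0, sd_b) for _ in range(n_b)]
        hits += abs(pooled_t(a, b)) > T_CRIT
    return hits / reps

equal_n = type1_rate(12, 12)
unequal_n = type1_rate(4, 20)  # the smaller group has the larger variance
print(f"n = 12 and 12: Type I error rate {equal_n:.3f}")
print(f"n = 4 and 20:  Type I error rate {unequal_n:.3f}")
```

Putting the larger variance in the smaller group makes the pooled test liberal; the equal-n design with the same variance ratio should stay close to .05.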

Diversity in Rules of Thumb

This discussion at ResearchGate illustrates the diversity of opinion about how far from zero g1 or g2 must be before the data need to be transformed for parametric analysis. It also illustrates the ignorance of several of the folks posting there, for example, confusing the value of g1 with the ratio of g1 to its standard error. Several there asserted that if -2 < g1 < +2, then the variable is not skewed enough to worry about.

One can test the "statistical significance" of skewness with a z statistic, which is the ratio of g1 to its standard error. That standard error is approximately SQRT(6/N). Referring the value of that z to the standard normal distribution, any absolute value that exceeds 1.96 (rounded to 2) is “statistically significant.” That, however, is useless information for deciding whether or not the variable is too skewed to use in an analysis that assumes normality. I have scores on an anxiety measure for which g1 = .595. This level of skewness is of no concern, but if we compute z we get a value of 3.39. The value of z is so large because N is so large. This dependence of significance-test results on sample size is the reason statisticians long ago abandoned the practice of using tests of significance to evaluate distributional assumptions. In a nutshell, our anxiety measure is “significantly” skewed, but the amount of skewness is too small to be of any practical importance.
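The arithmetic behind the anxiety example is short. The sketch uses the large-sample standard error SQRT(6/N) given above; statistical packages typically use a slightly different exact formula, so their z values will differ a little. The page does not report the anxiety sample's N, so the N of 195 below is back-calculated from g1 = .595 and z = 3.39 under this approximation, not a reported figure. The function names are illustrative.

```python
import math

def skew_z(g1, n):
    """Ratio of g1 to its large-sample standard error, sqrt(6 / n)."""
    return g1 / math.sqrt(6.0 / n)

def smallest_significant_n(g1, z_crit=1.96):
    """Smallest N at which |z| for this g1 reaches z_crit (two-sided .05)."""
    return math.ceil(6.0 * (z_crit / abs(g1)) ** 2)

g1 = 0.595  # the anxiety-measure value quoted above
for n in (30, 66, 195, 1000):
    print(n, round(skew_z(g1, n), 2))
print(smallest_significant_n(g1))  # 66
```

In other words, the same modest g1 of .595 becomes "significant" once N reaches 66, and its z keeps growing with N while the shape of the distribution stays exactly the same.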

From Immediate Post-concussion and Cognitive Testing: Ceiling Effects, Reliability, and Implications for Interpretation

Archives of Clinical Neuropsychology 00 (2020) 1–9

Skewness values exceeding |1.0| were set as the criterion for negative or positive skewness, indicating non-normality (Harlow, 2014).

Harlow, L. (2014). The essence of multivariate thinking: Basic themes and methods (2nd ed.). New York: Routledge.