East Carolina University

Department of Psychology

**Checking Data for Compliance with a Normality Assumption**

Some researchers use statistical tests of normality (such as the Kolmogorov-Smirnov test). The null hypothesis is that the population is normally distributed. If that hypothesis is not rejected, the researcher concludes that it is OK to use the sample data with procedures that assume normality, most often tests of means where the test statistic is Student's t or F. This is very poor practice, IMHO. When sample sizes are small, the t or F statistics will not be very robust to violation of the normality assumption, but at the same time the small sample sizes will give the test of normality so little power that it is likely not to detect serious deviations from normality. Likewise, when the sample size is large, the t or F statistics will be more robust to violation of the normality assumption, but at the same time the test of normality will have so much power that it will flag as "significant" deviations from normality that are too small to be of any concern. The same criticism applies to statistical tests of homogeneity of variance, such as Levene's test, when employed to help decide whether to use a test of means that assumes homogeneity of variance or one that does not.
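The point about power can be illustrated with a small simulation. The sketch below (assuming SciPy is available; the gamma population and sample sizes are my own illustrative choices, not from the discussion) draws samples from a mildly skewed population (gamma with shape 20, skewness about 0.45, a departure of little practical consequence for a t test) and applies the Shapiro-Wilk test at a small and a large *n*:

```python
# Sketch: how sample size drives the power of a normality test.
# The population is mildly skewed (gamma, shape 20, skewness ~ 0.45),
# a departure small enough to be of little practical concern for a t test.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

pvals = {}
for n in (20, 2000):
    x = rng.gamma(shape=20.0, size=n)
    stat, p = shapiro(x)          # Shapiro-Wilk test of normality
    pvals[n] = p
    print(f"n = {n:4d}: Shapiro-Wilk p = {p:.4f}")
```

Typically the small sample "passes" the test while the large sample fails it, even though the underlying departure from normality is identical in both cases.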

In April of 2007 I posed the following question to members of the EDSTAT-L community:

For those of you who use procedures which assume that a variable is normally distributed, at what point do you decide that the shape of the data in your sample is so skewed that you consider transforming the data or using an alternative procedure which has no assumption of normality? Is the g_{1} statistic of much use in making such judgments? If so, how far from zero must g_{1} be before you become concerned?

Dale Berger responded:

One can use measures of skew and kurtosis as 'red flags' that invite a closer look at the distributions. A rule of thumb that I've seen is to be concerned if skew is farther from zero than 1 in either direction or kurtosis greater than +1. However, there is no substitute for taking a close look at the distribution in any case and trying to understand what skew and kurtosis are telling you. It may be that there is simply an error such as a miscoded number. Also, one can have skew of zero for pretty strange distributions where trimming or some other transformation may be useful, depending on the analysis.
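Dale's rule of thumb can be sketched as a simple red-flag check. This is only an illustration (the function name and cutoffs of ±1 for skewness and +1 for excess kurtosis follow his description; `scipy.stats.skew` and `scipy.stats.kurtosis` supply the moment-based g_{1} and g_{2} statistics):

```python
# Sketch of Dale Berger's rule of thumb: treat |skewness| > 1 or
# excess kurtosis > 1 as red flags inviting a closer look at the plot.
import numpy as np
from scipy.stats import skew, kurtosis

def red_flags(x):
    g1 = skew(x)        # moment-based skewness
    g2 = kurtosis(x)    # excess kurtosis (0 for a normal distribution)
    return {"g1": g1, "g2": g2, "flag": abs(g1) > 1 or g2 > 1}

rng = np.random.default_rng(1)
print(red_flags(rng.standard_normal(1000)))   # normal data: no flag expected
print(red_flags(rng.exponential(size=1000)))  # strongly skewed: flag expected
```

As Dale notes, a raised flag is an invitation to look at the distribution, not a verdict by itself.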

**A serious problem with using the tests of statistical significance for
skew and kurtosis** is that these tests are more sensitive with larger samples
where modest departures from normality are less influential. With a small sample
an outlier can be very influential even though the departure of skew and
kurtosis from zero may not attain statistical significance.

**Best advice: Look at a plot of your data. Don't rely on summary statistics alone.**

Karl Wuensch added, in response to Dale's comments:

What I have advised my students, in a nutshell, is:

1. For each variable, **ask your stat software to show you the minimum score, the maximum score, g_{1}, g_{2}, a stem-and-leaf plot, and a box-and-whiskers plot. High values of g_{2} and high absolute values of g_{1} may signal outliers that deserve your attention.**

2. **Check for any out-of-range values**. If there are any, investigate
them. Of course you will need to have had the foresight of tagging data records
with identification numbers. Correct errors or set out-of-range values to
missing.

3. Also **investigate outliers** and, if justified, recode them or discard
them.

4. Once you feel you have corrected any data errors, recompute g_{1} and g_{2}.

5. **If the absolute value of g_{1} is close to or > 1, consider transforming the variable to reduce skewness**.

6. Also **consider using a procedure which does not assume that population
distributions have any particular shape**.

7. **Do not use tests of significance to determine whether or not you need
to do something about the skewed variable,** for the reasons Dale pointed
out. In a similar fashion, do not use tests of significance to determine
whether or not you need to do something about that heterogeneity of variance you
just discovered.
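The screening steps above can be sketched in code. This is a minimal illustration, not Karl's actual procedure: the valid range of 0-100 and the use of Tukey's 1.5 × IQR fences to flag outliers are my own hypothetical choices for the example.

```python
# Sketch of the screening steps: minimum, maximum, g1, g2, an
# out-of-range check, and Tukey box-plot fences for outliers.
# The valid range (0-100) is a hypothetical example.
import numpy as np
from scipy.stats import skew, kurtosis

def screen(x, valid_min=0, valid_max=100):
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return {
        "min": x.min(),
        "max": x.max(),
        "g1": skew(x),
        "g2": kurtosis(x),
        "out_of_range": x[(x < valid_min) | (x > valid_max)],
        # Tukey's fences: points beyond 1.5 IQR from the quartile hinges
        "outliers": x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)],
    }

scores = [55, 61, 48, 70, 52, 66, 59, 999]   # 999 is a miscoded entry
report = screen(scores)
print(report["out_of_range"])   # -> [999.]
```

Having found the miscoded 999, one would trace it back via the record's identification number and either correct it or set it to missing, as in steps 2 and 3.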

Dennis Roberts responded:

One problem here is *n* ... if a sample is rather small ... then rather wild aberrations in the sample data still could possibly have come from a normal distribution in the population ... that is much less likely to be the case if *n* is large.

Michael Granaas responded:

Here's my rule of thumb on normality:

1. **Plot a histogram of the residuals with or without a normal distribution
overlay**.

2. Hold the plot at arm's length and blur your eyes just a bit (if nearsighted you can take off your glasses).

3. If the histogram is obviously non-normal under these circumstances then you should be cautious.

This is consistent with most texts, which report that parametric statistics are fairly robust if the distribution of residuals is unimodal and symmetric, with moderate to small variance. (I confess that I have never checked the primary sources myself, but the claim seems sensible to me, so I've never felt moved to challenge it.)

I hesitate to use statistical measures of non-normality because the few that I have tried (skewness and kurtosis) seem to be overly sensitive. However, I would be delighted if there were a viable statistic with a defensible cut-off score ... it would simplify my teaching considerably.
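Michael's residual check can be approximated numerically when a plot is inconvenient. The sketch below (my own illustration, using simulated data) fits a simple linear regression, computes the residuals, and prints a crude text histogram; the "arm's length" test amounts to asking whether the bar pattern looks unimodal and roughly symmetric:

```python
# Sketch of the residual check: fit a simple model, then examine the
# distribution of the residuals (binned with numpy and printed as a
# crude text histogram rather than plotted).
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=200)

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares fit
residuals = y - (intercept + slope * x)

counts, edges = np.histogram(residuals, bins=10)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"[{lo:6.2f}, {hi:6.2f}) {'#' * c}")
```

With real data one would of course use an actual histogram, ideally with a normal-curve overlay as Michael suggests.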

Dan Schafer added:

A test or other statistical procedure is robust against a particular assumption if the conclusion from it is valid even if that assumption isn't met. Most of what we know about robustness comes from higher order asymptotics and simulation. My favorite reference on this is the book "Beyond ANOVA" by Rupert Miller. (It's a classic.) Fred Ramsey and I have tried to supply practically useful advice about robustness and dealing with assumptions in our book "The Statistical Sleuth--a Second Course in Data Analysis" (in Ch. 3 and Ch. 5).

**Not all tests based on normality are equally robust, so one can't make a
single statement to cover all situations**. As others have mentioned, the
two-sample t-test *is* generally quite robust against departures from normality,
even for fairly small sample sizes. The F-test for equality of two variances is
notoriously non-robust.

In answering Karl's question--about the point at which the normality-based test should be abandoned--it's important to also be aware of "robustness of efficiency." If, for example, we wish to compare two skewed distributions and we have determined that the two-sample t-test is robust (in validity) enough for this comparison, there may still be another procedure that makes more efficient use of the (non-normal) data, such as the rank-sum test or the two-sample t-test after log transformation. So we might abandon the normality-based test in this situation, even though it would provide a valid conclusion.

[What Dan is saying in this last paragraph is that **even if a normality-assuming procedure is robust to nonnormality (in terms of keeping alpha near its nominal level), an alternative procedure may well have more power**.]
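Dan's "robustness of efficiency" point can be demonstrated with a small simulation. The design below is my own illustration (lognormal populations differing by a shift on the log scale, n = 25 per group, 500 replications): the raw-score t test remains valid but tends to have less power than the t test on log-transformed scores.

```python
# Sketch of "robustness of efficiency": with skewed (lognormal) data,
# a t test on raw scores keeps its validity but may waste power
# relative to a t test on log-transformed scores. Alpha = .05.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n, nsim, alpha = 25, 500, 0.05
rej_raw = rej_log = 0
for _ in range(nsim):
    a = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    b = rng.lognormal(mean=0.6, sigma=1.0, size=n)
    rej_raw += ttest_ind(a, b).pvalue < alpha
    rej_log += ttest_ind(np.log(a), np.log(b)).pvalue < alpha

power_raw, power_log = rej_raw / nsim, rej_log / nsim
print(f"power, raw-score t: {power_raw:.2f}")
print(f"power, log-score t: {power_log:.2f}")
```

The rank-sum test that Dan mentions would behave much like the log-score t test here; the general lesson is that "valid" and "efficient" are separate questions.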

**ANOVA and t tests.**

“In general, if the populations can be assumed to be symmetric, or at least similar in shape (e.g., all negatively skewed), and if the largest variance is no more than four times the smallest, the analysis of variance is most likely to be valid. It is important to note, however, that heterogeneity of variance and unequal sample sizes do not mix. If you have reason to anticipate unequal variances, make every effort to keep sample sizes as *nearly* equal as possible. This is a serious issue and people tend to forget that noticeably unequal sample sizes make the test appreciably less robust to heterogeneity of variance.” (Howell, D. C. (2013). *Statistical methods for psychology*, 8th ed. Belmont, CA: Wadsworth).
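Howell's variance rule is easy to mechanize. A minimal sketch (the function name and example data are my own; the 4:1 cutoff and the unequal-*n* caveat are from the quote above):

```python
# Sketch of Howell's rule of thumb: ANOVA is usually OK if the largest
# group variance is no more than four times the smallest, with extra
# caution when sample sizes are also unequal.
import numpy as np

def variance_ratio_check(*groups, max_ratio=4.0):
    variances = [np.var(g, ddof=1) for g in groups]  # sample variances
    ns = [len(g) for g in groups]
    ratio = max(variances) / min(variances)
    return {
        "ratio": ratio,
        "ok": ratio <= max_ratio,
        "unequal_n": len(set(ns)) > 1,  # heterogeneity + unequal n don't mix
    }

group_a = [10.0, 12.0, 11.0, 13.0, 9.0]
group_b = [10.0, 20.0, 2.0, 25.0, 1.0]
print(variance_ratio_check(group_a, group_b))
```

Here the second group's variance is far more than four times the first's, so the check fails even though the sample sizes are equal.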

**Diversity in Rules of Thumb**

This discussion at ResearchGate illustrates the diversity of opinion regarding rules of thumb for how far from zero g_{1} or g_{2} may be before one should be concerned about nonnormality.
From: Immediate Post-concussion and Cognitive Testing: Ceiling Effects, Reliability, and Implications for Interpretation

*Archives of Clinical Neuropsychology*, 00 (2020), 1–9

Skewness values exceeding |1.0| were set as the criterion for negative or positive skewness, indicating non-normality (Harlow, 2014).

Harlow, L. (2014). *The essence of multivariate thinking: Basic themes and methods* (2nd ed.). New York: Routledge.


Contact Information for the Webmaster,

Dr. Karl L. Wuensch

This page most recently revised on 19-April-2021.