Controlling Type III Errors

    Leventhal and Huynh (1996) have written an interesting article on what they call the "directional two-tailed test" of statistical significance. The traditional two-tailed test (which I refer to as "nondirectional") does not, they argue, allow directional decisions. That is, when the null hypothesis is rejected, one may conclude that the tested parameter does not have the value specified in the null hypothesis, but one cannot then infer in which direction the actual value of the parameter differs from the null value. They note, however, that it is common practice, following rejection of a nondirectional null, to conclude that the direction of difference in the population is the same as it is in the sample. This procedure is what they call a "directional two-tailed test." They also refer to it as a "three-choice test" (I prefer that language), in that the three hypotheses entertained are: parameter = null value, parameter < null value, and parameter > null value.

    The authors point out that with the three-choice test, one may make Type I errors (if one really could test an absolutely true null hypothesis), Type II errors, or Type III errors. Type III errors (Kaiser, 1960) involve incorrectly inferring the direction of the effect. For example, when the population value of the tested parameter is actually above the null value, one might obtain a sample value so far below the null value that one rejects the null and concludes that the population value is also below the null value. When testing nondirectional hypotheses, one can correctly reject the null and still make a Type III error. In this case, the probability of making a Type III error is included in power as it is traditionally computed. When testing directional hypotheses, it is alpha.

    Leventhal and Huynh suggest a revised definition of power: the conditional probability of rejecting the null hypothesis and correctly identifying the true direction of difference between the population value of the tested parameter and the null value.

    The authors then demonstrate that when conducting three-choice tests (which is the usual practice), power is somewhat lower under the revised definition than under the traditional definition. Consequently, when deciding how many observations will be needed to detect an effect of specified size with specified probability, one will underestimate the number required if one uses the traditional definition. They also show how to adjust traditional power and sample size calculations to take into account the possibility of a Type III error.
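
    As a rough illustration of the difference between the two definitions of power, the sketch below splits traditional power into its two parts for a two-tailed, one-sample z test with known standard deviation. That setting, the function names (power_components, required_n), and the standardized effect size d (assumed positive) are assumptions made here for illustration. They are not taken from Leventhal and Huynh, whose adjustment may differ in detail.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def power_components(d, n, alpha=0.05):
    """Return (traditional_power, revised_power, type3_probability).

    Assumes a two-tailed one-sample z test (known sigma) and a true
    standardized effect d > 0. These are illustrative assumptions.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # expected value of the z statistic
    correct = 1 - _STD_NORMAL.cdf(z_crit - shift)  # reject, right direction
    wrong = _STD_NORMAL.cdf(-z_crit - shift)       # reject, wrong direction
    return correct + wrong, correct, wrong


def required_n(d, target, alpha=0.05, revised=True, max_n=10_000_000):
    """Smallest n whose (revised or traditional) power reaches target."""
    if d <= 0 or not 0 < target < 1:
        raise ValueError("need d > 0 and 0 < target < 1")
    for n in range(2, max_n + 1):
        traditional, rev, _ = power_components(d, n, alpha)
        if (rev if revised else traditional) >= target:
            return n
    raise ValueError("target power not reached within max_n observations")


if __name__ == "__main__":
    for d, n in [(0.05, 10), (0.5, 32)]:
        trad, rev, t3 = power_components(d, n)
        print(f"d={d}, n={n}: traditional={trad:.4f}, "
              f"revised={rev:.4f}, Type III={t3:.6f}")
```

    Under these assumptions, traditional power is revised power plus the probability of a Type III error, so the sample size needed under the revised definition can never be smaller than the one needed under the traditional definition. The wrong-direction term is a noticeable share of traditional power only when both the effect and the sample are very small.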

    IMHO, the probability of a Type III error in most circumstances is so small that taking it into account when computing power or sample sizes is not really necessary. Nevertheless, I recommend this very well written article to students in my intermediate level statistics classes, as reading it should contribute nicely to their understanding of the logic of hypothesis testing. I was also pleased with the authors' concluding recommendation: When wishing to decide in what direction a tested parameter's value differs from a given value, the primary means of analysis should be a two-sided confidence interval (not a test of statistical significance).

    Serlin and Zumbo (2001) have argued that with infinite populations, the truth of a point null hypothesis has zero probability.  From this follows the conclusion that one can never make a Type I error, so one need not be concerned with controlling for that error.  They argue that one can, however, make a Type III error, so one should choose alpha to control that error.  They dispense with the usual computation of p values, and rely instead on confidence intervals.  The directional two-tailed test is conducted by computing a traditional confidence interval with 100(1-2α)% coverage.  For example, if you want to hold the probability of a Type III error to 5%, you use a 90% confidence interval.  Serlin and Zumbo used Monte Carlo methods to investigate the error rate (Type III) of this procedure, using a nominal alpha of .05.  They found that when the true value of the tested parameter was very close to the hypothesized value, coverage could be as small as 90%, but that as the difference between the true value and the hypothesized value increased, coverage increased toward 100%.
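
    The confidence interval procedure described above can be sketched with a small simulation. This is not Serlin and Zumbo's simulation design. It assumes a normal population with known standard deviation of 1 (so a z interval is used), and the function name type3_rate and all of its parameters are hypothetical, chosen here for illustration. It estimates how often the 100(1-2α)% interval falls entirely on the wrong side of the hypothesized value.

```python
import random
from statistics import NormalDist


def type3_rate(true_mean, null_value=0.0, n=25, alpha=0.05,
               reps=20000, seed=1):
    """Estimate the Type III error rate of the interval-based decision.

    Assumes normal data with known sigma = 1 (an illustrative assumption).
    A Type III error is counted when the 100(1 - 2*alpha)% interval
    excludes null_value on the side opposite to true_mean.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)  # e.g. 1.645 for a 90% interval
    half_width = z / n ** 0.5            # sigma = 1 is treated as known
    wrong = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        lower, upper = mean - half_width, mean + half_width
        if true_mean > null_value and upper < null_value:
            wrong += 1  # concluded "below" when the truth is "above"
        elif true_mean < null_value and lower > null_value:
            wrong += 1  # concluded "above" when the truth is "below"
    # If true_mean equals null_value, no direction is correct or wrong,
    # so this function reports 0 in that case.
    return wrong / reps


if __name__ == "__main__":
    for mu in (0.001, 0.1, 0.5):
        print(f"true mean {mu}: estimated Type III rate "
              f"{type3_rate(mu):.4f}")
```

    In this sketch the estimated Type III rate is near the nominal alpha when the true value is very close to the hypothesized value, and it shrinks toward zero as the true value moves away.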

    Serlin and Zumbo also checked the performance of the Range Null Hypothesis Test (see their references to Hodges & Lehmann, 1964, and Serlin & Lapsley, 1985) and found that coverage could be as small as 92.5%.

A different type of Type III error.

    Kimball (1957) wrote about "errors of the third kind in statistical consulting."  The error of which he spoke was giving the right answer to the wrong problem.  Kimball attributed this type of error to poor communication between the consultant and the client, and suggested that statistical consultants need to be taught communication skills or "people involving" skills.  Raiffa (1968) very briefly described a Type III error as solving the wrong problem precisely.  This meaning of Type III error is clearly not in the domain of null hypothesis significance testing (NHST), but one could argue that the entire enterprise of NHST is an example of this type of Type III error.

Type IIIIIIIII error (also known as Type IX error).

    Also made by statistical consultants. The consultant has performed an analysis which adequately addresses the research question posed by the client. The client does not like the answer. The client suggests an inappropriate analysis that he thinks will give him the answer he wants. The consultant tells the client he is a &^$*   *#*$& for suggesting such an analysis.


Contact Information for the Webmaster,
Dr. Karl L. Wuensch

This page most recently revised on 26. February 2005.