Student's t Distribution and The Central Limit Theorem

Date: Mon, 11 Oct 93 11:39:01 EDT
From: "Karl L. Wuensch" <PSWUENSC@ECUVM1>
Subject: robustness of t
To: "David C. Howell" <D_Howell@uvmvax.uvm.edu>

Dave,

In your 92 text, p. 166, you said, "In practice, however, our t statistic can be referred to the t distribution whenever the sample size is sufficiently large to produce a normal sampling distribution of the mean." I have been uncomfortable with that statement ever since I started having my students do some Monte Carlos with distributions of sample means, z's, variances, and t's. I could forward the Minitab macro if you would like to see the exercise I have my students do. I have appended some recent email that resulted when I mentioned this Monte Carlo on one of the stat lists.

Date: Mon, 11 Oct 1993 17:37:15 +1000
From: barnett@agsm.unsw.edu.au
Newsgroups: sci.stat.consult
Organization: The Australian Graduate School of Management

Karl,
In article <STAT-L%93100719504233@VM1.MCGILL.CA> you write: "The Monte Carlos I have my students run to demonstrate this indicate that when sampling from a distinctly non-normal population, with N large enough for the distribution of sample means to be approximately normal, the distribution of t's computed on such samples may still be distinctly different from the Student's t distribution."

I have had trouble convincing lecturers in statistics of this fact (I knew it when I was an undergrad). Even as a lecturer I had to jump up and down a lot to get some to think about it. The problem, of course, is that the t is a ratio of *two* random variables - the CLT applies to the numerator, but the denominator is different. If the data are normal, the estimate of variance in the denominator is a constant times a chi-squared random variable, independent of the numerator, and the t-distribution falls out neatly. But when the data are non-normal, the variance estimate no longer has a scaled chi-squared distribution (nor is it independent of the sample mean), so the ratio need not follow Student's t.

Glen
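
Glen's point about the denominator can be checked numerically. What follows is a minimal sketch in Python (standard library only) and is not taken from the original thread; the function names, n = 30, 4000 replications, and the seed are all assumptions chosen for illustration. For normal data, (n-1)s^2/sigma^2 is chi-squared on n-1 df, so its variance should be close to 2(n-1) = 58. For chi-square(1) data (sigma^2 = 2) the same quantity still has a mean near n-1, but its variance should come out far larger, which is one way of seeing that it is no longer chi-squared.

```python
import random


def mean_and_variance(values):
    """Return (mean, unbiased sample variance) of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, var


def normal_draw(rng):
    """One draw from a standard normal population (variance 1)."""
    return rng.gauss(0.0, 1.0)


def chisq1_draw(rng):
    """One draw from chi-square on 1 df: a squared standard normal (variance 2)."""
    return rng.gauss(0.0, 1.0) ** 2


def scaled_variances(draw, sigma2, n=30, reps=4000, seed=1):
    """Simulate reps values of (n-1)*s^2/sigma^2 for samples of size n.

    draw   -- function taking a random.Random and returning one observation
    sigma2 -- the true population variance of that distribution
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        _, s2 = mean_and_variance(sample)
        out.append((n - 1) * s2 / sigma2)
    return out


if __name__ == "__main__":
    for name, draw, sigma2 in [("normal", normal_draw, 1.0),
                               ("chi-sq(1)", chisq1_draw, 2.0)]:
        m, v = mean_and_variance(scaled_variances(draw, sigma2))
        # A chi-squared variable on 29 df has mean 29 and variance 58.
        print(f"{name:10s} mean={m:6.2f} variance={v:7.2f}")
```

Comparing the two printed variances against 58 shows how far the skewed-population denominator departs from the chi-squared reference; changing n in the call shows how slowly that gap closes.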

Date: Mon, 11 Oct 93 10:18:58 EDT
From: "Karl L. Wuensch" <PSWUENSC@ECUVM1>
To: barnett@agsm.unsw.edu.au

Thank you for your rational explanation of the nonrobustness of t even when the sample means are nearly normally distributed. I shall post it to the helps disk my students read. I have my students run a little Monte Carlo in Minitab -- it samples data from a very skewed distribution (chi-square on one df), shows the distribution of sample means to be distinctly non-normal at small N, shows that the sample means approach normality with larger N, but also shows that the distribution of t's computed on those samples looks nothing like Student's t. While I generally restrain myself from jumping up and down at this point, I do find it exciting when I ask them to look at the histograms from their Monte Carlo and tell me how much that distribution of computed t's looks like Student's t, especially since just before that they will have looked at distributions of t's computed on samples drawn from normal populations. (It also excites me to demonstrate to them the great kurtosis of the t's when N is small and the approach to normality as N increases.)
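
The exercise described above can be approximated in a few lines. This is a minimal sketch in Python (standard library only), not the Minitab macro itself, which is not reproduced in this thread; the function names, n = 30, 5000 replications, the seed, and the choice of summary statistics (skewness and tail rejection rates in place of histograms) are all assumptions made for illustration. It draws samples from chi-square on one df (population mean 1), computes each sample's mean and one-sample t against the true mean, and compares the tails of the t's with the two-sided .05 critical value of Student's t on 29 df.

```python
import math
import random

T_CRIT_29 = 2.045  # two-sided .05 critical value of Student's t, 29 df (n = 30)


def normal_draw(rng):
    """One draw from a standard normal population (mean 0)."""
    return rng.gauss(0.0, 1.0)


def chisq1_draw(rng):
    """One draw from chi-square on 1 df: a squared standard normal (mean 1)."""
    return rng.gauss(0.0, 1.0) ** 2


def simulate(draw, mu, n=30, reps=5000, seed=1):
    """Return (sample means, one-sample t's) over reps samples of size n.

    mu is the true population mean, so the null hypothesis is true
    for every simulated t.
    """
    rng = random.Random(seed)
    means, ts = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
        means.append(m)
        ts.append((m - mu) / math.sqrt(s2 / n))
    return means, ts


def skewness(values):
    """Moment-based sample skewness (third central moment / sd cubed)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def tail_rates(ts, crit=T_CRIT_29):
    """Proportion of t's below -crit and above +crit (nominal .025 each)."""
    lower = sum(t < -crit for t in ts) / len(ts)
    upper = sum(t > crit for t in ts) / len(ts)
    return lower, upper


if __name__ == "__main__":
    for name, draw, mu in [("normal", normal_draw, 0.0),
                           ("chi-sq(1)", chisq1_draw, 1.0)]:
        means, ts = simulate(draw, mu)
        lower, upper = tail_rates(ts)
        print(f"{name:10s} skew(means)={skewness(means):6.2f} "
              f"skew(t)={skewness(ts):6.2f} "
              f"lower={lower:.3f} upper={upper:.3f}")
```

If the t's really followed Student's t, both tail rates would sit near .025 and the skewness of the t's near zero, as they should for the normal population. The design choice of reporting each tail separately matters here: a skewed distribution of t's can pile its rejections into one tail, which a single two-sided rate would partly hide.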

Date: Mon, 11 Sep 1995 09:57:20 -0500
To: "Karl L. Wuensch" <PSWUENSC@ecuvm.cis.ecu.edu>
From: dhowell@moose.uvm.edu (David C. Howell)
Subject: Robustness of t

Karl,

A long time ago you sent me a message concerning a statement that I made on p. 166 of the 92 text. I said "In practice, however, our t statistic can be referred to the t distribution whenever the sample size is sufficiently large to produce a normal sampling distribution of the mean." You also sent me a message from barnett@agsm.unsw.edu.au that helped to clarify a lot.

I am in the process of revising that book and am going back through lots of stuff. That is how I came across your message again. I have made some adjustments to the text along the lines of your concerns, though my experience has been that for at least moderate departures from normality we are safe in using t.

You said in your message that you had a macro for Minitab that your students use to illustrate the problem when sampling from distinctly non-normal distributions. I have a vague memory that you even sent me that macro, though a search of my hard disk does not bring it to light. If you still have the macro available, I'd like to see it (again). I want to play around and see just how far I can screw things up and get away with it.

Thanks,

Dave

__________________________________________________________________
David C. Howell          David.Howell@uvm.edu
Dept of Psychology       http://www.uvm.edu/~dhowell/
Univ. of Vermont         Phone: (802) 656-2670
Burlington, VT 05405     FAX: (802) 656-8783

Date: Mon, 11 Sep 95 22:43:42 EDT
From: "Karl L. Wuensch" <PSWUENSC@ECUVM1>
Subject: Re: Robustness of t
To: "David C. Howell" <dhowell@moose.uvm.edu>

Hi David,

I just sent that Minitab macro your way. Should be pretty self-explanatory -- pay special attention to the shapes of the "t" distributions which are produced when sampling from a very skewed population (a chi-square on one df). One warning -- I don't know how good the Minitab random number generators are, but what I got from Minitab on our platform (VM, CMS) indicated that the "t's" were markedly different from Student's t. Of course, it is always possible that I made some really stupid mistake when writing the macro -- if so, none of my students has found it yet.

I look forward to hearing your thoughts on this. Ciao for now.......

PS -- I have now replaced the Minitab macro with a SAS program, MonteCarlo, on my SAS Programs Page.