Constructing Sampling Distributions With SAS

One way to estimate the probabilities of the various possible outcomes of some process is to use a device that simulates the process.  The device is used to simulate outcomes on many trials, and the frequency of each outcome is noted.  Such a technique is called a Monte Carlo simulation, probably because the earliest processes of great interest to students of probability were processes that took place within gambling casinos.
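
As a small illustration of the idea (a minimal sketch in Python, not part of the SAS program used later in this exercise), the code below simulates rolling two fair dice many times and notes the relative frequency of each total.  The function name `simulate_two_dice`, the seed, and the number of trials are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def simulate_two_dice(trials, seed=1):
    """Roll two fair dice `trials` times; return the relative frequency of each total."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))
    return {total: counts[total] / trials for total in range(2, 13)}


freqs = simulate_two_dice(100_000)
for total, freq in freqs.items():
    print(f"{total:2d}  {freq:.4f}")
```

With enough trials, the relative frequency of a total of 7 should settle near its theoretical probability of 6/36.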

We shall use the computer to construct sampling distributions of the mean (and related statistics), standard deviation, and variance, changing various parameters and noting the effect of such changes.

Run the program MonteCarlo.sas, which is available on my SAS Programs page.  I do not expect you to understand the details of this program, but would like to give you a sketch of it.  In the data step NORMAL1, I randomly sample 2,500,000 scores from a normal population which has a mean of 100 and a standard deviation of 15.  I arrange these in a matrix with 25 columns and 100,000 rows.  Then I use PROC UNIVARIATE to look at the distribution of one sample of 100,000 of these scores.  In the NORMAL2 step, I compute sample mean, standard deviation, variance, z, and t on 100,000 samples with N = 9 and 100,000 samples with N = 25.  The z and the t test the null hypothesis that the population mean is 100.  This may be your one chance in life to see an absolutely true null hypothesis be tested (assuming that SAS' random number generator is flawless).  These statistics are output to data set NORMAL3.  Keep in mind that these statistics are being computed on sampling distributions (the 'scores' in each of these distributions are statistics computed on samples randomly drawn from the same population).  Various statistics from the sampling distributions are displayed, and TITLE statements are used to label the output.  In the remainder of the program I do essentially the same thing again using an exponential distribution.
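
For readers who want to see the logic of the normal-population part in another language, here is a rough Python sketch.  It is not a translation of MonteCarlo.sas: the function names, the seed, and the reduced number of samples (20,000 rather than 100,000, to keep the run short) are assumptions made for this example.  Only the population mean of 100, the standard deviation of 15, and the sample sizes of 9 and 25 come from the description above.

```python
import math
import random
import statistics

MU, SIGMA = 100.0, 15.0  # population parameters stated in the handout


def sample_statistics(n, rng):
    """Draw one sample of size n and return (mean, sd, variance, z, t)."""
    scores = [rng.gauss(MU, SIGMA) for _ in range(n)]
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)             # uses the n - 1 denominator
    var = sd ** 2
    z = (mean - MU) / (SIGMA / math.sqrt(n))  # uses the known population sigma
    t = (mean - MU) / (sd / math.sqrt(n))     # uses the sample sd instead
    return mean, sd, var, z, t


def sampling_distribution(n, n_samples=20_000, seed=2):
    """Repeat the sampling n_samples times; each row holds one sample's statistics."""
    rng = random.Random(seed)
    return [sample_statistics(n, rng) for _ in range(n_samples)]


for n in (9, 25):
    rows = sampling_distribution(n)
    means = [row[0] for row in rows]
    variances = [row[2] for row in rows]
    print(f"N = {n}: mean of means = {statistics.fmean(means):.2f}, "
          f"SD of means = {statistics.stdev(means):.2f}, "
          f"mean of variances = {statistics.fmean(variances):.1f}")
```

Each row plays the role of one "score" in a sampling distribution, so summarizing a column of `rows` is analogous to summarizing one variable in data set NORMAL3.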

After you run the program, use the output to answer the 20 questions below.  Post your answers to the question(s) for which you are responsible in the Monte Carlo Forum in BlackBoard.  Be sure to read the posts of the other students too.

For the Sample and Sampling Distributions From the Normal Population

1.  Look at the PROC UNIVARIATE output from the sample of 100,000 scores from the normal population.  Does it appear that this sample came from a normally distributed population?  Refer to the skewness, kurtosis, and the histogram (which may be split between pages if you printed your output, but you can view it intact on your computer screen) in your answer.

2.  Look at the Kolmogorov-Smirnov statistic.  This statistic tests the null hypothesis that the sample was randomly drawn from a normally distributed population.  With 100,000 scores in this sample, this test should have enormous power -- that is, if the population is not normal, we are almost certain to get a significant result here.  Report the p value obtained from the Kolmogorov-Smirnov test for your sample and interpret that test.

3. The program obtains the actual frequency of Type I errors when the nominal alpha is .05.  What percentage of the samples resulted in Type I errors when N = 9?  When N = 25?

4. The sample mean is an unbiased estimator of the population mean.  The mean of your 100,000 sample means from the normal distribution should then be approximately 100 ("approximately" because we simulated only 100,000 samples, not an infinite number of samples).  What mean did you obtain when N = 9?  When N = 25?

5. Given that you know that the population standard deviation is 15, what should the standard deviation of the sample means (the standard error of the mean) be when N = 9?  When N = 25?

6. What standard deviations did you actually obtain for these two sampling distributions?  Explain how you have demonstrated that the sample mean is a consistent estimator.

7. Report the skewness and kurtosis of these two distributions of sample means (that with N = 9 and that with N = 25) and describe the shape of the distributions.

8. Look at the statistics on the distribution of sample variances, N = 9.  Since you know that the sample variance is an unbiased estimator, you expect the mean of the sample variances to be 15² = 225.  What did you get?

9. Is this distribution skewed, and if so, how much and in what direction?

10. Compare the skewness of the distribution of sample variances when N = 25 to that when N = 9.  What has happened to the distribution of sample variances as N increased?

11. Look at the skewness of the distributions of the standard deviations.  Compare the skewness of the distributions of standard deviations with that observed for the distributions of the variances.  Keeping in mind that the standard deviation is just the square root of the variance, draw a conclusion regarding the effect of a square root transformation when applied to a distribution of scores which is positively skewed.

12. Look at the distributions of the z statistics.  On skewness and kurtosis, for N = 9, compare the distribution of sample means with the distribution of z statistics.  Do the same for N = 25.  Keeping in mind that the z statistics are just a linear transformation of the means, draw a conclusion regarding the effect of linear transformations on the shape of a distribution.

13. Now look at the t-distributions.  For N = 9, how do the standard deviation and the kurtosis of the t-distribution differ from those of the z-distribution?  What characteristic of the distribution of sample variances causes t to differ from z in this way (explain your answer)?

14. How did the shape of the t‑distribution change when we went from N = 9 to N = 25?  Draw a conclusion with respect to how the shape of the t-distribution changes with increasing df.
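
The empirical Type I error rate asked about in question 3 can be sketched in Python as follows.  This is a sketch under stated assumptions, not the method used in MonteCarlo.sas: it assumes two-tailed tests at the .05 level, uses standard tabled critical values (1.960 for z; 2.306 and 2.064 for t with 8 and 24 df), and uses a hypothetical function name, seed, and number of samples.

```python
import math
import random
import statistics

MU, SIGMA = 100.0, 15.0          # population parameters stated in the handout
Z_CRIT = 1.960                   # two-tailed .05 critical value for z (assumed test)
T_CRIT = {9: 2.306, 25: 2.064}   # two-tailed .05 critical t for df = 8 and df = 24


def type_one_error_rates(n, n_samples=20_000, seed=3):
    """Return the proportions of samples in which z and t reject a true null."""
    rng = random.Random(seed)
    z_rejections = 0
    t_rejections = 0
    for _ in range(n_samples):
        scores = [rng.gauss(MU, SIGMA) for _ in range(n)]
        mean = statistics.fmean(scores)
        sd = statistics.stdev(scores)
        z = (mean - MU) / (SIGMA / math.sqrt(n))
        t = (mean - MU) / (sd / math.sqrt(n))
        z_rejections += abs(z) > Z_CRIT      # True counts as 1
        t_rejections += abs(t) > T_CRIT[n]
    return z_rejections / n_samples, t_rejections / n_samples


for n in (9, 25):
    z_rate, t_rate = type_one_error_rates(n)
    print(f"N = {n}: z rejects {z_rate:.3f}, t rejects {t_rate:.3f}")
```

Because the null hypothesis is true in every simulated sample, every rejection counted here is a Type I error.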

For the Sample and Sampling Distributions From the Exponential Population

The exponential distribution is distinctly non-normal, as you will soon discover.

1. Look at the PROC UNIVARIATE output from the sample of 100,000 scores from the exponential population.  Does it appear that this sample came from a normally distributed population?  Refer to the skewness, kurtosis, histogram, and Kolmogorov-Smirnov statistic in your answer.

2. The program obtains the actual frequency of Type I errors when the nominal alpha is .05.  What percentage of the samples resulted in Type I errors when N = 9?  When N = 25?

3. Look at the statistics from the distribution of sample means from an exponential population, N = 9.  How does this sampling distribution differ in shape from that you obtained with sample means drawn from a normally distributed population?

4. Look at the statistics from the distribution of sample means from an exponential distribution, N = 25.  How does its shape differ from that of the sample means from an exponential distribution, N = 9?  You have empirically verified what theorem discussed in our class and our textbook?

5. Look at the statistics from the distributions of sample standard deviations when the samples were drawn from an exponential distribution.  How does the skewness compare with that obtained when you sampled from a normal population?

6. Look at the t-distributions computed with our samples from an exponential distribution.  Are the skewness measures here what you would expect for Student's t?

Read Glen Barnett's explanation of why the CLT does not rescue us here.
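
If you would like to experiment with the exponential case outside of SAS, the following Python sketch computes t statistics on samples from an exponential population and reports their skewness and the proportion that exceed the tabled critical value.  The rate parameter of 1 (population mean of 1) is an assumption, since the parameter used in MonteCarlo.sas is not stated here; the function names, seed, and number of samples are likewise hypothetical.  The skewness computed is the simple moment-based g1, which differs slightly from the bias-adjusted value that PROC UNIVARIATE reports.

```python
import math
import random
import statistics

RATE = 1.0                       # assumed rate parameter (not taken from the SAS program)
POP_MEAN = 1.0 / RATE            # mean of an exponential population with this rate
T_CRIT = {9: 2.306, 25: 2.064}   # two-tailed .05 critical t for df = 8 and df = 24


def skewness(values):
    """Moment-based skewness g1 = m3 / m2 ** 1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def exponential_t(n, n_samples=20_000, seed=4):
    """Compute t (testing the true population mean) on n_samples exponential samples."""
    rng = random.Random(seed)
    ts = []
    for _ in range(n_samples):
        scores = [rng.expovariate(RATE) for _ in range(n)]
        mean = statistics.fmean(scores)
        sd = statistics.stdev(scores)
        ts.append((mean - POP_MEAN) / (sd / math.sqrt(n)))
    return ts


for n in (9, 25):
    ts = exponential_t(n)
    reject = sum(abs(t) > T_CRIT[n] for t in ts) / len(ts)
    print(f"N = {n}: skewness of t = {skewness(ts):.2f}, rejection rate = {reject:.3f}")
```

Compare the printed skewness with the value of zero expected for Student's t.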