East Carolina University
Department of Psychology
Resampling Statistics
A new addition to the chapter on nonparametric statistics, in David Howell's Methods text, 5th edition, is a brief introduction to resampling statistics. These statistics may be appropriate in circumstances where you are uncomfortable with the normality assumption common to parametric inferential statistics, and perhaps in some other circumstances as well. Dave is of the opinion that resampling statistics will replace the traditional nonparametric statistics, and perhaps the traditional parametric statistics, in time.
Howell’s resampling software is provided on the CD that comes with the text, but you should download the latest program from his web site. First, go to Howell’s Resampling Statistics page. In addition to reading the material there, you should download the program, unzip it, and install it on your personal computer.
Bootstrapping
With this approach, one constructs a sampling distribution by repeatedly sampling, with replacement, from the actual sample of data at hand. This is much like what we did back in PSYC 6430 when we employed Monte Carlo methods to construct sampling distributions of the mean, variance, standard deviation, z, and t, but then we sampled from mathematically defined populations (such as the standard normal population).
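For comparison, the Monte Carlo approach mentioned above can be sketched in a few lines of Python. This is only an illustration under assumed settings: the sample size of 25, the 10,000 replications, and the function name `monte_carlo_means` are all arbitrary choices, not anything taken from the course materials.

```python
import random
import statistics

def monte_carlo_means(n, n_reps=10_000, seed=1):
    """Draw n_reps samples of size n from a standard normal population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(0, 1) for _ in range(n))
            for _ in range(n_reps)]

means = monte_carlo_means(n=25)

# The standard deviation of the simulated means estimates the standard
# error, which theory puts at 1 / sqrt(25) = 0.2 for this population.
print(round(statistics.stdev(means), 3))
```

Bootstrapping follows the same recipe, except that the draws come from the observed sample rather than from a mathematically defined population.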
Confidence Interval for a Median
Consider the example presented on pages 638 and 639 of our text book (David Howell's Statistical Methods for Psychology, 6th edition). The data at hand are 20 scores on a memory task. We wish to construct a 95% confidence interval for the median. We chose the median because the distribution appears to be distinctly skewed. Here is how we construct the confidence interval:
- We assume that the population is distributed exactly as is the sample.
- We randomly draw one score from the sample. We record it, replace it, and draw another. We repeat this process 20 times (20 because there are 20 scores in the sample). We compute the median of the obtained resample and record it.
- We repeat this process, obtaining a second sample of 20 scores and computing and recording a second median.
- We continue until we have obtained a large number (10,000 or more) of resample medians. We obtain the probability distribution of these medians and treat it like a sampling distribution.
- From the obtained sampling distribution, we find the 2.5th and 97.5th percentiles (the .025 and .975 quantiles). These define the confidence limits.
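The steps above can be sketched in Python as follows. This is a minimal percentile bootstrap, not Howell's program: the function name `bootstrap_median_ci` is hypothetical, and the 20 scores are made-up skewed values standing in for the textbook data.

```python
import random
import statistics

def bootstrap_median_ci(scores, n_reps=10_000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(scores)
    # Each resample draws n scores WITH replacement from the sample.
    medians = sorted(
        statistics.median(rng.choices(scores, k=n))
        for _ in range(n_reps)
    )
    tail = (1 - level) / 2
    lower = medians[round(tail * n_reps)]
    upper = medians[round((1 - tail) * n_reps) - 1]
    return lower, upper

# Hypothetical skewed scores, NOT the data from the textbook.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 11, 14, 17, 22, 30]

lower, upper = bootstrap_median_ci(scores)
print("median:", statistics.median(scores), "95% CI:", lower, "to", upper)
```

Because the resample medians can take only a limited set of values, the limits land on those values without any interpolation.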
The sampling distribution obtained by Howell appears on page 641 of your textbook. The 2.5th percentile is between 5 and 6. We could interpolate between 5 and 6 to obtain the lower confidence limit, but Howell plays it conservative and sets the lower limit at 5. The 97.5th percentile is somewhere between 9 and 10. Again, we could interpolate, but Howell plays it conservative and sets the upper limit at 10. Since only 3 of the 10,000 resample medians fall outside of this confidence interval, our confidence coefficient is actually 9997/10,000 = .9997.
Howell used the program Resampling Stats by Simon and Bruce to do the analysis shown in our textbook. We do not have that program, but Dr. Howell has provided us with resampling software. I would like you to give it a try. Please complete the following exercise and those that follow:
- Download the data file, which is at:
http://core.ecu.edu/psyc/wuenschk/StatData/MedianBoot.dat
- Run the program Resampling.exe.
- Click Help, Program Description. Read the little help file there.
- Click Analysis, Bootstrapping, Bootstrapped Median.
- Click File, Open, MedianBoot.dat, Open.
- Set NReps to 10,000, CI to 95%, click Run.
- Record the obtained median and the limits for the 95% confidence interval. Later you will share this information with the rest of the class. If you wish, you can capture the output screen by selecting it, holding down the Alt key, and hitting the Print Screen key. Then simply paste (Ctrl V) the screen into a Word document. You can find my output screens in Resampling Output.
Confidence Interval for Pearson r
Consider the data on misanthropy, idealism, and attitude about animals, which we analyzed back in PSYC 6430. We found a significant correlation between misanthropy and attitude about animals for nonidealists but not for idealists. Let us now put a confidence interval on the correlation we obtained with the nonidealists. Here is how we construct the confidence interval:
- We construct a bootstrap sample of 123 pairs of scores. That is, we sample, with replacement, from our actual sample, 123 pairs of scores. We compute Pearson r on that sample of 123 pairs of scores and then record the value of r.
- We repeat this process many times, each time recording the value of r.
- From the resulting distribution of values of r, we obtain the 2.5th and 97.5th percentiles. These are the limits of the confidence interval for our sample r.
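A rough Python sketch of this procedure follows. It resamples whole pairs, so each x stays attached to its own y. The 123 pairs are simulated stand-ins, not the misanthropy data, and the function names and the choice of 2,000 replications (to keep the run short) are assumptions made for illustration.

```python
import math
import random

def pearson_r(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_r_ci(pairs, n_reps=2_000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for Pearson r."""
    rng = random.Random(seed)
    n = len(pairs)
    # Sample n PAIRS with replacement, then compute r on each resample.
    rs = sorted(pearson_r(rng.choices(pairs, k=n)) for _ in range(n_reps))
    tail = (1 - level) / 2
    return rs[round(tail * n_reps)], rs[round((1 - tail) * n_reps) - 1]

# Simulated stand-in data: 123 pairs with a modest positive relationship.
data_rng = random.Random(42)
pairs = []
for _ in range(123):
    x = data_rng.gauss(0, 1)
    pairs.append((x, 0.3 * x + data_rng.gauss(0, 1)))

r_obs = pearson_r(pairs)
lower, upper = bootstrap_r_ci(pairs)
print(round(r_obs, 3), round(lower, 3), round(upper, 3))
```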
Let us use Howell's resampling program to construct this confidence interval:
- Download the data file, which is at:
http://core.ecu.edu/psyc/wuenschk/StatData/CorrReSample.dat
- Click Analysis, Bootstrapping, Bootstrapped Correlation.
- Click File, Open, CorrReSample.dat, Open.
- Set NReps to 10,000, CI to 95%, click Run.
- Record the obtained r and the limits for the 95% confidence interval. Later you will share this information with the rest of the class.
You know that the traditional independent samples t test is equivalent to a test of the null hypothesis that the point-biserial r is zero in the population. Accordingly, it might well make sense to use this correlation program for a bootstrapping test of the difference in means between two independent samples -- just code group membership with numbers (like 1 and 2) and run the program. If the confidence interval for the point-biserial r does not include zero, then the two groups differ significantly.
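The equivalence between the pooled-variance t and the point-biserial r can be checked numerically. The sketch below uses two small made-up groups and the standard identity t = r * sqrt(n - 2) / sqrt(1 - r^2); the group values and function names are hypothetical.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(first, second):
    """Independent samples t (pooled variance) for mean(first) - mean(second)."""
    n1, n2 = len(first), len(second)
    pooled_var = ((n1 - 1) * statistics.variance(first)
                  + (n2 - 1) * statistics.variance(second)) / (n1 + n2 - 2)
    diff = statistics.fmean(first) - statistics.fmean(second)
    return diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Made-up scores for two groups, coded 1 and 2.
group1 = [12, 15, 11, 18, 14, 16]
group2 = [9, 13, 10, 8, 12]
codes = [1] * len(group1) + [2] * len(group2)
scores = group1 + group2

r_pb = pearson_r(codes, scores)
t_from_r = r_pb * math.sqrt(len(scores) - 2) / math.sqrt(1 - r_pb ** 2)
# Group 2 minus group 1, so the sign matches a higher code meaning group 2.
t_direct = pooled_t(group2, group1)
print(round(t_from_r, 4), round(t_direct, 4))
```

The two t values agree, which is why a bootstrapped interval on the point-biserial r can stand in for a test on the difference in means.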
Permutation/Randomization Tests
With this approach one takes the data at hand, randomly assigns scores to groups (without replacement), and then computes, on the obtained sample(s), the relevant statistic. This procedure is repeated many times, obtaining a sampling distribution of the statistic of interest.
Two Independent Samples
Consider the data on page 645 of our textbook. We have a sample of 49 scores in the success group, and 18 in the fail group. Here is how we conduct a permutation/randomization test:
- We take all 49 + 18 = 67 scores and throw them into one pot. After mixing them, we select one score and assign it to the success group. Without replacing it, we select a second score and assign it to the success group. We continue this until we have assigned 49 scores to the success group. The remaining 18 scores are assigned to the fail group.
- We compute the medians of the resulting groups and then the difference between those medians. We record the difference in medians.
- We put the 67 scores back in the pot and repeat the process, obtaining and recording a second difference in medians for the resulting sample of 49 scores assigned to one group, 18 to the other group.
- We repeat this procedure a large number of times. The resulting set of differences in medians is our sampling distribution.
- From the sampling distribution, we determine what proportion of the differences had absolute values as large as or larger than that observed in our actual data. If that proportion is less than some criterion (typically .05), we reject the null hypothesis of no difference in population medians.
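The procedure above can be sketched in Python. Shuffling the pooled scores and splitting them at the original group size is the same as drawing scores from the pot without replacement. The two groups here are small made-up samples, not the 49 and 18 scores from the textbook, and the function name is hypothetical.

```python
import random
import statistics

def permutation_test_medians(group_a, group_b, n_reps=10_000, seed=1):
    """Two-tailed permutation test on the difference in medians."""
    rng = random.Random(seed)
    observed = statistics.median(group_a) - statistics.median(group_b)
    pooled = list(group_a) + list(group_b)   # all scores in one pot
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)                  # random reassignment to groups
        diff = (statistics.median(pooled[:n_a])
                - statistics.median(pooled[n_a:]))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_reps

# Made-up scores for illustration only.
group_a = [14, 15, 17, 18, 19, 21, 22, 24, 25, 27]
group_b = [8, 9, 11, 12, 13, 15, 16]

observed, p_value = permutation_test_medians(group_a, group_b)
print("difference in medians:", observed, "p:", p_value)
```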
Try using Howell's software to conduct this test:
- Download the data file, which is at:
http://core.ecu.edu/psyc/wuenschk/StatData/2IndReSamples.dat
- Click Analysis, Randomization Tests, Compare Medians of Two Samples.
- Click File, Open, 2IndReSamples.dat, Open.
- Set NReps to 10,000, CI to 95%, click Run.
- Record the obtained medians, their difference, and the limits for the 95% confidence interval for the difference under the null hypothesis of no difference. Later you will share this information with the rest of the class.
- What is reported here as a confidence interval is NOT what you usually think of as a confidence interval for the difference in medians, but is rather the nonrejection region for the test of our null hypothesis of no difference in medians. If the difference in medians for our two samples falls outside of that nonrejection interval, then we reject the null hypothesis of no difference in medians.
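One plausible way to obtain such a nonrejection region is to take the middle 95% of the permutation distribution of differences in medians. Whether Howell's program computes it exactly this way is an assumption on my part; the sketch below, with a hypothetical function name and made-up data, shows only the general idea.

```python
import random
import statistics

def null_difference_interval(group_a, group_b, n_reps=10_000,
                             level=0.95, seed=1):
    """Middle `level` of the permutation distribution of the difference
    in medians, i.e. a nonrejection region under the null hypothesis."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(pooled)
        diffs.append(statistics.median(pooled[:n_a])
                     - statistics.median(pooled[n_a:]))
    diffs.sort()
    tail = (1 - level) / 2
    return diffs[round(tail * n_reps)], diffs[round((1 - tail) * n_reps) - 1]

# Made-up scores for illustration only.
group_a = [14, 15, 17, 18, 19, 21, 22, 24, 25, 27]
group_b = [8, 9, 11, 12, 13, 15, 16]

observed = statistics.median(group_a) - statistics.median(group_b)
lower, upper = null_difference_interval(group_a, group_b)
reject = not (lower <= observed <= upper)
print("nonrejection region:", lower, "to", upper, "reject null:", reject)
```

Note that the interval is centered near zero, not near the observed difference, which is what distinguishes it from an ordinary confidence interval.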
Two Correlated Samples
Consider the data on page 642 of our textbook. We have paired data from 19 subjects, and 19 signed difference scores. Here is how we conduct a permutation/randomization test:
- We take the 19 difference scores, strip them of their signs, and then randomly assign signs.
- We compute and record the median of the resulting set of signed difference scores.
- We repeat this process a large number of times.
- From the resulting sampling distribution under the null hypothesis of no difference in medians, we compute our p value by finding what proportion of the resamples differ from 0 by at least as much as did our sample median difference score.
Our sample median difference score was 6. From the resampling distribution on page 643 of our textbook, we see that p = (10 + 13)/10,000 = .0023.
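The sign-randomization procedure for difference scores can be sketched as follows. The 19 difference scores are invented for illustration and are not the textbook data, so the resulting p will not match the value above; the function name is hypothetical.

```python
import random
import statistics

def sign_flip_median_test(diffs, n_reps=10_000, seed=1):
    """Two-tailed randomization test on the median difference score."""
    rng = random.Random(seed)
    observed = statistics.median(diffs)
    magnitudes = [abs(d) for d in diffs]     # strip the signs
    extreme = 0
    for _ in range(n_reps):
        # Randomly assign a sign to each difference score.
        resample = [m * rng.choice((-1, 1)) for m in magnitudes]
        if abs(statistics.median(resample)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_reps

# Invented difference scores (n = 19), for illustration only.
diffs = [8, 6, 12, -3, 5, 9, 7, -1, 10, 4, 6, 11, 3, 8, -2, 7, 5, 9, 6]

observed, p_value = sign_flip_median_test(diffs)
print("median difference:", observed, "p:", p_value)
```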
Try using Howell's software to analyze these data:
- Download the data file, which is at:
http://core.ecu.edu/psyc/wuenschk/StatData/2CorrReSamples.dat
- Click Analysis, Randomization Tests, Two Paired Samples.
- Click File, Open, 2CorrReSamples.dat, Open.
- Set NReps to 10,000, click Run.
- Record the obtained t and the p value. Later you will share this information with the rest of the class.
- Note that this test compares means, not medians. The program computes t for the original paired sample and for each of the resamples. The p which is reported is not based on Student's t, but rather on the location of the obtained t in the distribution of t values for the resamples.
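A sketch of this kind of test appears below. It assumes, as in the procedure described earlier, that resamples are generated by randomly reassigning signs to the difference scores, and that the p is two-tailed; neither detail is confirmed for Howell's program. The data and function names are invented for illustration.

```python
import math
import random
import statistics

def paired_t(diffs):
    """One-sample t on difference scores: mean / (sd / sqrt(n))."""
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def randomization_t_test(diffs, n_reps=10_000, seed=1):
    """Locate the observed t within the distribution of resampled t values."""
    rng = random.Random(seed)
    t_obs = paired_t(diffs)
    magnitudes = [abs(d) for d in diffs]
    extreme = 0
    for _ in range(n_reps):
        resample = [m * rng.choice((-1, 1)) for m in magnitudes]
        if abs(paired_t(resample)) >= abs(t_obs):
            extreme += 1
    # p comes from the resampled t values, not from Student's t tables.
    return t_obs, extreme / n_reps

# Invented difference scores (n = 19), for illustration only.
diffs = [8, 6, 12, -3, 5, 9, 7, -1, 10, 4, 6, 11, 3, 8, -2, 7, 5, 9, 6]

t_obs, p_value = randomization_t_test(diffs)
print("t:", round(t_obs, 3), "p:", p_value)
```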
Closing Comments
The program will do more than what I have covered here. Feel free to play around with the other routines available there.
Dr. Howell's program is designed as a teaching tool rather than a research tool. If you wish to conduct resampling statistics for research purposes, you might want to get a commercial package -- unless you are as frugal as am I.
Thanks, Dave, for your work on this!
Statistics 101
John Grosberg offers a giftware program he has written, Statistics101. It executes the Resampling Stats language of Julian Simon and Peter Bruce. I have not had a chance to evaluate it myself.
Contact Information for the Webmaster,
Dr. Karl L. Wuensch
This page most recently revised on 7 February 2007.