Dealing with Missing Data: Suggested Readings
From: Wuensch, Karl L
Sent: Thursday, September 09, 2004 11:18 AM
To: 'Zhou Huijun'
Subject: Missing values in multiple regression analysis
Dear Dr. Zhou:
Dealing with missing data is a thorny problem. If you exclude missing values listwise, then any case that is missing data on any of the variables in the model will be excluded from the analysis. If only a few of your cases are missing data, then this is the best simple solution, IMHO. If, however, many of your cases are missing data, then listwise deletion will greatly reduce your sample size and may result in a biased sample as well, because the complete cases may differ systematically from the cases with missing data.
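To make the sample-size issue concrete, here is a minimal sketch in Python using a small made-up data set. The variable names x1, x2, and y and all of the values are hypothetical, with None marking a missing score. The sketch counts the complete cases that listwise deletion would keep and, for contrast, the number of cases available for each pair of variables, which is what pairwise deletion would use for each correlation.

```python
from itertools import combinations

# Hypothetical toy data (not from any real study); None marks a missing score.
data = {
    "x1": [12.0, 15.5, None, 9.8, 20.1, 14.2],
    "x2": [0.8, None, 1.1, 0.7, 1.5, None],
    "y":  [34, 45, 51, None, 29, 40],
}

def listwise_n(columns):
    """Count cases observed on every variable (the cases listwise deletion keeps)."""
    rows = zip(*columns.values())
    return sum(all(value is not None for value in row) for row in rows)

def pairwise_n(columns):
    """For each pair of variables, count cases observed on both of them
    (the cases pairwise deletion would use for that pair's correlation)."""
    counts = {}
    for a, b in combinations(columns, 2):
        counts[(a, b)] = sum(
            u is not None and v is not None
            for u, v in zip(columns[a], columns[b])
        )
    return counts

if __name__ == "__main__":
    print("listwise n:", listwise_n(data))
    for pair, n in pairwise_n(data).items():
        print(pair, "n =", n)
```

With these made-up values, listwise deletion keeps only 2 of the 6 cases, while the pairwise counts are 3, 4, and 3, so each correlation would rest on a different subset of cases.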
With pairwise deletion, the zero-order correlations among the modeled variables are computed with all available data, and the multiple regression analysis is then conducted from the resulting correlation matrix. While this has the advantage of using all of the available data, strange things can happen because the various correlations are based on different sample sizes (and different subsets of cases). For example, the resulting correlation matrix may not be one that any single complete data set could produce, and there is no single N on which to base the degrees of freedom for the significance tests.

More modern solutions to missing data involve replacing each missing score with one plausible value or with several (multiple imputation). Basically, this involves, for each case, replacing the missing data with the values that would be expected given that case's scores on the other variables (in multiple imputation, with random variation added so that the uncertainty about the missing values is carried into the results). This is an exceptionally labor- and computing-intensive solution, and I have avoided it for that reason. If you wish to pursue the more modern solutions to estimating missing data values, I recommend the following readings:
-----Original Message-----
Dear Prof. Karl L. Wuensch:
I am a postgraduate student at the National University of Singapore. I was lucky to find your name on your personal website when I was having trouble analyzing my data. My data concern lead toxicity to the renal system. I aim to find out whether one particular genetic polymorphism contributes to differences in response to lead after adjusting for other variables. The main statistical method is multiple regression analysis. One problem has confused me in recent days. Some variables have missing values. During the regression analysis, in the "option" dialog box, selecting "exclude missing values listwise" or "exclude missing values pairwise" sometimes made a big difference in the significance levels of some variables. I am wondering which result is more accurate.
Thanks a lot,
Best regards,
Dr. Huijun Zhou
