Dealing with Missing Data: Suggested Readings


From: Wuensch, Karl L
Sent: Thursday, September 09, 2004 11:18 AM
To: 'Zhou Huijun'
Subject: Missing values in multiple regression analysis

Dear Dr. Zhou:

        Dealing with missing data is a thorny problem.  If you exclude missing values listwise, then any case that is missing data on any of the variables in the model will be excluded from the analysis.  If only a few of your cases are missing data, then this is the best simple solution, IMHO.  If, however, many of your cases are missing data, then listwise deletion will greatly reduce your sample size and may result in a biased sample as well.  With pairwise deletion, the zero-order correlations among the modeled variables are computed with all available data, and then the multiple regression analysis is conducted from the resulting correlation matrix.  While this has the advantage of using all of the available data, strange things can happen because the various correlations are based on different sample sizes, and thus on different subsets of cases.  More modern solutions to missing data involve replacing each missing score with one plausible value or with several (multiple imputation).  Basically this involves, for each case, replacing the missing data with those values which would be expected given that case's scores on the other variables.  This is an exceptionally labor/computing intensive solution, and I have avoided it because of that.  If you wish to pursue the more modern solutions to estimating missing data values, I recommend the following readings:

-----Original Message-----
Dear Prof. Karl L. Wuensch:

I am a postgraduate student at the National University of Singapore. I was very lucky to find your name on your personal website while I was having trouble analyzing my data. My data are about the toxic effects of lead on the renal system. I aim to find out whether one particular genetic polymorphism contributes to differences in the response to lead after adjusting for other variables. The main statistical method is multiple regression analysis. One problem has confused me in recent days: some variables have missing values. During the regression analysis, under the "option" dialogue box, selecting "exclude missing values listwise" or "exclude missing values pairwise" sometimes makes a big difference in the significance levels of some variables. I am wondering which result is more accurate.

Thanks a lot,

Best regards,
Dr. Huijun Zhou
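
Illustration (not part of the original correspondence): the difference between listwise and pairwise deletion described in the reply can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the toy data set and all function names (listwise_rows, pairwise_corr, listwise_corr, regression_impute) are made up for illustration and are not taken from SPSS or any other package. The last function shows a crude single imputation from a one-predictor regression; genuine multiple imputation would add random error to each prediction and repeat the process several times.

```python
from math import sqrt

# Hypothetical toy data (an assumption for illustration); None marks a missing value.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, None, 6.0, 8.0, 9.0, 12.0],
    "z": [1.0, 3.0, None, 2.0, 5.0, 4.0],
}


def pearson(a, b):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def listwise_rows(data):
    """Indices of the cases that have complete data on every variable."""
    n = len(next(iter(data.values())))
    return [i for i in range(n)
            if all(col[i] is not None for col in data.values())]


def listwise_corr(data, u, v):
    """Correlation of u and v using only fully complete cases; returns (r, n)."""
    rows = listwise_rows(data)
    a = [data[u][i] for i in rows]
    b = [data[v][i] for i in rows]
    return pearson(a, b), len(rows)


def pairwise_corr(data, u, v):
    """Correlation of u and v using every case complete on these two variables."""
    pairs = [(p, q) for p, q in zip(data[u], data[v])
             if p is not None and q is not None]
    a, b = zip(*pairs)
    return pearson(a, b), len(pairs)


def regression_impute(data, target, predictor):
    """Fill missing target scores with predictions from a simple regression
    on one predictor. Assumes the predictor is observed wherever the target
    is missing (true for "x" in the toy data)."""
    pairs = [(p, q) for p, q in zip(data[predictor], data[target])
             if p is not None and q is not None]
    mx = sum(p for p, _ in pairs) / len(pairs)
    my = sum(q for _, q in pairs) / len(pairs)
    slope = (sum((p - mx) * (q - my) for p, q in pairs)
             / sum((p - mx) ** 2 for p, _ in pairs))
    intercept = my - slope * mx
    return [q if q is not None else intercept + slope * p
            for p, q in zip(data[predictor], data[target])]


if __name__ == "__main__":
    for u, v in [("x", "y"), ("x", "z"), ("y", "z")]:
        r_l, n_l = listwise_corr(data, u, v)
        r_p, n_p = pairwise_corr(data, u, v)
        print(f"{u},{v}: listwise r={r_l:.3f} (n={n_l}); "
              f"pairwise r={r_p:.3f} (n={n_p})")
    print("y after regression imputation from x:",
          regression_impute(data, "y", "x"))
```

With listwise deletion every correlation rests on the same four complete cases, whereas with pairwise deletion the x-y and x-z correlations each use five cases, and not the same five. A regression built from such a matrix mixes information from different subsets of the sample, which is one way the two options can lead to different significance levels.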
 



Contact Information for the Webmaster,
Dr. Karl L. Wuensch



This page most recently revised on 9. September 2004.