Simpson.txt
========================================================================
Date: Thu, 16 Jan 1997 11:00:06 -0700
Sender: "Statistics and statistical discussion list: STAT-L"
From: "John R. Vokey"
Organization: University of Lethbridge
Subject: Simpson's paradox

Simpson's paradox refers to the reversal of a relationship that follows from collapsing a multi-way contingency table over heterogeneous subgroups. More recently, it has come to refer to ANY change of relationship (i.e., in magnitude or sign) following such collapsing. It can be seen to be a special case of the more general problem of inappropriate cross-level inference, of which the "ecological fallacy" and the "individualistic fallacy" are also special cases. It is also a very old problem, recognised at least as early as Yule (1903).

An intriguing recent example is that provided by Wardrop (1995), in which he argues that the much believed but fallacious "hot-hand" in basketball arises from just such collapsing. That is, collapsed over heterogeneous shooters, there is an apparent "hot-hand", but only at that aggregated level of analysis; with shooters as the unit of analysis, no such relationship exists (as Gilovich and Tversky have long maintained).

Anil Menon (Syracuse University, School of Engg. and Computer Science) compiled this reference list and placed it on a "Simpson's Paradox" web-site he had created. Unfortunately, the web-site URL has changed (or no longer exists):

Articles on Simpson's Paradox and Related topics

Last updated: 19/03/96

I got interested in Simpson's paradox while studying deception in Genetic Algorithms. Here is a list of articles that might be useful. Fortunately, John Vokey at the Department of Psychology, University of Lethbridge, was kind enough to post most of these references, saving me an ascii adventure.

I have grouped the bibliography in several ways:

* A Beginner's Guide,
* Chronologically,
* Alphabetically.

Eventually, I may put up a topically organized list as well...
Please inform me if I have left out any pertinent articles.

Some related links are:

* Simple example based on drug tests.
* A news group discussion (may have been removed).
* Graphical Methods for Categorical Data.
_________________________________________________________________

A Beginner's Guide

I would recommend that the newcomer start off with:

* Authors : Blyth, C. R.
  Title : On Simpson's paradox and the sure thing principle.
  Source : Journal of the American Statistical Association, 67, 364-381, 1972.

For some real-life examples of Simpson's paradox, see Keyfitz's classic book.

* Authors : Keyfitz, N.
  Booktitle : Applied mathematical demography, Wiley, New York, pp. 385-391, 1977.

My favorite analysis of Simpson's paradox is the one in Simon and Blume's excellent book:

* Authors : Simon, C. P. and Blume, L.
  Booktitle : Mathematics for Economists, W. W. Norton and Company, New York, pp. 368-371, pp. 784-791, 1994.

They explain it using Don Saari's results. The importance of his work in the study of "social paradoxes" cannot be over-emphasized. A good starting point for Saari's remarkable theorem is:

* Authors : Saari, D. G.
  Title : The source of some paradoxes from social choice and probability.
  Source : Journal of Economic Theory, 41(1), 1-22, 1987.

Shyam Sunder's paper gives Yuji Ijiri's necessary and sufficient condition for Simpson's paradox to occur in the "simplest possible case". This condition is a special case of Saari's theorem, but is particularly clear and simple to use in practice. I had no idea accountants worried about such matters.

* Authors : Sunder, S.
  Title : Simpson's reversal paradox and cost allocation.
  Source : Journal of Accounting Research, 21, 222-233, 1983.

Finally, I urge the reader to take a look at Vaupel and Yashin's very readable paper on the pernicious effects of heterogeneity on statistical decision making. It reads like a Stephen King novel (and is equally horrifying).

* Authors : Vaupel, J. W. and Yashin, A. I.
  Title : Heterogeneity's ruses: some surprising effects of selection on population dynamics.
  Source : The American Statistician, 39(3), 176-185, 1985.
_________________________________________________________________

Chronological Bibliography

The 1900's

Authors : Yule, G. U.
Title : Notes on the theory of association of attributes in statistics.
Source : Biometrika, 2, 121-134, 1903.

The 1930's

Authors : Thorndike, E. L.
Title : On the fallacy of imputing the correlations found for groups to individuals or smaller groups composing them.
Source : American Journal of Psychology, 52, 122-124, 1939.

The 1940's

Authors : Deming, W. E. and Stephan, F. F.
Title : On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.
Source : Annals of Mathematical Statistics, 11, 427-444, 1940.

Authors : Lindquist, E. F.
Title : Statistical analysis in educational research.
Source : Boston: Houghton Mifflin, 1940.

Authors : Deming, W. E.
Title : Statistical adjustment of data.
Source : New York: Dover Publications, Inc., 1943.

The 1950's

Authors : Robinson, W. S.
Title : Ecological correlations and the behavior of individuals.
Source : American Sociological Review, 15, 351-357, 1950.

Authors : Simpson, E. H.
Title : The interpretation of interaction in contingency tables.
Source : Journal of the Royal Statistical Society, Ser. B., 13, 238-241, 1951.

The 1960's

Authors : Mosteller, F.
Title : Association and estimation in contingency tables.
Source : Journal of the American Statistical Association, 63, 1-28, 1968.

The 1970's

Authors : Goodman, L. A.
Title : The multivariate analysis of qualitative data: interactions among multiple classifications.
Source : Journal of the American Statistical Association, 65, 226-256, 1970.

Authors : Blyth, C. R.
Title : On Simpson's paradox and the sure thing principle.
Source : Journal of the American Statistical Association, 67, 364-381, 1972.

Authors : Bickel, P. J., Hammel, E. A., and O'Connell, J. W.
Title : Sex bias in graduate admissions: Data from Berkeley.
Source : Science, 187, 398-404, 1975.

Authors : Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W.
Title : Discrete multivariate analysis: Theory and practice.
Source : Cambridge, Massachusetts: The MIT Press, 1975.

Authors : Gardner, M.
Title : On the fabric of inductive logic and some probability paradoxes.
Source : Scientific American, 234, 119-124, 1976.

Authors : Fienberg, S. E.
Title : The analysis of cross-classified categorical data.
Source : Cambridge, Massachusetts: The MIT Press, 1977.

Authors : Keyfitz, N.
Booktitle : Applied mathematical demography, Wiley, New York, pp. 385-391, 1977.

Authors : Knapp, T. R.
Title : The unit-of-analysis problem in applications of simple correlation analysis to educational research.
Source : Journal of Educational Statistics, 2, 171-186, 1977.

Authors : Freedman, D., Pisani, R., and Purves, R.
Title : Statistics.
Source : W.W. Norton & Company, New York, 1978.

Authors : Whittemore, A. S.
Title : Collapsibility of multidimensional contingency tables.
Source : Journal of the Royal Statistical Society, Ser. B., 40, 328-340, 1978.

The 1980's

Authors : Hintzman, D. L.
Title : Simpson's paradox and the analysis of memory retrieval.
Source : Psychological Review, 87, 398-410, 1980.

Authors : Flexser, A. J.
Title : Homogenizing the 2 X 2 contingency table: A method for removing dependencies due to subject and item differences.
Source : Psychological Review, 88, 327-339, 1981.

Authors : Martin, E.
Title : Simpson's paradox resolved: A reply to Hintzman.
Source : Psychological Review, 88, 372-374, 1981.

Authors : Mantel, N.
Title : Simpson's paradox in reverse.
Source : The American Statistician, 36, 395, 1982.

Authors : Saari, D. G.
Title : Inconsistencies of weighted summation voting systems.
Source : Mathematics of Operations Research, 7(4), 479-490, 1982.

Authors : Shapiro, S. H.
Title : Collapsing contingency tables -- a geometric approach.
Source : The American Statistician, 36, 43-46, 1982.

Authors : Wagner, C. H.
Title : Simpson's paradox in real life.
Source : The American Statistician, 36, 46-48, 1982.

Authors : Kennedy, J. J.
Title : Analyzing qualitative data. Introductory log-linear analysis for behavioral research.
Source : New York: Praeger Publishers, 1983.

Authors : Sunder, S.
Title : Simpson's reversal paradox and cost allocation.
Source : Journal of Accounting Research, 21, 222-233, 1983.

Authors : Knapp, T. R.
Title : Instances of Simpson's paradox.
Source : College Mathematics Journal, 16, 209-211, 1985.

Authors : Paik, M.
Title : A graphic representation of a three-way contingency table: Simpson's paradox and correlation.
Source : The American Statistician, 39, 53-54, 1985.

Authors : Vaupel, J. W. and Yashin, A. I.
Title : The deviant dynamics of death in heterogeneous populations.
Source : Sociological Methodology, Tuma, N. B. (ed), pp. 179-211, 1985.

Authors : Vaupel, J. W. and Yashin, A. I.
Title : Heterogeneity's ruses: some surprising effects of selection on population dynamics.
Source : The American Statistician, 39(3), 176-185, 1985.

Authors : Cohen, J. E.
Title : An uncertainty principle in demography and the unisex issue.
Source : The American Statistician, 41, 32-39, 1986.

Authors : Saari, D. G.
Title : The source of some paradoxes from social choice and probability.
Source : Journal of Economic Theory, 41(1), 1-22, 1987.

Authors : Saari, D. G.
Title : Symmetry, Voting and Social Choice.
Source : The Mathematical Intelligencer, 10(3), 32-42, 1988.

Authors : Kaigh, W. D.
Title : A category representation paradox.
Source : The American Statistician, 43(2), 92-97, 1989.

Authors : Wermuth, N.
Title : Moderating effects of subgroups in linear models.
Source : Biometrika, 76, 81-92, 1989.

The 1990's

Authors : Freehling, J. S.
Title : Simpson's paradox and database profiling.
Source : Direct Marketing, 53(5), 26-27, 1990.

Authors : Haunsperger, D. B. and Saari, D. G.
Title : The lack of consistency for statistical decision procedures.
Source : The American Statistician, 45(3), 252-255, 1991.

Authors : Klay, M. P. and Wesley, L. P.
Title : Simpson's paradox: a maximum likelihood solution.
Source : SRI International Technical Report, No. 502, 1-11, 1991.

Authors : Mittal, Y.
Title : Homogeneity of subpopulations and Simpson's Paradox.
Source : Journal of the American Statistical Association, 86(413), 167-172, 1991.

Authors : Abramson, N. S., Kelsey, S. F., Safar, P., and Sutton-Tyrrell, K.
Title : Simpson's paradox and clinical trials: What you find is not necessarily what you prove.
Source : Annals of Emergency Medicine, 21, 1480-1482, 1992.

Authors : DeBlois, B. M.
Title : Simpson's Paradox.
Source : Mathematica Militaris, 3(1), 1992.

Authors : Mehrez, A., Brown, J. R., and Khouja, M.
Title : Aggregate efficiency measures and Simpson's paradox.
Source : Contemporary Accounting Research, 9(1), 329-342, 1992.

Authors : Rogers, A.
Title : Heterogeneity and selection in multistate population analysis.
Source : Demography, 29(1), 31-38, 1992.

Authors : Gunter, B.
Title : A trio of statistical double takes.
Source : Quality Progress, 26(6), 84-86, 1993.

Authors : Simon, C. P. and Blume, L.
Booktitle : Mathematics for Economists, W. W. Norton and Company, New York, pp. 368-371, pp. 784-791, 1994.

Authors : Wardrop, R. L.
Title : Simpson's Paradox and the Hot Hand in Basketball.
Source : The American Statistician, 49, 24-28, 1995.
_________________________________________________________________

Alphabetical Bibliography

Authors : Abramson, N. S., Kelsey, S. F., Safar, P., and Sutton-Tyrrell, K.
Title : Simpson's paradox and clinical trials: What you find is not necessarily what you prove.
Source : Annals of Emergency Medicine, 21, 1480-1482, 1992.

Authors : Bickel, P. J., Hammel, E. A., and O'Connell, J. W.
Title : Sex bias in graduate admissions: Data from Berkeley.
Source : Science, 187, 398-404, 1975.

Authors : Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W.
Title : Discrete multivariate analysis: Theory and practice.
Source : Cambridge, Massachusetts: The MIT Press, 1975.

Authors : Blyth, C. R.
Title : On Simpson's paradox and the sure thing principle.
Source : Journal of the American Statistical Association, 67, 364-381, 1972.

Authors : Cohen, J. E.
Title : An uncertainty principle in demography and the unisex issue.
Source : The American Statistician, 41, 32-39, 1986.

Authors : DeBlois, B. M.
Title : Simpson's Paradox.
Source : Mathematica Militaris, 3(1), 1992.

Authors : Deming, W. E.
Title : Statistical adjustment of data.
Source : New York: Dover Publications, Inc., 1943.

Authors : Deming, W. E. and Stephan, F. F.
Title : On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.
Source : Annals of Mathematical Statistics, 11, 427-444, 1940.

Authors : Fienberg, S. E.
Title : The analysis of cross-classified categorical data.
Source : Cambridge, Massachusetts: The MIT Press, 1977.

Authors : Flexser, A. J.
Title : Homogenizing the 2 X 2 contingency table: A method for removing dependencies due to subject and item differences.
Source : Psychological Review, 88, 327-339, 1981.

Authors : Freedman, D., Pisani, R., and Purves, R.
Title : Statistics.
Source : W.W. Norton & Company, New York, 1978.

Authors : Freehling, J. S.
Title : Simpson's paradox and database profiling.
Source : Direct Marketing, 53(5), 26-27, 1990.

Authors : Gardner, M.
Title : On the fabric of inductive logic and some probability paradoxes.
Source : Scientific American, 234, 119-124, 1976.

Authors : Goodman, L. A.
Title : The multivariate analysis of qualitative data: interactions among multiple classifications.
Source : Journal of the American Statistical Association, 65, 226-256, 1970.

Authors : Gunter, B.
Title : A trio of statistical double takes.
Source : Quality Progress, 26(6), 84-86, 1993.

Authors : Haunsperger, D. B. and Saari, D. G.
Title : The lack of consistency for statistical decision procedures.
Source : The American Statistician, 45(3), 252-255, 1991.

Authors : Hintzman, D. L.
Title : Simpson's paradox and the analysis of memory retrieval.
Source : Psychological Review, 87, 398-410, 1980.

Authors : Kaigh, W. D.
Title : A category representation paradox.
Source : The American Statistician, 43(2), 92-97, 1989.

Authors : Kennedy, J. J.
Title : Analyzing qualitative data. Introductory log-linear analysis for behavioral research.
Source : New York: Praeger Publishers, 1983.

Authors : Keyfitz, N.
Booktitle : Applied mathematical demography, Wiley, New York, pp. 385-391, 1977.

Authors : Klay, M. P. and Wesley, L. P.
Title : Simpson's paradox: a maximum likelihood solution.
Source : SRI International Technical Report, No. 502, 1-11, 1991.

Authors : Knapp, T. R.
Title : The unit-of-analysis problem in applications of simple correlation analysis to educational research.
Source : Journal of Educational Statistics, 2, 171-186, 1977.

Authors : Knapp, T. R.
Title : Instances of Simpson's paradox.
Source : College Mathematics Journal, 16, 209-211, 1985.

Authors : Lindquist, E. F.
Title : Statistical analysis in educational research.
Source : Boston: Houghton Mifflin, 1940.

Authors : Mantel, N.
Title : Simpson's paradox in reverse.
Source : The American Statistician, 36, 395, 1982.

Authors : Martin, E.
Title : Simpson's paradox resolved: A reply to Hintzman.
Source : Psychological Review, 88, 372-374, 1981.

Authors : Mehrez, A., Brown, J. R., and Khouja, M.
Title : Aggregate efficiency measures and Simpson's paradox.
Source : Contemporary Accounting Research, 9(1), 329-342, 1992.

Authors : Mittal, Y.
Title : Homogeneity of subpopulations and Simpson's Paradox.
Source : Journal of the American Statistical Association, 86(413), 167-172, 1991.

Authors : Mosteller, F.
Title : Association and estimation in contingency tables.
Source : Journal of the American Statistical Association, 63, 1-28, 1968.

Authors : Paik, M.
Title : A graphic representation of a three-way contingency table: Simpson's paradox and correlation.
Source : The American Statistician, 39, 53-54, 1985.

Authors : Robinson, W. S.
Title : Ecological correlations and the behavior of individuals.
Source : American Sociological Review, 15, 351-357, 1950.

Authors : Rogers, A.
Title : Heterogeneity and selection in multistate population analysis.
Source : Demography, 29(1), 31-38, 1992.

Authors : Saari, D. G.
Title : Inconsistencies of weighted summation voting systems.
Source : Mathematics of Operations Research, 7(4), 479-490, 1982.

Authors : Saari, D. G.
Title : The source of some paradoxes from social choice and probability.
Source : Journal of Economic Theory, 41(1), 1-22, 1987.

Authors : Saari, D. G.
Title : Symmetry, Voting and Social Choice.
Source : The Mathematical Intelligencer, 10(3), 32-42, 1988.

Authors : Shapiro, S. H.
Title : Collapsing contingency tables -- a geometric approach.
Source : The American Statistician, 36, 43-46, 1982.

Authors : Simon, C. P. and Blume, L.
Booktitle : Mathematics for Economists, W. W. Norton and Company, New York, pp. 368-371, pp. 784-791, 1994.

Authors : Simpson, E. H.
Title : The interpretation of interaction in contingency tables.
Source : Journal of the Royal Statistical Society, Ser. B., 13, 238-241, 1951.

Authors : Sunder, S.
Title : Simpson's reversal paradox and cost allocation.
Source : Journal of Accounting Research, 21, 222-233, 1983.

Authors : Thorndike, E. L.
Title : On the fallacy of imputing the correlations found for groups to individuals or smaller groups composing them.
Source : American Journal of Psychology, 52, 122-124, 1939.

Authors : Vaupel, J. W. and Yashin, A. I.
Title : Heterogeneity's ruses: some surprising effects of selection on population dynamics.
Source : The American Statistician, 39(3), 176-185, 1985.

Authors : Vaupel, J. W. and Yashin, A. I.
Title : The deviant dynamics of death in heterogeneous populations.
Source : Sociological Methodology, Tuma, N. B. (ed), pp. 179-211, 1985.

Authors : Wagner, C. H.
Title : Simpson's paradox in real life.
Source : The American Statistician, 36, 46-48, 1982.

Authors : Wardrop, R. L.
Title : Simpson's Paradox and the Hot Hand in Basketball.
Source : The American Statistician, 49, 24-28, 1995.

Authors : Wermuth, N.
Title : Moderating effects of subgroups in linear models.
Source : Biometrika, 76, 81-92, 1989.

Authors : Whittemore, A. S.
Title : Collapsibility of multidimensional contingency tables.
Source : Journal of the Royal Statistical Society, Ser. B., 40, 328-340, 1978.

Authors : Yule, G. U.
Title : Notes on the theory of association of attributes in statistics.
Source : Biometrika, 2, 121-134, 1903.

--
Dr. John R. Vokey, Associate Professor, Department of Psychology
University of Lethbridge, Lethbridge, Alberta, CANADA T1K 3M4
mailto:vokey@hg.uleth.ca
http://www.uleth.ca/~vokey
========================================================================
Date: Fri, 17 Jan 1997 11:29:27 -0600
Sender: "Statistics and statistical discussion list: STAT-L"
From: Clay Helberg
Organization: SPSS, Inc.
Subject: Re: Simpson's paradox

John R. Vokey wrote:
> Anil Menon (Syracuse University, School of Engg. and Computer Science)
> compiled this reference list and placed it on a "Simpson's Paradox" web-site
> he had created. Unfortunately, the web-site URL has changed (or no longer
> exists):
>

Wow, what a great resource! Thanks for sharing it. You might also want to add the paper from the latest issue of American Statistician to the list:

Appleton, French, & Vanderpump (1996). Ignoring a covariate: an example of Simpson's paradox. American Statistician, 50(4), 340-341.

--
Clay Helberg         | Internet: helberg@execpc.com
Publications Dept.   | WWW: http://www.execpc.com/~helberg/
SPSS, Inc.           | Speaking only for myself....
========================================================================
Date: Tue, 12 May 1998 17:23:45 -0400 (EDT)
Sender: owner-edstat-l@eos.ncsu.edu
From: Michael Larsen
To: zinaida@pegasus.rutgers.edu
Cc: edstat-l@jse.stat.ncsu.edu
Subject: Re: Simpson's paradox

I posted this to the apstat list recently:

E.H. Simpson, The Interpretation of Interaction in Contingency Tables, Journal of the Royal Statistical Society, Series B, 1951, pages 238-241.

An example from Simpson (1951) (paraphrased, that is, basically copied):

An investigator wants to know whether, in a standard deck of cards, there is an association between being a court card (King, Queen, Jack) and color (Red, Black). The deck chosen by the examiner was played with by a baby and some cards were dirty. Thinking dirty versus not dirty potentially important, the investigator tabulates the 3-way table:

                 Dirty                          Clean
         Court   Plain                  Court   Plain
Red      4/52    8/52    (12/52)        2/52    12/52   (14/52)
Black    3/52    5/52    (8/52)         3/52    15/52   (18/52)
         (7/52)  (13/52) ((20/52))      (5/52)  (27/52) ((32/52))

The baby prefers red to black and court to plain cards, so the investigator could conclude there are positive associations between red and plain in both the dirty and clean groups, but marginally

         Court   Plain
Red      6/52    20/52
Black    6/52    20/52

there is no association.
========================================================================
Date: Sun, 11 Oct 1998 06:18:42 +1000
Sender: owner-edstat-l@eos.ncsu.edu
From: Rex Boggs
To: Bob Beaver
Cc: edstat-l@jse.stat.ncsu.edu
Subject: Re: Simpson's Paradox

Bob Beaver wrote:
> I know this topic has been often discussed, but please bear with me. Is there
> a web site, or can someone recommend a textbook in which this interesting
> paradox is clearly explained. Thanks in advance.

There is an explanation, with a number of examples, at my Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/sim_par.htm

There are also a couple of links to other sites.

Cheers
Rex

--
Rex Boggs                  Phone: 0749 282 488
Glenmore SHS               Fax: 0749 261 390
P.O. Box 5822, R.M.C.      Email: rex@rocknet.net.au
Rockhampton QLD 4702
Australia
------------------------------------------------------
Secondary Mathematics Assessment and Resource Database
http://smard.cqu.edu.au
------------------------------------------------------
Exploring Data website
http://curriculum.qed.qld.gov.au/kla/eda/
------------------------------------------------------
========================================================================
Date: Sat, 10 Oct 1998 21:26:43 GMT
Sender: owner-edstat-l@eos.ncsu.edu
From: clavius@uswest.net (Clay Helberg)
To: edstat-l@jse.stat.ncsu.edu
Subject: Re: Simpson's Paradox

On 10 Oct 1998 18:10:52 GMT, bobb@nospam.com (Bob Beaver) wrote:
>I know this topic has been often discussed, but please bear with me. Is there
>a web site, or can someone recommend a textbook in which this interesting
>paradox is clearly explained. Thanks in advance.

I don't know of any textbook explanations offhand, but two good and very accessible articles were published on the subject recently.

The first is by Appleton et al. in American Statistician (1996), vol 50 no 4, pp. 340-341. They discuss how smoking seems to have a protective effect in women overall, i.e. nonsmokers were more likely to have died during the 20-year interval of the study than smokers were. However, if you take age at first contact into account, then smoking has a detrimental effect, i.e. within each limited age group, smokers were more likely to have died than non-smokers. The reason for the discrepancy was the fact that when subjects were first interviewed, younger women were much more likely to be smokers than older women (and thus were more likely to live longer despite their smoking).

The second paper is by Westbrooke in Chance (1998), vol 11 no 2, pp. 40-42.
This example talks about jury selection in New Zealand, where the proportion of potential Maori jurors exceeded their representation in the population overall, but examining the districts individually reveals that in every case the Maori representation in the jury pool was smaller than that in the relevant population. Again, the cause of the discrepancy was the fact that proportions of Maori in the population varied from district to district, and also districts contributed in different proportions to jury pools.

HTH.

--Clay
========================================================================
Date: Sun, 11 Oct 1998 14:00:19 GMT
Sender: owner-edstat-l@eos.ncsu.edu
From: dave@autobox.com
To: edstat-l@jse.stat.ncsu.edu
Subject: Re: Simpson's Paradox

In article <6vo7vc$r8k@bgtnsc02.worldnet.att.net>, bobb@nospam.com (Bob Beaver) wrote:
> I know this topic has been often discussed, but please bear with me. Is there
> a web site, or can someone recommend a textbook in which this interesting
> paradox is clearly explained. Thanks in advance.
>

Please see http://www.autobox.com and, in particular, for examples of Simpson's Paradox in time series applications, see http://www.autobox.com/blp30a.html

dave r.
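
The playing-card example from Simpson (1951), quoted in Michael Larsen's post of 12 May 1998 earlier in this thread, can be checked numerically. The sketch below is only an illustration: the counts are the ones quoted in that post, while the choice of the odds ratio as the measure of association and the helper names (odds_ratio, collapse) are assumptions made here, not anything taken from the thread or from Simpson's paper.

```python
from fractions import Fraction

# Counts out of 52 cards, as quoted from Simpson (1951) in the thread.
# Layout: tables[condition][color] = (court, plain)
tables = {
    "dirty": {"red": (4, 8), "black": (3, 5)},
    "clean": {"red": (2, 12), "black": (3, 15)},
}


def odds_ratio(red, black):
    """Odds of 'court' among red cards divided by odds of 'court' among black cards."""
    (red_court, red_plain), (black_court, black_plain) = red, black
    return Fraction(red_court * black_plain, red_plain * black_court)


def collapse(tables):
    """Sum the counts over the dirty/clean dimension (hypothetical helper)."""
    collapsed = {}
    for color in ("red", "black"):
        court = sum(t[color][0] for t in tables.values())
        plain = sum(t[color][1] for t in tables.values())
        collapsed[color] = (court, plain)
    return collapsed


# Association within each condition, then after collapsing over condition.
conditional = {name: odds_ratio(t["red"], t["black"]) for name, t in tables.items()}
marginal_table = collapse(tables)
marginal = odds_ratio(marginal_table["red"], marginal_table["black"])

for name, value in conditional.items():
    print(f"{name}: odds ratio (court, red vs black) = {value}")
print(f"collapsed: odds ratio = {marginal}")
```

Within both the dirty and the clean cards the odds ratio comes out at 5/6, i.e. red cards are somewhat less likely than black cards to be court cards (red goes with plain), while the collapsed table gives an odds ratio of exactly 1, i.e. no association. Exact fractions are used so that the equality in the collapsed table is not blurred by floating-point rounding.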