Exact-P.txt

Date: Sun, 24 Mar 1996 11:52:26 EST
Sender: "Psychology department faculty, staff and student list"
Comments: Resent-From: "Karl L. Wuensch"
Comments: Originally-From: palij@xp.psych.NYU.EDU
From: "Karl L. Wuensch"
Subject: Re: APA Style/Probability

Many of the readers of this list are studying or have recently studied stats with Linda or me. They should find the following post from TIPS (Teaching in Psychology) of interest. My reply will follow.

----------------------------Original message----------------------------
Lorri Cerro writes:
>I'm about to deal with SPSS t tests in my stats/methodology class,
>and I wanted to pose a question that bothered me from last
>semester:
>
>I suggest my students report exact p values (since it's more
>precise and allowed in 4th ed. APA style), but how do you report a
>computer-generated p value of .000? It would not be correct to say
>p = .000, right? Is the best option here p < .001?

I have several comments but will respond to your question first. Yes, the best option you have, given your approach, is to report p< .001. Having said that I would like to say the following:

(1) there are two ways of reporting p values: (a) identifying a general level of significance for a test (e.g., p< .05) and then identifying whether a test is significant or not at this level, and (b) reporting the p-value associated with individual tests (as you appear to be doing). The problem with the latter approach is that the p-value is really not of much interest because it says little about the population parameters being tested. Moreover, across replications with a constant sample size, we would expect the p-values to vary greatly, from non-significance to some "highly" significant level.
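[Editorial illustration, not part of the original post.] The claim about replications can be checked with a small simulation. What follows is only a minimal sketch under stated assumptions: two normal populations with a known, common standard deviation (so a z test stands in for the t test), a true difference of 0.5 SD, and 30 cases per group. The function names two_sample_z_p and replicate, and all of the numeric settings, are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def two_sample_z_p(x, y, sigma):
    """Two-sided p-value for a difference in means, treating sigma as known."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicate(n, true_diff, sigma=1.0, reps=1000, seed=1):
    """Return the sorted p-values from `reps` replications with n cases per group."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        x = [rng.gauss(true_diff, sigma) for _ in range(n)]
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        p_values.append(two_sample_z_p(x, y, sigma))
    return sorted(p_values)


# Same population difference and same sample size in every replication.
ps = replicate(n=30, true_diff=0.5)
print("smallest p:", ps[0])
print("median p:  ", ps[len(ps) // 2])
print("largest p: ", ps[-1])
print("share with p < .05:", sum(p < 0.05 for p in ps) / len(ps))
```

Changing n while holding true_diff fixed lets one see how the distribution of p-values depends on sample size alone.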
When replications are based on varying sample sizes, one can predict quite confidently that the p-values associated with small samples will be larger than the p-values associated with large samples, all the while the population difference (that is what is being assessed in the t-test) is constant. Focusing on actual p-values is therefore of little value.

(2) there is a common tendency to use p-values as a measure of effect size (i.e., a p< .00000001 is somehow more significant than a p> .05 even though they may be tests of the same population difference) instead of using more appropriate measures of effect size, such as Cohen's d in the two sample case. Focusing on the p-value, as mentioned above, provides little information about the situation in the population(s) which is what the test is all about.

Better to report whether the [effect] was significant or not at the p> .05 level, report an effect size for the difference, and a confidence interval for the difference. Power and Meta-analysts in the future will thank you for making their lives so much easier. :-)

-Mike Palij/Psychology Dept/New York University
========================================================================
Date: Sun, 24 Mar 96 11:56:53 EST
From: "Karl L. Wuensch"
Subject: Exact p-values
To: ecupsy-l@ECUVM.CIS.ECU.EDU, TIPS@fre.fsu.umd.edu

-Mike Palij advised us not to report exact p values because:

>(2) there is a common tendency to use p-values as a
>measure of effect size (i.e., a p< .00000001 is somehow
>more significant than a p> .05
[sic -- I think Mike meant "p < .05" here]
>even though they may be
>tests of the same population difference) instead of using
>more appropriate measures of effect size, such as Cohen's d
>in the two sample case. Focusing on the p-value, as mentioned
>above, provides little information about the situation in
>the population(s) which is what the test is all about.
>
>Better to report whether the [ effect ] was significant or not at
>the p> .05 level, report an effect size for the difference,
>and a confidence interval for the difference.

While I agree with most of what Mike has posted (including parts I did not quote), it strikes me that he is suggesting that we not report statistics that are commonly misinterpreted. By that criterion we would report next to nothing. ;-) The exact significance level is no more misleading than is the value of the test statistic (t, F, etc.), should we not report them exactly, just report whether or not they equal or exceed some critical value? I prefer to give the reader more, not less, information, but be sure the reader has all of the appropriate information (not just the exact p, but useful measures of effect size as well). A well written results section will caution the reader when a low p resulted from high power with a trivial effect size.

The exact size of the p would seem to be especially important when it is close to that magical .05 level. Consider three p's from studies with equivalent power: p = .045, p = .055, and p = .55. Do we really want to report the first simply as "p < .05" and the latter two as "p > .05"? Certainly the difference between the two "nonsignificant" results is greater than the difference between the "significant" one and the "nearly significant" one.

My recommendation is to provide an exact p (at least for p's which are > .001) and a measure of strength of effect. With simple comparisons, such as the t-tests mentioned in the original query, why not present confidence intervals for the difference as well (or instead of the "significance testing")?
IMHO we might well be better off just dropping "significance testing," at least for the evaluation of effects which can be stated in terms of relatively simple estimators, that is, statistics about which we can easily place confidence intervals (estimates of means, differences between means, correlation coefficients, etc.).

Karl L. Wuensch, Dept. of Psychology, East Carolina Univ.
Greenville, NC 27858-4353, phone 919-328-4102, fax 919-328-6283
Bitnet Address: PSWUENSC@ECUVM1
Internet Address: PSWUENSC@ECUVM.CIS.ECU.EDU
========================================================================
Date: Sun, 24 Mar 96 14:13:26 EST
To: TIPS@fre.fsu.umd.edu
Subject: Re: Exact p-values
From: palij@xp.psych.nyu.edu

"Karl L. Wuensch" writes:
>-Mike Palij advised us not to report exact p values because:
>
[major snippage to save bandwidth]
>
> While I agree with most of what Mike has posted (including parts
>I did not quote), it strikes me that he is suggesting that we not
>report statistics that are commonly misinterpreted. By that
>criterion we would report next to nothing. ;-)

You know, in some situations that would be an improvement. ;-)

>The exact significance level is no more
>misleading than is the value of the test statistic (t, F, etc.),
>should we not report them exactly, just report whether or not they
>equal or exceed some critical value? I prefer to give the reader
>more, not less, information, but be sure the reader has all of the
>appropriate information (not just the exact p, but useful measures
>of effect size as well).

I am in agreement. However, recall the original question that was asked: (paraphrasing) "What does one do when the computer output gives 'p< .0000'?" Reporting p< .0001 is not reporting an exact p-value in this case.
Moreover, there are still some of us who actually still do some statistical tests with pocket calculators (in my exp psych lab we do this along with computer calculations in order to (a) keep the computer honest ;-) and (b) show that one can still do statistical analyses without a computer-based statistical package). In the case of hand calculations, one doesn't know the exact p-value, just whether the test is significant or not, but one can still go on and calculate effect size, confidence intervals, and other statistics.

>A well written results
>section will caution the reader when a low p resulted from high
>power with a trivial effect size.

Ah, but the key phrase here is "well written". In too many places, including the journals of several different areas, the results section, the discussion, and the abstract are not well written. I still cringe when I read an abstract and see "the difference was very significant (p< .00001)" when the person should be presenting an effect size measure as well as a statement about the psychological or practical significance of the result.

>The exact size of the p would
>seem to be especially important when it is close to that magical .05
>level. Consider three p's from studies with equivalent power:
>p = .045, p = .055, and p = .55. Do we really want to report the
>first simply as "p < .05" and the latter two as "p > .05"?
>Certainly the difference between the two "nonsignificant"
>results is greater than the difference between the "significant"
>one and the "nearly significant" one.

Good points. In the second case (p= .055), I tell my students to report that there is a trend toward significance or that the result is marginally significant and that replication with a larger sample size should decide whether the effect is "real" or reliable.
I suggest the heuristic of "p> .10" as the criterion for results that are probably not of practical significance (that is, for situations where finding a significant result is not critical [such a critical situation would be finding a new treatment for AIDS]).

> My recommendation is to provide an exact p (at least for p's
>which are > .001) and a measure of strength of effect. With simple
>comparisons, such as the t-tests mentioned in the original query,
>why not present confidence intervals for the difference as well
>(or instead of the "significance testing")? IMHO we might well
>be better off just dropping "significance testing," at least for
>the evaluation of effects which can be stated in terms
>of relatively simple estimators, that is, statistics about which
>we can easily place confidence intervals (estimates of means,
>differences between means, correlation coefficients, etc.).

I am pretty much in agreement with you. It should be noted that an APA task force on the use of significance testing has been or will be empaneled (my memory is a bit vague on this but I remember reading about it in Div. 5's newsletter "The Score"). In the next century we might see a significant change in the presentation of the results in journal articles.

> Karl L. Wuensch, Dept. of Psychology, East Carolina Univ.
> Greenville, NC 27858-4353, phone 919-328-4102, fax 919-328-6283
> Bitnet Address: PSWUENSC@ECUVM1
> Internet Address: PSWUENSC@ECUVM.CIS.ECU.EDU

-Mike Palij/Psychology Dept/New York University
========================================================================
Date: Mon, 25 Mar 1996 09:08:23 +1000 (EST)
To: TIPS@fre.fsu.umd.edu
From: reece@rmit.edu.au (John Reece)
Subject: Re: APA Style/Probability

>I'm about to deal with SPSS t tests in my stats/methodology class, and I
>wanted to pose a question that bothered me from last semester:
>
>I suggest my students report exact p values (since it's more precise and
>allowed in 4th ed.
>APA style), but how do you report a
>computer-generated p value of .000? It would not be correct to say
>p = .000, right? Is the best option here p < .001?
>
>Lorri Cerro e-mail: cerro@umbc.edu
>Department of Psychology Office: 410-455-2322
>University of Maryland Baltimore County Phone Mail: 410-532-5705

Lorri,

I applaud your instructing your students to report exact p levels. I have long argued against the imprecision of the "one star, two star" approach to reporting significance. My advice to students is exactly as you suggested. When the output is p = .000, report as p < .001. At that level, you're looking at something so small that a precise measurement is relatively meaningless anyway, although I suppose you could rightly argue a meaningful difference between something that's significant at p = .0009, and something significant at p = .0000000000003. And it's for that very reason that I would suggest instructing your students to report a simple measure of effect size, which for a t-test would be Cohen's d.

Hope this helps.

**************************************************************************
* John Reece, PhD *
* Department of Psychology & Intellectual Disability Studies *
* Royal Melbourne Institute of Technology *
* Bundoora Campus *
* PO Box 71 Phone: 061-03-9468-2512 *
* Bundoora Victoria 3083 Fax: 061-03-9468-2303 *
* AUSTRALIA Internet/Aarnet: reece@rmit.edu.au *
**************************************************************************
========================================================================
Date: Sun, 24 Mar 1996 22:50:34 EST
Sender: "Psychology department faculty, staff and student list"
From: tw
Subject: Re: Exact p-values
To: Multiple recipients of list ECUPSY-L

I also prefer exact p-values. I also remember reading an article where the meta-analyst implored researchers to use exact p-values. One worry I have about dropping significance testing and substituting confidence intervals has to do with communication.
I wonder if reporting confidence intervals wouldn't make it difficult for the reader to follow a results section. Maybe it is just from a lack of practice. However, it might be that the significance test makes it easier to tell a story.

Tony Whetstone
========================================================================
Date: Tue, 26 Mar 1996 15:16:17 EST
Sender: "Psychology department faculty, staff and student list"
From: "Karl L. Wuensch"
Subject: Re: Exact p-values

Karl A. Minke opined:

> At least when one has rejected the null, the exact p-value does
>convey some information--the probability that one made an error when doing
>so. The p-value when one fails to reject the null is meaningless, however.

Treating the exact p-value as "the probability that one made an error when doing so" unfortunately is a very common error. One cannot make such an error (a Type I error) unless the null hypothesis is true, so to determine the probability of having made such an error one must factor in the probability that the null hypothesis is true. Of course, one is not going to be able to quantify that probability in real situations, but one can expect that psychologists frame their hypotheses such that the probability of the null hypothesis being true (or even near to true) is quite low. I recommend the article "On the Probability of Making Type I Errors" by Pollard and Richardson, Psychological Bulletin, 1987, 102: 159-163 for a thorough (but dense) discussion of this problem. They refer to the probability that one has made an error when rejecting a null hypothesis as the "conditional posterior probability of making a Type I error." It is this probability which is commonly but mistakenly assumed to be equal to alpha or p.

One related confusion is the assertion that using the .05 criterion will result in your making a Type I error 5% of the time you test a null. It should be clear that this is not so; this would be so only if every null hypothesis ever tested were true.
Some have even written that the .05 criterion means that 5% of published rejections of the null are Type I errors. While this could be true (one need consider publication policy, the unconditional probability that a null hypothesis is true, and levels of power: under certain circumstances the Type I error rate could equal 5%), it is highly likely that the rate of Type I errors in the literature is extremely small, well below 5%. Of course, one could argue that no point null hypothesis is ever absolutely true, or the probability of such is quite small, but I prefer to think of "range" or "loose" null hypotheses of the form that the effect is zero or so close to zero that it might as well be zero for practical purposes.

I also disagree with Karl's statement that p is totally uninformative when its value exceeds .05 or some other magical criterion of "significance." I prefer to treat p as an index of how well the data fit with the null hypothesis. High values of p indicate that the observed data are pretty much what you would expect given the null hypothesis. Low values of p indicate that the obtained sample is unlikely given the null, and thus cast some doubt on the veracity of the null, even if the p is not at or below the criterion of significance.

I am reminded of a criminal case on which I was a juror. After evaluating the data my p was above my criterion of significance (I voted 'not guilty'), but not by much -- I thought it more likely that the defendant was guilty than innocent, but not "beyond a reasonable doubt." "Not guilty" is not the same as "innocent." When p = .055 I remain distrustful of the null hypothesis, even if I have not rejected it. When p = .55 I am much more comfortable with the null hypothesis, especially if my power was high.
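[Editorial illustration, not part of the original post.] The argument about the "conditional posterior probability of making a Type I error" can be made concrete with Bayes' rule. The sketch below is not taken from Pollard and Richardson; it assumes a point null that is simply true or false, one alpha and one power shared by every test, and no publication filter. The function names and the example values (alpha = .05, power = .80) are hypothetical.

```python
def type1_rate_per_test(alpha, prior_null):
    """Share of ALL tests that end in a Type I error.

    A Type I error requires a true null, so the rate is alpha times the
    proportion of tested nulls that are actually true.
    """
    return alpha * prior_null


def prob_type1_given_rejection(alpha, power, prior_null):
    """P(null is true | null was rejected), by Bayes' rule."""
    false_rejections = alpha * prior_null          # true nulls, rejected
    correct_rejections = power * (1 - prior_null)  # false nulls, rejected
    return false_rejections / (false_rejections + correct_rejections)


# prior_null is the assumed proportion of tested null hypotheses that are true.
for prior_null in (1.0, 0.5, 0.1):
    per_test = type1_rate_per_test(0.05, prior_null)
    given_reject = prob_type1_given_rejection(0.05, 0.80, prior_null)
    print(f"P(null true)={prior_null:.1f}  "
          f"Type I per test={per_test:.4f}  "
          f"P(Type I | rejected)={given_reject:.4f}")
```

Under these assumptions the per-test Type I error rate equals alpha only when every tested null is true, and neither quantity is the same thing as the p-value from any single study.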
Let me share a couple of lines from the excellent article, "Statistical Procedures and the Justification of Knowledge in Psychological Science," by Rosnow and Rosenthal, which appeared in the American Psychologist in October of 1989:

"surely, God loves the .06 nearly as much as the .05. Can there be any doubt that God views the strength of evidence for or against the null as a fairly continuous function of the magnitude of p?"

"there is no sharp line between a 'significant' and a 'nonsignificant' difference; significance in statistics, like the significance of a value in the universe of values, varies continuously between extremes."
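[Editorial illustration, not part of the original thread.] The reporting advice that runs through this thread (an exact p, with "p < .001" when the software prints .000, plus Cohen's d and a confidence interval for the difference) can be sketched in a few lines of Python. This is one possible reading, not an official APA routine: the names format_p, cohens_d and diff_ci are hypothetical, the data are invented, and the interval uses a large-sample normal approximation rather than the t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def format_p(p):
    """Exact p to three decimals without a leading zero, or 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def diff_ci(x, y, level=0.95):
    """Approximate CI for the mean difference (normal approximation)."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(x) - mean(y)
    return diff - z * se, diff + z * se


# Invented scores for two groups, for illustration only.
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.9, 5.7]
control = [4.2, 4.9, 5.0, 4.4, 4.7, 5.1, 4.0, 4.6]

for p in (0.0, 0.0009, 0.045, 0.055, 0.55):
    print(format_p(p))
print("d =", round(cohens_d(treated, control), 2))
print("95% CI for the difference:", diff_ci(treated, control))
```

With samples as small as these, a t-based interval would be wider than the normal approximation used here, so the interval function is best read as a placeholder for whatever the statistical package reports.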