Effect of n1/n2 on Estimated d and rpb

When comparing two means, the most commonly employed effect size estimators are g (estimated d) and the point-biserial r. Each has its advocates and its critics. One factor to consider when choosing between them is the effect of disparate sample sizes on the two estimators, which I illustrate below.

Equal Sample Sizes

First we look at an analysis of two samples where the sample sizes are equal.

T-TEST GROUPS=A(1 2)
/MISSING=ANALYSIS
/VARIABLES=Y1
/CRITERIA=CI(.9500).

Group Statistics

     A    N     Mean     Std. Deviation   Std. Error Mean
Y1   1    20    5.5000   2.30560          .51555
     2    20    7.8000   2.30560          .51555

Independent Samples Test (t-test for Equality of Means)

                               t        df    Sig. (2-tailed)   Mean Difference
Y1   Equal variances assumed   -3.155   38    .003              -2.30000

Notice that the two means differ by one standard deviation (2.3).  That is, estimated d = 1.00, a large effect (Cohen's benchmark for a large effect was d = .8).
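The arithmetic behind that value can be sketched in a few lines of Python. This is only an illustration built from the group statistics printed above, not part of the SPSS analysis; the function names `pooled_sd` and `estimated_d` are my own, and the pooled standard deviation is assumed to be the usual one that weights each group variance by its degrees of freedom.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled within-group SD; each variance is weighted by its df (n - 1)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def estimated_d(m1, m2, sd_pooled):
    """Absolute mean difference expressed in pooled-SD units."""
    return abs(m2 - m1) / sd_pooled

# Group statistics from the equal-n output (hypothetical variable names).
sd_p = pooled_sd(20, 2.30560, 20, 2.30560)
d = estimated_d(5.5, 7.8, sd_p)
print(round(sd_p, 4), round(d, 2))  # about 2.3056 and 1.0
```

With equal sample sizes and equal standard deviations the pooled SD is simply the common SD, so d is the mean difference (2.3) divided by 2.3056.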

Now we compute the point biserial.

CORRELATIONS
/VARIABLES=Y1 WITH A
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

Correlations

                           A
Y1   Pearson Correlation   .456**
     Sig. (2-tailed)       .003
     N                     40

**. Correlation is significant at the 0.01 level (2-tailed).

• The value of the point biserial falls just short of Cohen's general benchmark for a large r, .50.

• If we square the point biserial to get a proportion of variance, we obtain 20.8%.  This is absolutely equivalent to eta-squared in ANOVA.  Cohen's benchmark for a large eta-squared is 14%.
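As a cross-check, the point biserial and its square can be recovered from the t statistic and its degrees of freedom reported earlier. The sketch below assumes the pooled-variance t test; `r_from_t` is an illustrative name, not an SPSS function.

```python
import math

def r_from_t(t, df):
    """Magnitude of the point-biserial r implied by a pooled-variance t."""
    return abs(t) / math.sqrt(t**2 + df)

# t and df taken from the equal-n Independent Samples Test output.
r_pb = r_from_t(-3.155, 38)
eta_sq = r_pb**2  # proportion of variance, same as eta-squared with two groups
print(round(r_pb, 3), round(eta_sq, 3))  # about 0.456 and 0.208
```

The sign of r depends only on how the groups are coded, so the function returns the magnitude.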

No matter how we look at it, our effect here is large.

(Very) Unequal Sample Sizes

Now let us look at the analysis of a data set where the sample sizes differ considerably. The standard deviations and the means differ very little from those in the first data set.

T-TEST GROUPS=B(1 2)
/MISSING=ANALYSIS
/VARIABLES=Y2
/CRITERIA=CI(.9500).

Group Statistics

     B    N     Mean     Std. Deviation   Std. Error Mean
Y2   1    100   5.5000   2.25854          .22585
     2    4     7.7750   2.24109          1.12055

Independent Samples Test (t-test for Equality of Means)

                               t        df    Sig. (2-tailed)   Mean Difference
Y2   Equal variances assumed   -1.976   102   .051              -2.27500

The means still differ by one standard deviation -- estimated d = 1.01, a large effect (Cohen's benchmark for a large effect was d = .8).

Now we compute the point biserial.

CORRELATIONS
/VARIABLES=Y2 WITH B
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

Correlations

                           B
Y2   Pearson Correlation   .192
     Sig. (2-tailed)       .051
     N                     104

• The value of the point biserial puts it in Cohen's small (.1) to medium (.3) range.

• If we square the point biserial to get a proportion of variance, we obtain 3.7%.  This is absolutely equivalent to eta-squared in ANOVA.  This is in Cohen's small (1%) to medium (6%) range.

Although the estimated d indicates we have a large effect, the point biserial indicates that we have a small to medium effect.
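The divergence can be reproduced by holding d fixed and varying only the sample sizes. The sketch below uses the algebraic link between the two statistics, which follows from t = d * sqrt(n1*n2/N) and r = t / sqrt(t^2 + df). It assumes d is computed with the pooled standard deviation; `r_from_d` is an illustrative name.

```python
import math

def r_from_d(d, n1, n2):
    """Point-biserial r implied by a pooled-SD d and the two sample sizes."""
    n = n1 + n2
    return d / math.sqrt(d**2 + (n - 2) * n / (n1 * n2))

# Same d (1.0) under the two designs used in this document.
for n1, n2 in [(20, 20), (100, 4)]:
    print(n1, n2, round(r_from_d(1.0, n1, n2), 3))
# 20 20 0.456
# 100 4 0.191
```

With d rounded to exactly 1.0 the unequal-n value comes out at .191 rather than the .192 SPSS reports, because the actual d in that data set is slightly above 1.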

How is the ratio of sample sizes having such an effect on the value of the point biserial?

• The point biserial r is the standardized slope for predicting the outcome variable from the grouping variable.
• Code the grouping variable with consecutive integers (I'll use 1 and 2).
• The unstandardized slope is the simple difference between group means.
• We standardize by multiplying by the standard deviation of the grouping variable and dividing by the standard deviation of the outcome variable.
• The standard deviation of the grouping variable is a function of the sample sizes. For example, for N = 100, the SD of the grouping variable (computed with N - 1 in the denominator) is
    • .503 when n1, n2 = 50, 50
    • .473 when n1, n2 = 67, 33
    • .302 when n1, n2 = 90, 10
• Notice that the standard deviation of the grouping variable (and thus the point biserial r) decreases as the ratio of sample sizes departs from 1.
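The steps in this list can be checked numerically. The sketch below is an illustration only: it rebuilds the total-sample SD of the outcome from the printed group statistics (within-groups plus between-groups sums of squares), and the function names `sd_group` and `sd_outcome` are my own.

```python
import math

def sd_group(n1, n2):
    """Sample SD (N - 1 denominator) of a two-valued code whose values differ by 1."""
    n = n1 + n2
    return math.sqrt(n1 * n2 / (n * (n - 1)))

def sd_outcome(n1, m1, sd1, n2, m2, sd2):
    """Total-sample SD of the outcome, rebuilt from the group statistics."""
    n = n1 + n2
    ss_within = (n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
    ss_between = n1 * n2 / n * (m2 - m1)**2
    return math.sqrt((ss_within + ss_between) / (n - 1))

# SD of the grouping variable for the three splits of N = 100 listed above.
for n1, n2 in [(50, 50), (67, 33), (90, 10)]:
    print(n1, n2, round(sd_group(n1, n2), 3))  # 0.503, 0.473, 0.302

# Standardized slope for the unequal-n data: mean difference * SD(group) / SD(outcome).
slope = 7.775 - 5.5
r_pb = slope * sd_group(100, 4) / sd_outcome(100, 5.5, 2.25854, 4, 7.775, 2.24109)
print(round(r_pb, 3))  # about 0.192
```

The same mean difference is multiplied by a much smaller SD of the grouping variable when the split is 100 to 4 (about .193) than when it is 20 to 20 (about .506), which is why r shrinks while d does not.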

If you would like to read more about the differences between estimated d and rpb as estimators of effect size, I recommend the following article: McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11, 386-401.
Dr. Karl L. Wuensch