ANOVA2-Followup.txt

     When we went over the ANOVA2 listing in class (the program replicates and
extends the factorial analysis done in Howell), we noticed that the cell
standard deviations differed considerably.  Here they are:
 ------------------------------------------------------------------------------
                   2-WAY, EQUAL NS, INDEPENDENT SAMPLES ANOVA
                         Howell, 4th edition, page 404
 
          Level of   Level of           ------------ITEMS------------
          AGE        CONDITON       N       Mean              SD
 
          Old        Adjective     10     11.0000000       2.49443826
          Old        Counting      10      7.0000000       1.82574186
          Old        Imagery       10     13.4000000       4.50185147
          Old        Intentional   10     12.0000000       3.74165739
          Old        Rhyming       10      6.9000000       2.13177026
          Young      Adjective     10     14.8000000       3.48966729
          Young      Counting      10      6.5000000       1.43372088
          Young      Imagery       10     17.6000000       2.59058123
          Young      Intentional   10     19.3000000       2.66874919
          Young      Rhyming       10      7.6000000       1.95505044
 
     Note that the largest cell variance, 4.502**2 = 20.27, is almost
 ten times as large as the smallest cell variance, 1.434**2 = 2.06.
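 As a quick arithmetic check, the ratio of the largest to the smallest cell
variance (the quantity behind Hartley's F-max test) can be computed straight
from the SDs in the listing.  This is a minimal Python sketch, not part of the
ANOVA2 SAS program; the dictionary name and its (AGE, CONDITON) keys are my own
illustrative choices.

```python
# Illustrative check, not part of the ANOVA2 SAS program: the cell SDs are
# copied from the listing above and keyed by (AGE, CONDITON) level.
cell_sds = {
    ("Old", "Adjective"): 2.49443826,
    ("Old", "Counting"): 1.82574186,
    ("Old", "Imagery"): 4.50185147,
    ("Old", "Intentional"): 3.74165739,
    ("Old", "Rhyming"): 2.13177026,
    ("Young", "Adjective"): 3.48966729,
    ("Young", "Counting"): 1.43372088,
    ("Young", "Imagery"): 2.59058123,
    ("Young", "Intentional"): 2.66874919,
    ("Young", "Rhyming"): 1.95505044,
}

# Square each SD to get the cell variance.
cell_vars = {cell: sd ** 2 for cell, sd in cell_sds.items()}

# Locate the cells with the largest and smallest variance.
largest = max(cell_vars, key=cell_vars.get)
smallest = min(cell_vars, key=cell_vars.get)
fmax_ratio = cell_vars[largest] / cell_vars[smallest]

print(f"largest variance:  {largest} = {cell_vars[largest]:.2f}")
print(f"smallest variance: {smallest} = {cell_vars[smallest]:.2f}")
print(f"ratio = {fmax_ratio:.2f}")
```

 With these SDs it reports Old/Imagery (20.27) over Young/Counting (2.06), a
ratio of about 9.86.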
 
-------------------------------------------------------------------------
 
     I edited the ANOVA2 listing, cutting out everything except the output
showing the ten cell means and standard deviations.  I then read these into
Minitab to investigate the relationship between the means and the standard
deviations.  Here is what I found.
 
 MTB > name c1 'mean' c2 'stdev' c3 'var'
 MTB > let c3=c2*c2
 MTB > corr c1 c2
 
 Correlation of mean and stdev = 0.538
 
 MTB > corr c1 c3
 
 Correlation of mean and var = 0.455
 
 MTB > plot c1 c2
 
          -
      20.0+
          -                         *
  mean    -                        *
          -
          -
      15.0+                                       *
          -
          -                                                        *
          -                                           *
          -                       *
      10.0+
          -
          -              *
          -     *     *     *
          -
       5.0+
            +---------+---------+---------+---------+---------+------stdev
 
         1.20      1.80      2.40      3.00      3.60      4.20
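 For readers without Minitab, the two "corr" commands above can be reproduced
with an ordinary Pearson correlation.  This is a minimal Python sketch; the
helper name pearson_r is hypothetical, and the lists simply hold the ten cell
means and SDs in the order they appear in the listing.

```python
import math

# Cell means and SDs in listing order:
# Old (Adjective, Counting, Imagery, Intentional, Rhyming), then Young.
means = [11.0, 7.0, 13.4, 12.0, 6.9, 14.8, 6.5, 17.6, 19.3, 7.6]
sds = [2.49443826, 1.82574186, 4.50185147, 3.74165739, 2.13177026,
       3.48966729, 1.43372088, 2.59058123, 2.66874919, 1.95505044]

# Same as Minitab's "let c3=c2*c2".
variances = [sd * sd for sd in sds]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_mean_sd = pearson_r(means, sds)
r_mean_var = pearson_r(means, variances)
print(f"Correlation of mean and stdev = {r_mean_sd:.3f}")
print(f"Correlation of mean and var = {r_mean_var:.3f}")
```

 It should print the same .538 and .455 that Minitab gave.  With only ten
cells, these correlations are rough guides rather than firm evidence.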
 
 
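     The cell means appear to be positively correlated with both the cell
standard deviations (r = .54) and the cell variances (r = .46).  A log
transformation tends to stabilize the variances when the standard deviations
are proportional to the means, and a square-root transformation does so when
the variances are proportional to the means, so either might help here (as
Howell suggested back in chapter 11).  I went back to the ANOVA2 SAS program
and modified the data step to do the log transformation:

     INPUT AGE CONDITON; DO I=1 TO 10; INPUT ITEMS @@;
       logitems=log10(items); OUTPUT; END;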
 
and then checked the cell variances.  The ratio of the largest to the
smallest was still a bit over 6, so I changed the transformation to
"sqritems=sqrt(items);", which brought the ratio down to a hair over 4.
Given equal sample sizes, that seemed adequate.  Of course, I should also
check the effect of this transformation on the shape of the within-cell
distributions, but the n's are small and these are not my research data, so
I haven't.
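 The check described above (transform every score, recompute the cell
variances, compare largest to smallest) can be sketched in a few lines of
Python.  This is an illustration only, not the SAS program I used: the function
name variance_ratio is hypothetical, and because the raw scores are not
reproduced here, the cells below are made-up toy data in which one cell is
exactly ten times the other, so its SD is proportional to its mean.

```python
import math
from statistics import variance


def variance_ratio(cells, transform=float):
    """Largest / smallest within-cell sample variance after applying
    `transform` to every score (default: no change beyond float())."""
    cell_vars = [variance([transform(y) for y in cell]) for cell in cells]
    return max(cell_vars) / min(cell_vars)


# Hypothetical toy cells (NOT Howell's data): the second cell is the first
# multiplied by 10, so its mean and SD are both ten times larger.
toy_cells = [[1, 2, 3], [10, 20, 30]]

raw_ratio = variance_ratio(toy_cells)               # 100 / 1
sqrt_ratio = variance_ratio(toy_cells, math.sqrt)   # about 10
# math.log10 raises ValueError on a score of 0; a common workaround is to
# transform y + 1 instead.
log_ratio = variance_ratio(toy_cells, math.log10)   # about 1

print(f"raw {raw_ratio:.2f}, sqrt {sqrt_ratio:.2f}, log10 {log_ratio:.2f}")
```

 In this toy case the SDs are exactly proportional to the means, the situation
the log transformation suits, so the log equalizes the variances and the square
root only partly does.  With real data neither pattern holds exactly, which is
why it is worth trying both and checking the resulting ratios.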
 
     If these were your research data, for your thesis or whatever, you
should, of course, carefully evaluate them for data entry errors, violations
of assumptions, outliers, etc. prior to starting an analysis such as ANOVA.
I probably have not put enough emphasis on such preliminary data checking,
perhaps because it is not really very exciting, but it is important.  I'll
try to spend more time on it next semester in PSYC 6433.