SS-Type.txt
Karl has hardcopy documents on types of sums of squares; see him if you would
like copies. There are some example programs online here, and below is a copy
of some correspondence on the topic.
========================================================================
Sender: "SAS(r) Discussion"
From: Mark Lee
First, I would like to thank Karl Wuensch, Wang Kuo-chang, Tim Dorcey, and
Mik Bickis for their advice on Greenhouse-Geisser and Huynh-Feldt
adjustments in repeated measures analysis. What I understand as the gist of
those answers was that if a variable has only 2 levels, the covariance of
the difference vector is of course equal to itself, and there is no
adjustment needed. That makes perfect sense to me. What does not exactly
make sense is that in looking at an interaction of 2 variables, one with
2 levels and another with say 3 or 4, there is no corrected probability
given.
Unfortunately for those who answer me, I have another question. I am doing
a GLM ANOVA - A * B * C * SUBJECTS (4 * 3 * 4 * 32 subjects) - and I have
factors B and C as within-subjects variables. Also unfortunately, I am
missing data for 2 subjects, each at a different level of variable C.
The repeated or multivariate mode is said to drop these cases altogether.
I set up my data in split-plot (I hope that is the proper term) fashion,
using the SUBJECTS(A) * B * C term as my largest interaction's error term.
In other words, I specified the entire model and took up all the D.F.
GLM gave me Type I and Type III sums of squares for my tests (each of which
I specified) that were different from the results I got on the same
analyses from PROC ANOVA.
My questions are: Since I have missing data, I want to know which type is
most apropos for my problem. Further, I must say that I have read the
section on the four types of estimable functions 4 times and am still
confused as to what the real differences are. Type I seem to be sequential
sums of squares, and Type II seem to be unique, but I can't seem to make
out what they are saying. Perhaps it's a function of how dense I am.
Types III and IV are even more obscure. Also, what is the adjustment made
for the missing values? I can't really tell from the documentation.
Any help would be greatly appreciated. Thank you in advance - Truly
Mark D. Lee
========================================================================
From: "Karl L. Wuensch"
To: Mark Lee
In-Reply-To: Your message of Thu, 18 Apr 91 14:00:14 EDT
Regarding the interaction between a two-level within-subjects factor and
a between-subjects factor, sphericity can't be a problem because the
interaction boils down to a between-subjects test on a single difference
score -- this is why you get no corrected p's for such terms.
Heterogeneity of variance (of the difference scores across levels of the
between-subjects factor) could be a problem.
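Illustration of the difference-score point (a minimal sketch, not part of the
original message; the data are made up and the names groups and one_way_f are
hypothetical). With only two within-subjects levels, each subject contributes
one difference score, and the interaction with the between-subjects factor
can be tested as an ordinary one-way ANOVA on those differences. With a single
difference score per subject there is only one variance involved, so there is
nothing for a sphericity correction to adjust.

```python
# Hypothetical data: two groups (between-subjects), each subject measured
# at two levels of a within-subjects factor. All numbers are made up.
groups = {
    "g1": [(10, 12), (11, 15), (9, 10), (12, 13)],
    "g2": [(10, 16), (8, 15), (11, 16)],
}


def one_way_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(
        len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples
    )
    ss_within = sum((v - sum(s) / len(s)) ** 2 for s in samples for v in s)
    df_b = len(samples) - 1
    df_w = len(all_values) - len(samples)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# One difference score per subject (level 2 minus level 1).
diffs = [[b - a for a, b in subjects] for subjects in groups.values()]

# Between-groups test on the differences = test of the interaction.
f_stat, df_b, df_w = one_way_f(diffs)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The heterogeneity-of-variance concern mentioned above would correspond to
the two lists in diffs having clearly unequal variances.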
Regarding missing data in your design and Types of sums of squares,
I recommend that you either delete the subjects who are missing data (as
SAS would by default) or use a multiple regression procedure to predict
for each subject what his/her score would be based on a model using as
predictors the variables on which you do have data for the subject. This is
a lot of work for a little data, so you'd better want it bad to do this. A
lazy person might just substitute the cell mean for such missing data. There
is a type of SS that purports to be appropriate for designs with missing data
-- Type IV -- but in my mind these Type IV SS are more in the domain of magic
than statistics, and I don't recommend their use.
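Illustration of the cell-mean substitution mentioned above (a minimal sketch,
not part of the original message; the data layout, the cell labels, and the
helper fill_with_cell_mean are all hypothetical). Each missing score is
replaced by the mean of the observed scores in the same cell.

```python
# Hypothetical scores keyed by cell; None marks a missing observation.
# All numbers are made up for illustration.
cells = {
    ("a1", "c1"): [12.0, 15.0, None, 14.0],
    ("a1", "c2"): [18.0, 17.0, 19.0, 20.0],
    ("a2", "c1"): [11.0, None, 13.0, 9.0],
}


def fill_with_cell_mean(scores):
    """Replace each None with the mean of the observed scores in the cell."""
    observed = [s for s in scores if s is not None]
    mean = sum(observed) / len(observed)
    return [mean if s is None else s for s in scores]


filled = {cell: fill_with_cell_mean(scores) for cell, scores in cells.items()}

# Count the substituted values so they can be accounted for later.
n_imputed = sum(s is None for scores in cells.values() for s in scores)
print(filled)
print("values imputed:", n_imputed)
```

The substituted values are not real observations, so a common convention is
to subtract one error degree of freedom for each value filled in this way.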
On the more general topic of SAS's Types of sums-of-squares, Type I are
"sequential," "hierarchical," or "stepdown," that is, each effect is partialled
for preceding (to its left) effects. You probably will not often want to use
these, unless you are doing something like a covariate analysis or a trend
analysis where you entered the powers of the quantitative predictor(s). SAS's
Type I SS are the same as Overall & Spiegel's (1969, Psychol. Bull., 72, 311-
322) Method III and SPSS' "METHOD=SSTYPE(SEQUENTIAL)" in MANOVA, but not the
same as SPSS "METHOD=HIERARCHICAL" in ANOVA.
SAS' Type III SS partial each effect for every other effect in the model
- they are the same as Overall and Spiegel's Method I, the "METHOD=UNIQUE" in
SPSS ANOVA, and the default in SPSS MANOVA. They are the sums-of-squares that
are approximated by the classic "unweighted means" analysis of nonorthogonal
factorial designs. This is probably the type of SS you most often want. I
recommend an article by Howell and McConaughy (1982, Educ. & Psychol.
Measurement, 42, 9-24) on the topic.
Type II SS for ANOVA are strange -- I could tell you what they are, but
since it is highly unlikely that you will ever want them in an ANOVA, I'll
save us both time and not say more about them.
Hope this helped. Ciao,
Karl L. Wuensch, Psychology, East Carolina Univ.
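Illustration of the Types of SS described above (a minimal sketch, not part of
the original message and not SAS's actual algorithm; the data are made up and
the helpers solve, rss, and ss are hypothetical). Each SS is computed as the
drop in residual SS when an effect is added to a model already containing
the effects it is adjusted for. Type I adjusts for preceding effects only,
Type II adjusts each main effect for the other main effect but not for the
interaction, and Type III adjusts each effect for every other effect. The
Type III comparison assumes effect (sum-to-zero) coding, as used here.

```python
# Hypothetical unbalanced 2x2 data: cell (a, b) -> scores, with the two
# levels of each factor coded +1 / -1. All numbers are made up.
cells = {
    (1, 1): [3, 4, 5],
    (1, -1): [6, 8],
    (-1, 1): [4, 6, 8, 6],
    (-1, -1): [12, 14],
}


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def rss(terms):
    """Residual SS after a least-squares fit of intercept plus `terms`."""
    rows, y = [], []
    for (a, b), scores in cells.items():
        coded = {"A": a, "B": b, "AB": a * b}  # effect (sum-to-zero) coding
        for score in scores:
            rows.append([1.0] + [float(coded[t]) for t in terms])
            y.append(float(score))
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bi * ri for bi, ri in zip(beta, r)) for r in rows]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


def ss(effect, given):
    """SS for `effect` adjusted for the terms in `given`."""
    return rss(given) - rss(given + [effect])


# Type I: sequential, in the order A, B, A*B.
type_1 = {"A": ss("A", []), "B": ss("B", ["A"]), "AB": ss("AB", ["A", "B"])}
# Type II: main effects adjusted for each other, not for the interaction.
type_2 = {"A": ss("A", ["B"]), "B": ss("B", ["A"]), "AB": ss("AB", ["A", "B"])}
# Type III: every effect adjusted for every other effect.
type_3 = {
    "A": ss("A", ["B", "AB"]),
    "B": ss("B", ["A", "AB"]),
    "AB": ss("AB", ["A", "B"]),
}

for name, table in [("Type I", type_1), ("Type II", type_2), ("Type III", type_3)]:
    print(name, {k: round(v, 3) for k, v in table.items()})
```

With equal cell sizes the three tables would coincide. With the unequal cell
sizes used here they differ for the main effects, and only the Type I SS add
up (together with the residual SS) to the total SS.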
========================================================================
Date: Sat, 06 Dec 1997 17:13:20 -0500
Sender: owner-edstat-l@eos.ncsu.edu
From: Donald Macnaughton
To: STAT-L@VM1.MCGILL.CA, edstat-l@jse.stat.ncsu.edu
Subject: Which Sums of Squares Are Best in Unbalanced ANOVA?
If you are not already confused enough by different types of sums of
squares in ANOVA, check out the paper referenced below. You'll need the
Adobe Acrobat reader to read the downloaded document.
----------------------------Original message----------------------------
Many readers will recall the recent controversy about methods for
computing sums of squares in unbalanced analysis of variance.
Many statisticians believe this controversy has been settled,
with the conclusion being that the sums of squares commonly known
as "SAS Type III" or "SPSS ANOVA Unique" are appropriate in most
cases.
I have written a paper that disagrees with this conclusion. The
paper proposes that the sums of squares known as "SAS Type II" or
"SPSS ANOVA Experimental" are appropriate in most cases. The
reasoning is an extension of earlier writers' reasoning about the
hypotheses being tested. Here is the abstract:
-----------------------------------------------------------------
Which Sums of Squares Are Best
In Unbalanced Analysis of Variance?
ABSTRACT
Three fundamental concepts of science and statistics are enti-
ties, variables (which are formal representations of properties
of entities), and relationships between variables. These con-
cepts help to distinguish between two uses of the statistical
tests in analysis of variance (ANOVA), namely
- to test for relationships between the response variable and the
predictor variables in an experiment
- to test for relationships among the parameters of the model
equation in an experiment.
Two methods of computing ANOVA sums of squares are
- Higher-level Terms are Omitted from the generating model equa-
tions (HTO = SPSS ANOVA EXPERIMENTAL -= SAS Type II -= BMDP4V
with Weights are Sizes, where -= signifies "approximately
equals")
- Higher-level Terms are Included in the generating model equa-
tions (HTI = SPSS ANOVA UNIQUE = SPSS MANOVA UNIQUE = SAS Type
III = BMDP4V with Weights are Equal = BMDP2V = MINITAB GLM =
SYSTAT MGLH = Data Desk Type 3).
This paper evaluates the HTO and HTI methods of computing ANOVA
sums of squares for fulfilling the two uses of the ANOVA statistical
tests. Evaluation is in terms of the hypotheses being
tested and relative power. It is concluded that (contrary to
current practice) the HTO method is generally preferable when a
researcher wishes to test the results of an experiment for evi-
dence of relationships between variables.
-----------------------------------------------------------------
The paper contains 22,000 words and 105 references. It is avail-
able at
http://www.matstat.com/ss.htm
--------------------------------------------------------
Donald B. Macnaughton MatStat Research Consulting Inc.
donmac@matstat.com Toronto, Canada
--------------------------------------------------------