East Carolina University

Department of Psychology

The SUM function in SAS and in SPSS sounds like a great tool to use for scoring an additive scale. It is, but be careful if you have missing data. If you are using SAS, look at the below programs, output, and explanations. If you are a SPSS user, jump to an example using SPSS.

**
-------------------------------
program 1----------------------------
data sol; input x1-x3; sum =
SUM(of x1-x3); cards;
1 2 3
1 . 3
1 . .
. . .
proc print;
-------------------------------- output -----------------------------
OBS X1 X2 X3 SUM
1 1 2 3 6
2 1 . 3 4
3 1 . . 1
4 . . . .
-------------------------------- warning ----------------------------
**

Notice that missing values are treated as zeros by the SUM function, unless the observation is missing on all of the variables in the list.

IF NMISS(OF var list) > THEN
-------------------------------
program 2
---------------------------

data test; input (x1-x3) (1.); tot1 = sum(of x1-x3); tot2 = x1+x2+x3;

IF
NMISS(OF X1-X3) > 0 THEN TOT3 = . ; ELSE TOT3 =
SUM(OF X1-X3);

mx =
MEAN(of x1-x3); tot4 = 3 * mx; cards;

121

212

21

2

proc print;

-------------------------------- output -----------------------------

OBS X1 X2 X3 TOT1 TOT2 TOT3 MX TOT4

1 1 2 1 4 4 4 1.33 4.0

2 2 1 2 5 5 5 1.67 5.0

3 2 1 . 3 . . 1.50 4.5

4 . . 2 2 . . 2.00 6.0

------------------------------ explanation --------------------------

Notice that observations 3 and 4 have missing data. The variable tot1, created with the SUM OF function, returns nonmissing sums for those observations, even though they have missing data -- SUM OF simply treated missing values as zeros. The more cumbersome tot2=X1+X2+X3 treated the missing data the way I wanted SUM OF to, but if you had responses to a lot of questions it would be very cumbersome to write a statement like the tot2 statement above. Now look at how the NMISS function can be used with SUM OF to achieve the desired results: IF NMISS(OF X1-X3) > 0 THEN TOT3 = . ; ELSE TOT3 = SUM(OF X1-X3); The IF...THEN sets TOT3 to missing if any of the variables in the list has missing data.

** ** In some cases it may be reasonable to replace missing values with the mean of the subject's responses on the nonmissing
items in the list. Computing the MEAN of the nonmissing values and then multiplying this mean by the number of variables in the list (as done with tot4 in the program above) will give you such estimated totals for subjects with missing data.
You will probably need first to decide how many missing data points a case can
have and still be retained. For example, suppose we had a 119 item survey
and we want to know, for each subject, how many missing data points there are:

data kj; infile 'D:\StatData\KJ.dat'; input
(q1-q119)(1.);

diff=q3-q68; adiff=abs(diff); miss=NMISS(of q1-q119);

if q119 = . then X = 1; else X = 0;

proc freq; table miss; run;

Cumulative Cumulative

miss Frequency Percent Frequency Percent

--------------------------------------------------------

0 139 90.26 139 90.26

1 9 5.84 148 96.10

2 2 1.30 150 97.40

3 1 0.65 151 98.05

11 1 0.65 152 98.70

13 1 0.65 153 99.35

16 1 0.65 154 100.00

Now you decide where to draw the line. Suppose that I decide that I'll keep any case that has not more than 3 missing data points. Furthermore, suppose that I want to compute for each case the sum of responses on all 119 items. For those few items that are missing on 12 cases, I will substitute the mean response for that case on the remaining items:

data cull; set kj; if miss < 4;

mean_tot = MEAN(of q1-q119);

sum_tot = 119*mean_tot;

proc means; var mean_tot sum_tot; run;

Variable N Mean Std Dev Minimum Maximum

-------------------------------------------------------------------------

mean_tot 151 2.8861920 0.2374971 2.1008403 3.7394958

sum_tot 151 343.4568531 28.2621494 250.0000000 445.0000000

-------------------------------------------------------------------------

**Compute VAR = SUM(var list)**

Using the same data as in program 1 above, SPSS' (click Transform, Compute if you are not typing syntax) COMPUTE SUM1 = SUM(X1 TO X3) produced exactly the same results as did SAS.

**Compute VAR = SUM(var list)**

Using the same data as in program 2 above, SPSS' COMPUTE TOT1 = SUM(X1 to X3) produced the same results as SAS' SUM(OF X1-X3) -- that is, missing data were treated as scores of zero unless the case had no valid data, in which case the sum was set to the missing value. In SPSS, COMPUTE TOT2 = X1+X2+X3 produced the same results as SAS' TOT2 = X1+X2+X3 -- that is, any case with one or more missing data points produced a missing value for the sum.

**Compute VAR = NMISS(var list)**

To get a distribution of the number of missing items on a set of variables, use the NMISS function -- for example, COMPUTE ARmiss = NMISS(ar1 to ar2). Then use the Frequencies procedure to obtain the frequency distribution for number of missing items.

**Compute VAR = SUM.#(var list)**

SPSS provides a convenient way to modify the SUM function so that it will return a missing value if the case does not have a minimum number of nonmissing scores for the variables that are the argument of the SUM function. All you need to do is add a period after the SUM and follow it with the minimum number of nonmissing scores. For example, for the data most recently discussed, COMPUTE SUM.3(X1 TO X3) produced totals of 4 and 5 for the first two cases (which had no missing data), but missing values for the second two cases (each of which had fewer than three nonmissing scores). COMPUTE SUM.2(X1 to X3) produced totals of 4, 5, 3, and missing for the four cases.

**Compute VAR = MEAN.#(var list)**

Suppose you decide you want to replace missing values with the mean of the subject's responses which were not missing, unless the subject had fewer than two nonmissing responses. Start with COMPUTE MEAN = MEAN.2(X1 TO X3). You get means of 1.33, 1.67, 1.5, and missing. Pretty cool! If you just must get back to sums, all you have to do is multiply these means by the number of arguments, 3 in this case. COMPUTE TOT5 = 3*MEAN produces totals of 4, 5, 4.5, and missing.

Here is an example from the research of one of my graduate students. She wants to score a ten-item scale. The item variables are names rse1 to rse10. After investigating the distribution of missing values, she decides to compute an score for any case which has at least nine nonmissing values. The scale score will be computed as a total (sum). For each subject with at least nine nonmissing values, she will compute the mean item score and then multiply it by ten. She has already recoded those items that need to be reverse-scored. From the data view, Anne clicks Transform, Compute. She names the target variable RSE. In the Numeric Expression box she enters "10*MEAN.9(rse1 to rse10)" and then she clicks OK. The scale score is created. I generally prefer to compute means rather than sum, so I would omit the "10*".

Contact Information for the Webmaster,

Dr. Karl L. Wuensch

This page most recently revised on
9-10-11.