East Carolina University
Department of Psychology


The SUM, MEAN, and NMISS Functions in SAS and SPSS


    The SUM function in SAS and in SPSS sounds like a great tool to use for scoring an additive scale. It is, but be careful if you have missing data. If you are using SAS, look at the below programs, output, and explanations. If you are a SPSS user, jump to an example using SPSS.

SAS

SUM(OF var list)

------------------------------- program 1----------------------------
data sol; input x1-x3; sum =
SUM(of x1-x3); cards;
1 2 3
1 . 3
1 . .
. . .
proc print;
-------------------------------- output -----------------------------
OBS    X1    X2    X3   SUM

1      1     2     3     6
2      1     .     3     4
3      1     .     .     1
4      .     .     .     .
-------------------------------- warning ----------------------------

    Notice that missing values are treated as zeros by the SUM function, unless the observation is missing on all of the variables in the list.

IF NMISS(OF var list) > THEN

------------------------------- program 2 ---------------------------
data test; input (x1-x3) (1.); tot1 = sum(of x1-x3); tot2 = x1+x2+x3;
IF
NMISS(OF X1-X3) > 0 THEN TOT3 = . ; ELSE TOT3 = SUM(OF X1-X3);
mx =
MEAN(of x1-x3); tot4 = 3 * mx; cards;
121
212
21
2
proc print;
-------------------------------- output -----------------------------
OBS  X1  X2  X3  TOT1  TOT2  TOT3  MX   TOT4

1    1   2   1    4     4     4   1.33  4.0
2    2   1   2    5     5     5   1.67  5.0
3    2   1   .    3     .     .   1.50  4.5
4    .   .   2    2     .     .   2.00  6.0
------------------------------ explanation --------------------------
    Notice that observations 3 and 4 have missing data. The variable tot1, created with the SUM OF function, returns nonmissing sums for those observations, even though they have missing data -- SUM OF simply treated missing values as zeros. The more cumbersome tot2=X1+X2+X3 treated the missing data the way I wanted SUM OF to, but if you had responses to a lot of questions it would be very cumbersome to write a statement like the tot2 statement above. Now look at how the NMISS function can be used with SUM OF to achieve the desired results: IF NMISS(OF X1-X3) > 0 THEN TOT3 = . ; ELSE TOT3 = SUM(OF X1-X3); The IF...THEN sets TOT3 to missing if any of the variables in the list has missing data.

VAR = NMISS(OF var list)

    In some cases it may be reasonable to replace missing values with the mean of the subject's responses on the nonmissing items in the list. Computing the MEAN of the nonmissing values and then multiplying this mean by the number of variables in the list (as done with tot4 in the program above) will give you such estimated totals for subjects with missing data.  You will probably need first to decide how many missing data points a case can have and still be retained.  For example, suppose we had a 119 item survey and we want to know, for each subject, how many missing data points there are:

data kj; infile 'D:\StatData\KJ.dat'; input (q1-q119)(1.);
diff=q3-q68; adiff=abs(diff); miss=
NMISS(of q1-q119);
if q119 = . then X = 1; else X = 0;
proc freq; table miss; run;

 

                                               Cumulative    Cumulative

          miss    Frequency     Percent     Frequency      Percent

          --------------------------------------------------------

             0         139       90.26           139        90.26

             1           9        5.84           148        96.10

             2           2        1.30           150        97.40

             3           1        0.65           151        98.05

            11           1        0.65           152        98.70

            13           1        0.65           153        99.35

            16           1        0.65           154       100.00

    Now you decide where to draw the line.  Suppose that I decide that I'll keep any case that has not more than 3 missing data points.  Furthermore, suppose that I want to compute for each case the sum of responses on all 119 items.  For those few items that are missing on 12 cases, I will substitute the mean response for that case on the remaining items:

VAR = MEAN(OF var list)

data cull; set kj; if miss < 4;
mean_tot =
MEAN(of q1-q119);
sum_tot = 119*mean_tot;
proc means; var mean_tot sum_tot; run;

  Variable     N           Mean        Std Dev        Minimum        Maximum

  -------------------------------------------------------------------------

  mean_tot   151      2.8861920      0.2374971      2.1008403      3.7394958

  sum_tot    151    343.4568531     28.2621494    250.0000000    445.0000000

  -------------------------------------------------------------------------


SPSS

Compute VAR = SUM(var list)

    Using the same data as in program 1 above, SPSS' (click Transform, Compute  if you are not typing syntax) COMPUTE SUM1 = SUM(X1 TO X3) produced exactly the same results as did SAS.

Compute VAR = SUM(var list)

    Using the same data as in program 2 above, SPSS' COMPUTE TOT1 = SUM(X1 to X3) produced the same results as SAS' SUM(OF X1-X3) -- that is, missing data were treated as scores of zero unless the case had no valid data, in which case the sum was set to the missing value. In SPSS, COMPUTE TOT2 = X1+X2+X3 produced the same results as SAS' TOT2 = X1+X2+X3 -- that is, any case with one or more missing data points produced a missing value for the sum.

Compute VAR = NMISS(var list)

    To get a distribution of the number of missing items on a set of variables, use the NMISS function -- for example, COMPUTE ARmiss = NMISS(ar1 to ar2).  Then use the Frequencies procedure to obtain the frequency distribution for number of missing items.

 

Compute VAR = SUM.#(var list)

   SPSS provides a convenient way to modify the SUM function so that it will return a missing value if the case does not have a minimum number of nonmissing scores for the variables that are the argument of the SUM function. All you need to do is add a period after the SUM and follow it with the minimum number of nonmissing scores. For example, for the data most recently discussed, COMPUTE SUM.3(X1 TO X3) produced totals of 4 and 5 for the first two cases (which had no missing data), but missing values for the second two cases (each of which had fewer than three nonmissing scores). COMPUTE SUM.2(X1 to X3) produced totals of 4, 5, 3, and missing for the four cases.

Compute VAR = MEAN.#(var list)

    Suppose you decide you want to replace missing values with the mean of the subject's responses which were not missing, unless the subject had fewer than two nonmissing responses. Start with COMPUTE MEAN = MEAN.2(X1 TO X3). You get means of 1.33, 1.67, 1.5, and missing. Pretty cool! If you just must get back to sums, all you have to do is multiply these means by the number of arguments, 3 in this case. COMPUTE TOT5 = 3*MEAN produces totals of 4, 5, 4.5, and missing.

    Here is an example from the research of one of my graduate students.  She wants to score a ten-item scale.   The item variables are names rse1 to rse10. After investigating the distribution of missing values, she decides to compute an score for any case which has at least nine nonmissing values.  The scale score will be computed as a total (sum).  For each subject with at least nine nonmissing values, she will compute the mean item score and then multiply it by ten.  She has already recoded those items that need to be reverse-scored.  From the data view, Anne clicks Transform, Compute.  She names the target variable RSE.  In the Numeric Expression box she enters "10*MEAN.9(rse1 to rse10)" and then she clicks OK.  The scale score is created.  I generally prefer to compute means rather than sum, so I would omit the "10*".

snake on a stick

Back to the SAS Help Page

Back to the SPSS Lessons Page


Contact Information for the Webmaster,
Dr. Karl L. Wuensch


This page most recently revised on 9-10-11.