East Carolina University
Department of Psychology


PSYC 7433 Research Paper Errors


Here is a list of errors made by past students on the required research presentation in PSYC 7433.

Programming Errors

SAS Formats: You should use PROC FORMAT to label the values of categorical variables which were coded with numeric values -- otherwise Karl will have a hard time understanding your statistical output -- but remember that PROC FORMAT only creates formats, it does not apply them to variables. You need to use the FORMAT statement (usually in your data step) to apply the formats to specific variables. See my Proc Format Help File.

Variable Names: You should give your variables descriptive names, like "gender," "attract," and "ethnic" rather than "A," "B," and "C" -- otherwise Karl will have a hard time understanding your statistical output.

Tabs in a Text Data File:  If you mistake tabs for blank spaces and use blank spaces as the delimiter, SAS will incorrectly read the data and your analysis will be corrupt.  This is a fatal error, that is, one that results in the paper being returned to the student with a request to correct the problem, redo the analysis, and rewrite the results and discussion sections.  Here is how to check for tabs in a text data file.  Bring the file into Word.  Toggle Show/Hide to Show.  You will see a dot for each blank space and a right-pointing arrow for each tab.  You need to replace the tabs with blank spaces.  Click Edit, Replace, More, Special.  Select "Tab Character."  Put a blank space in the "Replace with" field.  Click "Replace All."  Save the file as a plain text file.

Statistical Problems

Categorical Variables in Logistic Regression: A categorical variable, such as race, which has more than two levels, needs to be dummy coded, like we did with the scenario variable in the class example of logistic regression. Otherwise it is treated as a continuous variable.

Categorical Variables in Multiple Regression: A categorical variable, such as "which celebrity's body is closest to your ideal body: Sarah Michelle Gellar, Jennifer Lopez, Venus Williams, or Catherine Manheim," needs to be dummy coded. Otherwise it is treated as a continuous variable.
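
The class examples use SAS, but the mechanics of dummy coding are language-neutral. Here is a minimal Python sketch (the function and the response values are hypothetical, not from the class example) showing how a k-level categorical variable becomes k - 1 zero/one indicators, with one level serving as the reference:

```python
def dummy_code(values, reference):
    """Dummy-code a categorical variable: one 0/1 indicator for each
    non-reference level; the reference level is the omitted baseline."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    return {lv: [1 if v == lv else 0 for v in values] for lv in levels}

# Hypothetical "ideal body" responses, with "Gellar" as the reference level
responses = ["Gellar", "Lopez", "Williams", "Manheim", "Lopez"]
dummies = dummy_code(responses, reference="Gellar")
```

Entering the raw numeric codes (1, 2, 3, 4) as a single predictor would instead treat the variable as continuous, which is exactly the error described above.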

Cell Sample Sizes:  When conducting factorial MANOVA, be sure that you have at least five scores in each cell.  You absolutely must have at least two scores in a cell to be able to compute a within-cell variance.  Once a student did a factorial ANOVA with data that left one of her cells empty.  This invalidated the ANOVA.  She could have avoided this problem by combining levels of one of the predictors.  That predictor was race (White, Black, Other).  She could have combined "Black" and "Other" or "White" and "Other" and avoided this problem.  Alternatively, she could have just dropped the "Other" group.

Comparing Correlation Coefficients -- If you predict that one variable will be a better predictor of a criterion variable than will another, then you need to employ a test of the significance of the difference between the two overlapping, correlated correlations.  See Comparing Correlation Coefficients, Slopes, and Intercepts.
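
For two overlapping correlations (two predictors, each correlated with the same criterion), one common test is Williams' t as presented by Steiger (1980). A Python sketch of the statistic (variable names are mine):

```python
from math import sqrt

def williams_t(r12, r13, r23, n):
    """Williams' t for H0: rho12 = rho13, where variables 2 and 3 are two
    predictors of variable 1 and r23 is their correlation; df = n - 3."""
    det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    numerator = (r12 - r13) * sqrt((n - 1) * (1 + r23))
    denominator = sqrt(2 * ((n - 1) / (n - 3)) * det_r
                       + r_bar**2 * (1 - r23)**3)
    return numerator / denominator

t = williams_t(r12=.50, r13=.30, r23=.40, n=103)  # positive: r12 > r13
```

A positive t says the first predictor is the stronger one; evaluate it against Student's t with n - 3 degrees of freedom.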

Comparing Apples with Oranges -- Imagine that you have two groups, one that is the adult children of alcoholics and the other is adult children of nonalcoholics. You wish to compare them on three variables: Average number of alcoholic beverages consumed per day, a physiological measure of stress, and number of visits to a physician in the last year. A MANOVA or a DFA would be appropriate here, comparing the two groups on an optimally weighted combination of the three variates. A 2 (alcoholic parent or not) x 3 (alcohol consumption, stress, visits) repeated measures ANOVA would not be appropriate. The repeated measures analysis would compare mean alcohol consumption with mean stress with mean visits to the doc -- but these variables are not measured on the same metric, and it does not really make any sense to compare them with one another.

Confidence Intervals for Standardized Effect Size Estimates -- you should report effect size estimates with confidence intervals.  For example, when reporting the results of a multiple regression, report the zero-order correlation between the criterion variable and each predictor and put a 95% confidence interval about that estimate.  Likewise, report the multiple R2 and a 90% confidence interval about it.  Those who conduct MANOVA or DFA, accompanied by univariate tests, should report estimates such as eta-squared and Cohen's d, and confidence intervals about these estimates.
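
For a zero-order correlation, the usual approximate interval comes from the Fisher z transformation; here is a Python sketch. (A confidence interval for the multiple R2 requires noncentral F methods and is not shown here.)

```python
from math import atanh, tanh, sqrt

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation, via the
    Fisher z transformation (standard error 1/sqrt(n - 3))."""
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)

lo, hi = r_confidence_interval(r=.40, n=103)
```

The interval is computed on the z scale and then transformed back, so it is asymmetric about r, as it should be.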

Cronbach's Alpha -- Whenever you employ a summative scale (a score computed from summing or averaging responses to multiple items), you should report Cronbach's alpha for that scale, as computed using your subjects.
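
Alpha is easy to compute from your own data. A plain-Python sketch (here items is a hypothetical list of per-item score lists, one inner list per item, respondents in the same order):

```python
def cronbach_alpha(items):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance
    of total scores), with sample (n - 1) variances."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance, n - 1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
```

Perfectly redundant items yield an alpha of 1; weakly related items pull alpha down.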

Data Screening and Assumptions of Your Analysis:  Test them. Point out any problems. Of course, those who use correlation matrix input rather than raw data cannot and need not provide any information about the distributions of the variables.  For an ANOVA, I want to see cell standard deviations (in an Appendix if not in the body of the paper) so I can check the homogeneity of variance assumption.  For continuous variables I want to see in your statistical output estimates of skewness and kurtosis.  Of course, if you simulated your data using a random number generator which randomly samples scores from a normal population, and you specified constant variances, then you don't really need to worry about these assumptions.
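
SAS's PROC MEANS or PROC UNIVARIATE will give you skewness directly; as a language-neutral illustration of what that statistic measures, here is the adjusted sample skewness (the form SAS reports) in Python, with made-up data:

```python
from statistics import mean, stdev

def skewness(xs):
    """Adjusted sample skewness: n/((n-1)(n-2)) times the sum of cubed
    standardized deviations (sample standard deviation in the denominator)."""
    m, s, n = mean(xs), stdev(xs), len(xs)
    return sum(((x - m) / s) ** 3 for x in xs) * n / ((n - 1) * (n - 2))
```

Symmetric data give a skewness near zero; a long right tail gives a positive value, a long left tail a negative one.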

Describe Your Effects: Don't just tell me that this or that had an effect, describe the effect to me. For example, don't just say that there was a significant main effect of dose of drug, give me the marginal means for the different dosage groups and tell me that as the dose went up the mean on the dependent variable went down (or whatever it did). Emphasize the direction of effect -- do not just say that this was significantly different from that, say that this was significantly greater than that. A Results section full of F and p statistics but with no means (or other appropriate descriptive statistics) is almost useless. You may find it helpful to put means in a table or plot.

Direction of Effect:  Do give the reader the information needed to determine the direction of effect. One student reported point biserial correlation coefficients (and related statistics, including some partialled for the effects of other variables in the model), but did not give the group means that would have made the direction of the effects obvious. Adding to the confusion, it appears that the dependent variable (score on a questionnaire) was measured in such a way that low scores mean the participant has more (rather than less) of the measured attribute. This is a dreadful way to score a questionnaire, and it is dreadfully common among psychologists. As a result of the lack of means and the lack of a clear presentation of the meaning of scores on this questionnaire, I ended up not being able to determine whether group A had more or less of the measured attribute than did group B.

Higher Order Interactions: They may be significant but trivial in magnitude. If so, it may be appropriate to dismiss them (especially if they don't make sense) and attend to more impressive main effects.

Ignoring Zero-Order Correlations.  This is one of the most common errors made by psychologists and one of my pet peeves.  The psychologist attempts to predict some outcome variable from a linear combination of predictors.  The predictors are highly correlated with one another, resulting in their being redundant with respect to their association with the outcome variable.  One or more of the predictors is well correlated with the outcome variable, as indicated by its zero-order correlation with the outcome variable.  Because of the redundancy among the predictors, the multiple regression shows that not one of the predictors has a significant beta weight.  The psychologist concludes that none of the predictors is related to the outcome variable, and I pull out more of my dwindling supply of hair.
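
A simulated illustration of the peeve (Python, seeded random data, all names mine): both predictors correlate strongly with the outcome, yet because they are nearly redundant, the regression splits the credit between them and neither unique partial weight looks impressive on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
x1 = latent + 0.1 * rng.normal(size=n)   # two nearly redundant predictors
x2 = latent + 0.1 * rng.normal(size=n)
y = latent + rng.normal(size=n)          # outcome driven by the same latent trait

r1 = np.corrcoef(x1, y)[0, 1]            # zero-order correlations: both substantial
r2 = np.corrcoef(x2, y)[0, 1]
r12 = np.corrcoef(x1, x2)[0, 1]          # predictor redundancy

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0] # partial slopes share the credit
```

The sum of the two partial slopes is stable (near 1, the true slope on the latent trait), but the individual slopes are unstable and poorly estimated, which is why neither may reach significance even though each zero-order r does.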

Interpretation of Simple Main Effects, Monotonic Interaction: When the A x B interaction is monotonic, it may well be the case that the simple main effect of A is significant at every level of B. In that case, it almost certainly will help your presentation to note that the magnitude of the simple effect of A varies across levels of B (or vice versa) -- for example, the mean DV may increase from 10 to 15 when you move from A1B1 to A2B1, but increase from 20 to 40 (a much larger increase) when you move from A1B2 to A2B2 -- that is, the effect of moving from A1 to A2 is positive at both B1 and B2, but much larger at B2 than at B1. The use of interaction plots can be very helpful in presenting and interpreting interactions.

Magnitude of Effect: An effect can be significant but trivial in magnitude. There is a tendency to treat all significant effects as equal, but you should not necessarily do so. For example, in a factorial ANOVA look at the effects' sums of squares. If effect 1 is significant and its sum of squares is 50% of the total sum of squares and effect 2 is significant but its sum of squares is 1% of the total, then effect 1 is worthy of much more attention than is effect 2. When power is high, even trivial effects may be significant. Reporting eta-squared or omega squared may help. If a main effect is very large in magnitude, but the interaction which modifies it is quite small in magnitude (even though significant), it may be appropriate to emphasize the importance of the big main effect.
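
As a toy one-way illustration (Python, made-up scores) of judging magnitude from the sums of squares, here is the partition of total SS into between- and within-group pieces and the resulting eta-squared:

```python
from statistics import mean

def ss_partition(groups):
    """Partition total SS into between-group and within-group sums of squares."""
    allx = [x for g in groups for x in g]
    grand_mean = mean(allx)
    ss_total = sum((x - grand_mean) ** 2 for x in allx)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between, ss_total - ss_between, ss_total

ss_b, ss_w, ss_t = ss_partition([[1, 2, 3], [4, 5, 6]])
eta_sq = ss_b / ss_t   # proportion of total variability due to the effect
```

In a factorial design the same logic applies effect by effect: each effect's SS over the total SS gives its eta-squared, and a significant effect with a tiny eta-squared deserves correspondingly little discussion.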

Main Effects, Big, Ignoring Them:  The largest effects in an ANOVA should be described even if they participate in higher-order interactions.  Here is an example:  In a three-way ANOVA the largest effects were the main effect of A (eta-squared = .58) and the main effect of B (eta-squared = .10).  Both were qualified by a triple interaction (eta-squared = .04).  The student appropriately conducted simple effects tests, but paid no attention to the big main effects.

Odds Ratios in Logistic Regression:  For each predictor you should report an odds ratio as a strength of partial effect of that predictor.  The odds ratio will also identify the direction of the effect, which, as always, should be emphasized when the effect is significant -- in this case, when the odds ratio is significantly different from one.
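
With a single binary predictor, the logistic regression slope is just the natural log of the familiar 2 x 2 odds ratio. A sketch with made-up counts:

```python
from math import exp, log

# Hypothetical 2 x 2 table: rows = predictor group, columns = outcome yes/no
exposed_yes, exposed_no = 30, 20
control_yes, control_no = 10, 40

odds_ratio = (exposed_yes / exposed_no) / (control_yes / control_no)
slope = log(odds_ratio)   # the coefficient a logistic regression would report
```

Here the odds of the outcome are six times higher in the exposed group. An odds ratio significantly different from one marks a significant effect, and whether it falls above or below one gives the direction.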

Pairwise Comparisons: These are not needed to explain a one df main effect that was significant in an ANOVA. In this case, you have only two groups, and the significant main effect tells you that the two group means differ significantly from one another.  They are needed when a main effect or simple main effect is significant and has more than one df.

Pairwise Comparisons, Error Term: When making pairwise comparisons on the means involved in one of the main effects (from a factorial analysis), you should generally not drop from the model the other factors -- doing so returns the variance explained by those other factors to the error term, resulting in a loss of power.

Pooled Error for Simple Effects.  Unless there is a problem with heterogeneity of variance, you should consider using pooled error, especially when the sample size is small for one of the tests of simple effects.  This year one student reported a simple, simple effect where the sample size was so small that with individual error there was only one error degree of freedom.  The p value was .32.  If she had used a pooled error term her p would have been .016.

Pooled t -- With unequal sample sizes, separate variances t is preferred to pooled t.
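
A sketch of the separate-variances statistic (Welch's t with the Welch-Satterthwaite degrees of freedom) in Python:

```python
from statistics import mean, variance
from math import sqrt

def welch_t(x, y):
    """Separate-variances (Welch) t and its Welch-Satterthwaite df.
    Unlike pooled t, this does not assume equal population variances."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)          # sample variances, n - 1
    se2 = vx / nx + vy / ny
    t = (mean(x) - mean(y)) / sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

t, df = welch_t([5, 6, 7], [1, 2, 3])   # positive: first group mean is larger
```

With equal variances and equal n, the Welch df equals the pooled df; with unequal variances and unequal n, the Welch df shrinks, protecting the test from the distortion that afflicts pooled t.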

Reflection of Variables:  If you have reflected a variable, as part of a transformation to reduce negative skewness, it might be wise to re-reflect it after the transformation.  For example, one student reflected the criterion variable (percentage of persons in a county that had college degrees) and then correlated it with several predictor variables, including high school dropout rate.  He reported that the correlation between dropout rate and percentage of residents with a college degree was .32.  This makes it much too easy to think that dropping out of high school is positively associated with getting college degrees, when the relationship is actually negative.  Re-reflecting the criterion variable would have prevented this confusion.
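
A Python sketch of the reflect / transform / re-reflect sequence (the numbers are made up): after re-reflection the transformed scores are back in the same rank order as the raw scores, so correlations keep their original sign.

```python
from math import sqrt

def reflect(xs):
    """Reflect scores so large values become small, as done before a
    square-root transform when the skew is negative."""
    hi = max(xs) + 1
    return [hi - x for x in xs]

raw = [10, 40, 70, 90]                   # hypothetical percentages, negatively skewed
transformed = [sqrt(v) for v in reflect(raw)]      # order is now reversed
re_reflected = [max(transformed) + min(transformed) - v
                for v in transformed]              # order restored to match raw
```

Reporting correlations on the re-reflected scores avoids the sign confusion described above.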

Simple Main Effects Following a Significant Two-Way Interaction: If you have a significant two-way interaction that is not modified by a higher order interaction (and, in some cases, even if it is), you should present tests of simple main effects. For those tests, you may choose to calculate an F based on a pooled error term (from the omnibus analysis), or you may choose to report the F based on an individual error term (that reported by SAS when you sort BY one factor and analyze the other factor BY level of it), but be sure that the error df you report is correct for the type of error term you have chosen. Sometimes students report the individual error F with the pooled error df. Sometimes they report the total df as the error df. And sometimes they report the F based on the total model (cells) sums of squares rather than that based on the effect of interest (which is part of, but not all of, the total model).

Tests of Simple Interactions When Triple Interaction Was Not Significant: Some students have conducted tests of simple interactions when the triple interaction was not significant and there was no a priori reason to do so. Some of these same people neglected to do simple main effects analysis when that was appropriate. For example, suppose that in a three-way ANOVA the only significant effect was AxB. It would not be appropriate to look at the simple interaction between A and B at each level of C, but it would be appropriate to look at the simple main effects of A at each level of B. When you use SAS to get the effects of A at levels of B, you include in your model the effects of C and AxC, not because you need to test and report the simple effects of C and AxC, but simply to gain power by excluding those effects from the error term. Do not assume that you need to include in your Results section every F and p computed by SAS.

Too Much Power: Those who simulate data often make the mistake of selecting cell means that differ too much from one another relative to the too small within-cell standard deviations. This may be, in part, motivated by a desire to be sure of getting significance for the desired main effects and/or a two-way interaction, but may result in getting significance for unanticipated, trivial, and bothersome effects, especially higher-order effects. When this happens, the most reasonable course of action may be to resimulate the data with more reasonable cell means and standard deviations or with smaller N.

Significance -- Don't declare effects to be "significant" when p > .05 unless you have stated and justified a less conservative criterion of significance than the usual .05.

Suppression -- If you are reporting the results of multiple regression analysis, you should compare zero-order correlation coefficients with beta weights and note any suppression.

Unanticipated Results from Analysis of Simulated Data

Pay close attention to the details of the analyses done on simulated data. Unless you are a real expert in simulating data, you should watch out for effects that you had not planned on including in your model, but which slipped in mistakenly. For example, one student simulated data for an ANCOV with two groups and one covariate. The unadjusted group means (ignoring the covariate) were simulated as she wanted them. The relationship between the covariate and the dependent variable was significant and in the direction she wanted. The ANCOV did indicate a significant effect of the grouping variable. The student used the unadjusted means to interpret this effect, unaware that a Reversal Paradox (suppressor effect) had caused the adjusted means (LSMEANS in SAS, which she did not obtain) to indicate a direction of effect opposite that with the unadjusted means. Imagine what a terrible error this would be in real life (it is sometimes called a Type III error), correctly reporting a significant result, but getting the direction of the effect backwards!

If the student had obtained the adjusted means, she would, no doubt, have noticed the reversal, but she did not obtain those means. There was, however, in her output, a detail that, had she noticed it, would have tipped her off. The slope of the regression for predicting the dependent variable from the grouping variable alone was positive (and, of course, equal to the difference between the unadjusted group means), but the sign of the partial slope for the grouping variable in the ANCOV was negative (and, of course, equal to the difference between the adjusted group means).

Variance -- don't forget to report both means and standard deviations or variances.  If cell variances were homogeneous, you may elect to report the MSE and not the cell standard deviations.

APA Style and Grammar

My thesis errors page covers a number of errors commonly made when writing a thesis or a research paper.  Don't you be making any of these errors!

Using Word

Pagination -- Use a page break (Ctrl Enter) rather than carriage returns to produce proper paging.

Hanging Indentation -- Use Ctrl-T to produce hanging indentation -- do not use a carriage return followed by indentation. See Word Tips.


Assorted Other Stuff

(#) -- One student's paper was peppered with parentheses within which was a one- or two-digit number.  My guess is that these are citations that were copied from a source that was being plagiarized, a source which was not in APA style.  Some journals reference sources by numbers rather than by author names and date.

Careless Cutting & Pasting -- one student reported a significant Race of Defendant x Type of Crime x Attractiveness Level of the Defendant interaction, but none of these were factors in her research.  I suspect she copied a summary statement from one of my examples and then forgot to change the names of the factors.

Discussion Section -- Your discussion section should include the practical implications of your findings, if there are any.

Line Spacing -- The usual advice is to "set your word processor to double space and then forget it." Do not put a blank line between one paragraph and the next paragraph in an APA manuscript -- for example, put a carriage return at the end of the title line on the first page of the introduction, but don't enter a second carriage return before starting the first paragraph of the introduction.

Plagiarism:  All of the papers were checked for plagiarism (with software designed for that purpose).  One student's paper contained two sentences which were copied word for word from documents available on the Internet.

Reference List, Match with Text Citations -- Every reference in your reference list should correspond with a citation to that reference in the body of your manuscript, and the details (such as date of publication) must match.

Table-Text Redundancy -- if you have provided F, df, etc. in a table, it is not generally necessary to include them in the text as well -- just refer to the table.

Reference List, None -- do not omit the reference list from your manuscript.

Research Units -- who or what were they?  Were they people, rats, petunia plants, cities, counties, countries, or what?

Running Heads -- they do not appear on the title page of a final manuscript.  In a published article they appear at the top of every other page.  They are not the same as the short title that appears in the page header of a copy manuscript.

Source of Data -- When the data were downloaded from an Internet source, it is essential that you provide the URL.

Title Page:  It should include the name of the author.

Contact Information for the Webmaster,
Dr. Karl L. Wuensch


This page most recently revised on the 14th of November, 2014