East Carolina University
Department of Psychology


Validity of Student Ratings of Instruction


   Winning the Board of Governors' Excellence in Teaching Award led to my being asked to serve on the University's Teaching Effectiveness Committee. While serving on this committee we voted to recommend to the Faculty Senate that the S.O.I.S. (the instrument we employ for student ratings of instruction) be administered in Summer school (currently it is administered only during Autumn and Spring semesters). This motivated me to review the literature on student ratings of instruction. When I went to my folder of articles on this topic, I found that they were all rather dated, in part because I have not paid much attention to this topic in many years, but also because there has been relatively little published on this topic in recent years. I was disappointed that the folder did not include my reprints of recent work by Anthony G. Greenwald. Over the holiday break, I spent a couple of days trying to clear up the mess in my office. In one of the piles on my main desk I found the reprints of Greenwald's 1997 articles:

The American Psychologist is the flagship journal of the American Psychological Association, and the Journal of Educational Psychology is a highly respected APA journal, so these are top-notch articles. They had rested there on my desk for three years before I found time to read them! Here is my review of them:

   Greenwald's Research. In the JEP article, Greenwald presents the results of structural equations modeling done with data from student ratings of instruction administered to 200 undergraduate classes across three consecutive terms at the University of Washington. Two important characteristics distinguish Greenwald's methodology from that typical of past research:

   The best fitting model was one in which Grading Leniency (an inferred construct) led to students having Higher Expected Grades. Higher Expected Grades, in turn, led to:

   Greenwald did evaluate other possible interpretations of the fairly strong relationship between Expected Grades and Course Evaluations. Here is a brief review of those alternative models:

   In his AP article, Greenwald adds a fifth model:

   Between- versus Within- Classes Correlations. All of the models predict a positive correlation between Expected Grades and Course Ratings when computed between classes -- that is, classes where the mean expected grade is high tend to have high mean ratings. On the other hand, if these correlations are computed within classes, all of the models except the Teaching Effectiveness model predict this correlation to be positive. If you find a positive correlation, within a single class, between expected grade and course rating, that correlation cannot be due to the quality of instruction, since all students in a single class receive the same instruction. Thus, according to Greenwald, the within-class correlations, which are commonly reported to be positive, indicate that course ratings do not measure only the quality of instruction. It occurs to me, however, that the instruction in a course may well be better suited for some of the students in that course than for others, which should lead to the observed within-class correlation. I have long argued that the superior teacher uses a variety of methods, in hopes of making the instruction good for all types of students -- but you know that is a difficult task.
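   The distinction between the two kinds of correlation can be illustrated with a minimal Python sketch. It assumes NumPy is available; the function name and the tiny data set are hypothetical, made up only for illustration, and are not Greenwald's data or method. The between-classes correlation is computed on class means, and the within-classes correlation is computed on each student's deviation from the mean of that student's own class.

```python
import numpy as np

def between_and_within_correlations(class_ids, expected_grade, rating):
    """Return (between-classes r, pooled within-classes r)."""
    class_ids = np.asarray(class_ids)
    x = np.asarray(expected_grade, dtype=float)
    y = np.asarray(rating, dtype=float)
    labels = np.unique(class_ids)

    # Between classes: correlate the class means.
    x_means = np.array([x[class_ids == c].mean() for c in labels])
    y_means = np.array([y[class_ids == c].mean() for c in labels])
    r_between = float(np.corrcoef(x_means, y_means)[0, 1])

    # Within classes: remove each class's mean, then correlate
    # the pooled deviation scores.
    x_dev, y_dev = x.copy(), y.copy()
    for c in labels:
        mask = class_ids == c
        x_dev[mask] -= x[mask].mean()
        y_dev[mask] -= y[mask].mean()
    r_within = float(np.corrcoef(x_dev, y_dev)[0, 1])
    return r_between, r_within

# Hypothetical data: three classes of three students each.
class_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
expected_grade = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 2.0, 3.0, 4.0]
rating = [2.5, 1.0, 2.5, 3.5, 2.0, 3.5, 4.5, 3.0, 4.5]

r_between, r_within = between_and_within_correlations(
    class_ids, expected_grade, rating)
```

   In this made-up example the class means are perfectly correlated while the within-class deviations are uncorrelated, which is the pattern a pure Teaching Effectiveness model would predict. A clearly positive within-classes value would be the pattern Greenwald takes as evidence that ratings reflect more than quality of instruction.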

   Experimental Research. All of the research discussed so far has been nonexperimental. Greenwald does mention several experiments done in the 1970s. In these experiments the students' grades were artificially manipulated up or down. Yes, these were done in actual classes. While I question the ethics of such research, I find the results interesting. In every one of these experiments, when grades were raised, so were ratings, and when grades were lowered, so were ratings.

   Convergent versus Discriminant Validity. Greenwald acknowledges that ratings have good convergent validity -- that is, they are correlated with other indicators of effective teaching. To me, the most convincing evidence comes from studies where different instructors teach the same course, all of the students take a common exam, and there are no obvious confounds (such as better students being attracted to one instructor). Meta-analysis of studies like this has indicated about a .4 correlation between course ratings and exam performance.

   Consequential Validity. What is the effect of course evaluations on our educational system? Greenwald sounds a warning here, and it is one you have heard from me. Earlier I argued that use of the S.O.I.S. could result in faculty having to choose between techniques that are pedagogically sound (like giving lots of homework and essay exams) and techniques that are associated with good course evaluations by students (like being easy and giving good grades to all). I argued that this is a concern even if the troublesome correlations (like that between good grades and good ratings) are myth only -- if faculty believe there is such a troublesome correlation, their behavior in the classroom may well be influenced by that belief. The research presented by Greenwald indicates that the grades-ratings correlation is no myth. Greenwald suggests that the teacher who wants to get ratings better than what e probably deserves has only to give an easy midterm exam -- but the teacher who wants e's students to learn more will give a hard midterm and "scare the students into studying." I prefer to motivate studying by means other than scaring the students, but must confess that the word around my department is that I can be pretty scary and that my exams are more than a little demanding.

   Greenwald's Solution. Greenwald opines that student ratings of courses are useful, but that they are contaminated by the effects of grading leniency. He recommends that course ratings be statistically adjusted to remove the effect of grading leniency. He also recommends adjustment to remove the effect of class size and any other contaminants that can be identified. I am not convinced that we can measure "grading leniency" sufficiently well to justify its use to adjust ratings.

   My Thoughts. I still believe that the best solution is to decrease reliance on the S.O.I.S. by providing additional ways to evaluate quality of teaching, most notably peer review of teaching, and not just for those faculty going up for tenure. Additionally, I think that observation of our colleagues teaching should be on a "drop-in" basis rather than arranged beforehand. When you advise a colleague that you will be observing e's class on a certain date, you should not expect a typical class on that date.

   Grades and Ratings at ECU. David Cartwright (11. Dec. 00) noted: "Last summer we did a brief study that looked at the relationship of SOIS scores and expected or actual grades. It showed little correlation, which dispels the commonly-held notion that the SOIS is a 'popularity contest.' We have intended to circulate this but have not had time to do so. I will send you a preliminary 'draft' version." This may actually be a matter of concern. If the S.O.I.S. measures quality of instruction, then there should be a positive correlation between expected grades and course ratings. Good teaching should lead to the students working more and learning more, which should lead them to expect better grades. Accordingly, if the ratings are correlated with the quality of teaching, then they should also be correlated with expected grades (between-classes but not within-classes).

Course Difficulty and Ratings at ECU.  In Autumn, 2006, I crunched SOIS ratings of faculty in Psychology to identify those who were repeatedly in the top quarter of the distribution and thus eligible for nomination for our department's teaching award.  One of my colleagues asked me to investigate the relationship between student rating of course difficulty (amount of work and difficulty of the course content) and student rating of instructor's teaching.  For each faculty member I computed the average rating on the teaching items and on rated difficulty, across students (medians) and across courses and semesters (means) for two years.  Then I simply correlated the instructor ratings with the course difficulty ratings.  I expected to find a negative correlation between instructor ratings and course difficulty ratings, but I was surprised at the great magnitude of the observed correlation.  Teacher ratings were significantly and negatively correlated with course difficulty ratings, r (n = 25) = -0.53, p = .005.  Students who reported that they had to work hard on their courses and that the course material was difficult gave their professors lower ratings than did those who reported that they did not have to do much work and the course was easy.
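The aggregation described above (medians across students within a course, then means across courses and semesters for each instructor, then a simple correlation) might be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout, the function names, and the sample responses are hypothetical, and it is not the software actually used for the analysis reported here.

```python
from collections import defaultdict
from statistics import mean, median

import numpy as np

def instructor_averages(responses):
    """responses: one (instructor, section, teaching, difficulty) tuple
    per student (hypothetical layout).
    Returns {instructor: (mean of section medians for teaching,
                          mean of section medians for difficulty)}."""
    by_section = defaultdict(list)
    for instructor, section, teaching, difficulty in responses:
        by_section[(instructor, section)].append((teaching, difficulty))

    # Step 1: median across students within each course section.
    by_instructor = defaultdict(list)
    for (instructor, _section), rows in by_section.items():
        teaching_median = median([t for t, _ in rows])
        difficulty_median = median([d for _, d in rows])
        by_instructor[instructor].append((teaching_median, difficulty_median))

    # Step 2: mean of those medians across sections and semesters.
    return {
        name: (mean([t for t, _ in meds]), mean([d for _, d in meds]))
        for name, meds in by_instructor.items()
    }

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(xs, ys)[0, 1])

# Hypothetical student responses for two instructors.
responses = [
    ("X", "s1", 4, 2), ("X", "s1", 5, 2), ("X", "s1", 3, 3),
    ("X", "s2", 5, 1), ("X", "s2", 5, 3),
    ("Y", "s1", 2, 4), ("Y", "s1", 3, 5), ("Y", "s1", 4, 4),
]
averages = instructor_averages(responses)
```

With one (teaching, difficulty) pair per instructor in hand, `pearson_r` applied to the two columns gives the kind of correlation reported above. With real data there would be one pair for each of the 25 instructors.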

It is possible to remove the effect of course difficulty from the instructor evaluations.  For the data I had on hand I employed linear regression to predict instructor ratings (SOIS) from course difficulty ratings (DIFF).  I saved the residuals from that analysis (actual SOIS minus predicted SOIS).  These residuals represent what the SOIS ratings would be (relative to the mean rating) after taking out the effect of course difficulty.  These can be returned to original scale by adding the SOIS mean to the residuals, but rather than doing that I simply standardized the residuals to mean zero, standard deviation one.  I then compared the list of the top six instructors ranked on unadjusted SOIS ratings with the list of the top six instructors ranked on adjusted SOIS ratings.  There was some overlap -- two instructors were in both lists, but eight were on one but not the other.
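The adjustment just described can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy, with hypothetical function names and made-up ratings for six instructors. It is not the program or the data used above, and the use of the sample standard deviation (n - 1 in the denominator) when standardizing is an assumption.

```python
import numpy as np

def difficulty_adjusted_ratings(sois, diff):
    """Regress SOIS on DIFF, keep the residuals (actual minus predicted),
    and standardize them to mean zero, standard deviation one."""
    sois = np.asarray(sois, dtype=float)
    diff = np.asarray(diff, dtype=float)
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(diff, sois, deg=1)
    residuals = sois - (intercept + slope * diff)
    # Assumption: sample standard deviation (ddof=1).
    return (residuals - residuals.mean()) / residuals.std(ddof=1)

def top_k(names, scores, k):
    """Names of the k highest-scoring instructors, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [names[i] for i in order]

# Hypothetical instructor-level ratings.
names = ["A", "B", "C", "D", "E", "F"]
sois = [4.8, 4.5, 4.2, 4.0, 3.6, 3.9]
diff = [1.5, 2.0, 3.5, 2.5, 3.0, 4.5]

adjusted = difficulty_adjusted_ratings(sois, diff)
unadjusted_top = top_k(names, sois, 2)
adjusted_top = top_k(names, adjusted, 2)
overlap = set(unadjusted_top) & set(adjusted_top)
```

By construction the adjusted scores are uncorrelated with the difficulty ratings, so comparing `unadjusted_top` with `adjusted_top` shows which instructors move in or out of the top group once rated difficulty is taken into account.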


Contact Information for the Webmaster,
Dr. Karl L. Wuensch


This page most recently revised on 20. September 2006.