306. Gender Bias and Timing of SETs

A number of studies demonstrate gender bias in course evaluations. In this episode Whitney Buser, Jill Hayter, and Cassondra Batz-Barbarich join us to discuss their research that looks at the timing of when these gender differences emerge and theories for why they exist.

Whitney is the Associate Director of Academic Programs in the School of Economics at Georgia Tech. Jill is an Associate Professor of Economics in the College of Business and Technology at East Tennessee State University. Cassondra is an Assistant Professor of Business at Lake Forest College. Whitney, Jill, and Cassondra are the authors of an article entitled “Evaluation of Women in Economics: Evidence of Gender Bias Following Behavioral Role Violations.”



John: A number of studies demonstrate gender bias in course evaluations. In this episode we discuss research that looks at the timing of when these gender differences emerge and theories for why they exist.


John: Thanks for joining us for Tea for Teaching, an informal discussion of innovative and effective practices in teaching and learning.

Rebecca: This podcast series is hosted by John Kane, an economist…

John: …and Rebecca Mushtare, a graphic designer…

Rebecca: …and features guests doing important research and advocacy work to make higher education more inclusive and supportive of all learners.


John: Our guests today are Whitney Buser, Jill Hayter, and Cassondra Batz-Barbarich. Whitney is the Associate Director of Academic Programs in the School of Economics at Georgia Tech. Jill is an Associate Professor of Economics in the College of Business and Technology at East Tennessee State University. Cassondra is an Assistant Professor of Business at Lake Forest College. Whitney, Jill, and Cassondra are the authors of an article entitled “Evaluation of Women in Economics: Evidence of Gender Bias Following Behavioral Role Violations.” Welcome, Whitney, Jill, and Cassondra.

Whitney: Thank you for having us.

Cassondra: Thank you so much.

Rebecca: Today’s teas are:… Whitney, are you drinking tea?

Whitney: I am. I have some jasmine tea.

Rebecca: Always a good choice. Jill. How about you?

Jill: Harney and Sons Hot Cinnamon Spice.

Rebecca: Oh, that’s such a good choice. I love that one. It’s a family favorite at my house. How about you, Cassondra?

Cassondra: Yesterday, we made a sun tea on the porch. So it’s sweet peach tea.

Rebecca: This is a good variety. How about you, John?

John: And I have ginger peach black tea from the Republic of Tea.

Rebecca: So we’re combining choices here [LAUGHTER]. And I have Awake tea, despite the fact that it is early afternoon here.

Jill: I also had three cups of coffee this morning.

Rebecca: It’s one of the most popular kinds of tea, Jill.

John: We’ve invited you here today to discuss your research on gender bias in student evaluation of instructors. Could you tell us how the study came about?

Whitney: Jill and I have been working on this for about six years, believe it or not. It’s been a long process for us. And actually at the very beginning we had a different third working with us. And the original three of us, we met at the conference, and we had just attended a session that talked about teaching evaluations. And afterwards, we just naturally began talking about this, because we all had these really, really strong feelings about teaching evaluations. All three of us at the time were young, young in our careers, young age wise. We were female PhD economists. And we were all earning tenure, or I think Jill had just earned tenure. But we’re all in this similar experience of having what we felt like was a very positive class climate, and a lot of camaraderie between ourselves and the students until the grades were returned for the first time. And then we could feel a definite shift and it was upsetting to all of us. We all got into this because we love teaching and we want to do a good job in that. It was just something that we were picking up on. So that was our anecdotal experience. Jill had a little data on it herself, because she would do mid-semester evaluations herself, just to gauge the class climate and see what students were needing. And I had an experience where in my first position, they did a surprise midterm evaluation, just to kind of see how the new professor was doing, that I didn’t know about. And I got glowing reviews from the students, everything was very, very positive, wonderful, and six weeks later, same students but grades returned, evaluations looked a little different. And the comments were a bit different. So we had a little data to back up this idea too, and one thing if the people listening today haven’t read the literature, there’s an extensive literature on course evaluations. And it consistently finds gender bias in those.
But the thing about that literature is it only looks at evaluations, which are typically done on the very last day of class, maybe even after that, maybe a couple of days before, but at the end of the semester. And we really haven’t seen anyone look into how these opinions of students evolve over the semester, or how students feel at the beginning or the middle of the semester. So that’s what we wanted to do with that. And in my opinion, and this is just me speaking here, Jill can have her own other motivations, or our other co-author that has worked with us before could feel differently. But for me, it was really important to acknowledge that society has come a long way in the past several years with gender bias. And I don’t think that modern students are shocked by female faculty any longer, I don’t think they have an explicit distaste for female faculty. Anecdotally, I feel that my students are actually happy when they meet me. And they have expectations of me to be warm, comforting, approachable. But I do think that when you expect someone to be more comforting and approachable, and they give you a grade back, that’s not always an “A” in a difficult quantitative subject like economics, you can get a bit of a Grinch Who Stole Christmas effect. I thought it was going to be one way and now my expectations are taken down. We all know no one likes that dopamine depletion of having expectations not met. So, to me, if we’re going to talk about gender bias, we really have to talk about it in this nuanced way, so that it doesn’t get automatically dismissed by people who don’t see an explicit bias and then say, “Oh, hey, there’s nothing here.” And then the last thing that I think is really important here for the motivation for the paper is that we have this expectation that bias would grow over the semester. So if bias grows over the semester, that means the earlier in the semester you evaluate, the smaller the bias will be. 
And one thing that the literature is missing is a very concrete objective way to deal with bias. What we were hoping to find was: move the evaluations up in the semester a bit, and you minimize or eliminate bias and that’s a concrete objective. Towards the end today, we’ll talk about what we actually found and whether or not we knew that. But that was one of the motivations.

Jill: So that’s how the original paper came about in terms of motivation. But then Cassondra, who has a PhD in psychology and was doing work in this area, reached out to Whitney and me. She had read our paper and its results. And so the second paper, with Cassondra, takes a more psychological approach to a lot of what Whitney is talking about, and Cassondra is going to talk more about that later, with respect to role-congruity theory and social role theory. Whitney’s described the motivation of that first paper; the second paper takes a very different perspective, looking at it from a more psych perspective. Cassondra, you might want to chime in?

Cassondra: Absolutely. I think you summarized it well. I joined the paper as Whitney and Jill were trying to find a home for it. And we thought that our interests, though coming from very different backgrounds, would blend nicely for this particular topic, as there’s a lot of scholarship in psychology that looks at understanding reasons behind this bias. And so I was brought in to really help kind of think about how do we frame that in a way that might appeal to even a broader range of audiences.

Rebecca: At the beginning of the paper, and Whitney, you’ve kind of pointed to this today about being a young faculty member, you also noted in the paper that women are underrepresented among economics faculty, especially at the level of full professors. Can you tell us a little bit about the extent of this under-representation?

Jill: Women have earned more than half of doctoral degrees for over a decade, but they are underrepresented, particularly among tenure-track faculty. In the paper we cite that 36% of full professors are women. In economics, that percentage is smaller: 17 and a half percent of full professors are women, although women represent 35% of PhDs in economics. So a smaller percentage of female faculty reach full professor rank in economics. That’s what we mean by that underrepresentation. In terms of economics, specifically, it’s oftentimes left out of the STEM fields, and depending on which university or college you’re at, economics can sometimes be found in the social sciences and in the arts and sciences, or it can be found in the business school. So at my institution, Whitney’s institution, I believe, and Cassondra’s, I think we’re all represented in the business school. But sometimes, you know, when economics is put in there with the social science fields, it’s not thought of as being this more quantitative-heavy subject, and it oftentimes is, by the nature of it. And so for females in those more math-heavy classes, like the STEM classes… I think with my students, when I started off, and I think Whitney was getting at this with us being more junior faculty members, I was considered by students a peer instead of the professor in the course. And that made it tough, because to Whitney’s point about returning grade feedback and the perception that students had of me on day one versus midway through the course, I was now coming across as someone that was handing back maybe less than a 100% or “A” grade. So in my business school, my principles of economics courses are required. Students might not even want to be in there, but they have to be in there to get a business degree. Earlier on, that was a challenge I faced. I’m 13 years into my career. I’m going up for full professor this summer. But starting off was really a challenge.
And I remember having female mentors in my graduate program. They tried to prepare me for this, they tried to say it’s going to be challenging early on, you’re going to have to go against some of these perceptions, a lot of the perceptions that we measure in this paper.

John: To what extent is the underrepresentation of women faculty due to a cohort effect where women have become a larger share of PhD economists in the last few decades, but that was less true 20 or 30 years ago and how much of it might be due to the impact of gender bias on evaluations on career pathways for women?

Jill: Really, what this paper looks at is student evaluations of teaching and the bias, or potential for bias, that exists there. So I’ll just speak to that. Where I currently am, evaluations of teaching are weighted heavily for retention of faculty, promotion of faculty, tenure and promotion decisions. And then when we’re hiring new faculty, we’re looking at any previous course evaluations and experience with teaching. At every level in academia, these are used as some gauge for teaching effectiveness. I think one of the questions that we’re looking at and accrediting bodies are looking at is whether or not this is the measure that should be used. And looking at different measures that might be options for measuring teaching effectiveness, we know that they’re flawed, that our study is showing that they’re flawed, but also previous literature has suggested that they’re flawed as well. And so the fact that for most schools, this is the single measure that’s being captured… and I know that it’s different depending on, again, at my institution, some departments don’t give them a whole lot of weight in tenure and promotion decisions. But certainly, my experience in my College of Business and Technology is that these are weighted heavily. And so in thinking about a junior faculty member starting off, when Whitney and I met at the conference, if my evaluations were lower, I’m putting a lot of time into my teaching and improving and bringing up those scores. My male colleagues, in discussion just with them, didn’t have the same experience that I was having with respect to these SETs. And so if we think about allocation of time and resources as a tenure-track junior faculty member, I’m putting more into what I would consider just catching up, getting those SET scores higher, so that it’s reflected in my tenure and promotion packet. And that’s less time that I’m allocating toward research or other things. That’s my view on it. I think Whitney has a couple other thoughts on that.

Whitney: One of the things we tried to make clear in the paper is that the literature is very clear that evaluations do have a gender bias. And if these evaluations are being used, and they are, in hiring decisions, annual evaluations, promotion, tenure evaluations, and merit pay raise decisions, then they’re being used at every single level of advancement. It’s not one small piece. It’s a piece that’s used throughout and very integrated into the process.

Rebecca: You mentioned at the top of our interview that the second paper shifts more towards psychology, and specifically describes ways in which both social role theory and role-congruity theory may explain the bias against female faculty in student evaluations. Can you briefly summarize these arguments for our listeners?

Cassondra: So social role theory is a theory that has been put forth for decades by Alice Eagly, a very prominent scholar in the social psychology world, as well as her colleagues. And this has been used as a framework to really understand the complexities and origins of gender gaps in our workplace in particular, whether that be inequities in experiences, the expectations that are different for women, and of course, outcomes such as promotion at work. Essentially, social role theory suggests that the reason we see these gender inequities today in society is that they originated from men and women being distributed into social roles based on physical sex differences: women biologically were able to have children, and men, on average, were physically stronger. Those differences, thousands of years ago, had an evolutionary benefit to a well-functioning society; people were supporting in the ways in which they were best equipped to do so. And the assignment of men and women into these roles led them to adopt role-specific qualities and skills. So women who were bearing children were friendly, helpful, sensitive, concerned with others, kind, caring. We refer to these now as more communal qualities. And for men, the provider, the protector, role led them to have attributes such as ambition, being assertive, authoritative, dominant. These are qualities that now we label as agentic. So while technology of course has since caught up and made these biologically driven role assignments unnecessary, society continues to see a division of labor along these lines in the modern world. And society at large still holds the belief that women do possess these traits, and should possess these traits, these more communal qualities, and men do and should possess more of these agentic qualities. Relatedly, role-congruity theory helps us understand the consequences when men and women fail to fulfill these expectations.
And we know the failure to fulfill these expectations is more consequential for women, this experience of bias driven from the failure to behave in communal ways. In other words, violating these cultural expectations can be seen in all areas of society, but particularly in traditionally male-dominated positions, like college professors, or in male-dominated fields like economics [LAUGHTER]. And so women that are in these roles are already going to experience some degree of backlash for being in gender-incongruent positions. But that is especially true if they are also going to behave in traditionally more agentic ways, being more assertive, demonstrating their power, which we argued was what was occurring when you give critical feedback back to students.

John: To approach this, you gave evaluations to students at two different points of the semester. Could you tell us a bit more about the study design, how large the sample was and how many faculty and institutions participated in the study?

Whitney: Sure, we had a really rich data set for this study. That’s one of the reasons we were able to get two different papers out of it, and maybe even some future research, because we took all of this data, and we collected it in person on paper and entered it, which was an arduous process. As I said, we had been working on this project for about six years, and about a year and a half of that was just data collection. And we have a lot of people to thank that did that for us for no author credit on this paper, so we had males and females across the United States gathering that data for us, that we’re really appreciative to have. So in the end, we wound up with about 1200 students in total. We weren’t quite 50/50, we were 60/40, favoring men, which is typical for economics classrooms, even though it is required in a lot of majors (that’s where you’re getting a lot of the women taking it). And like you said, John, we surveyed them twice. We surveyed them on the second day of class; we wanted as close to a first impression as possible without having a major sample issue with drop/add. And then we surveyed them the day after they got their first midterm grade back. So we got the first impression, and then we got the way that they felt after they had had their first grade returned. We did this at five different colleges and universities. We had three male professors contributing data and four female professors contributing data. One of the big questions that people have asked us over time is “Well, how does race play into this?” And that’s something that’s beyond the scope of our research. I will say that we only had one underrepresented minority in our sample, again, typical of economics professors, and it was one of our male instructors. So, we would expect a downward bias from race and maybe an upward bias from gender, with those two at least washing one another out in the paper.
And when we asked these students about how they felt after their grades were returned, this was about four weeks into the semester, so still pretty early in the semester. What we did was we really wanted to ask about the specific qualities that had been hypothesized in the literature as drivers of bias or drivers of differences. So we just asked students to rate their instructor on a bunch of different qualities. Cassie really helped us out here because she came in and she says, “Well, you know, we can categorize these qualities into communal qualities and agentic qualities and neutral qualities…” which was really the way to approach it because, of course, we get different things in communal versus agentic qualities. So we asked our students things like: “How knowledgeable do you find your professor? How challenging? Do you find them to be approachable? Do you find them to be caring? Are they interesting?” And then we asked a couple of very general questions: “Would you recommend the course?” All of this set us up to have a really nice dataset where we could look between genders and across time as well.

Rebecca: So I think everyone’s probably dying to know exactly what you found. [LAUGHTER]

Jill: I’m just going to provide an overview of the results because we do a number of different specifications and use different econometric methods in the findings. And so you can get all of those results there in detail. But in general, on the second day of class, we find that women are receiving lower ratings across the five agentic and gender-neutral instructor characteristics that we measured. They were rated higher on that second day of class on those more communal characteristics. And not all of those differences were statistically significant. Immediately after the first exam grade was returned to students, women were receiving lower ratings for all seven measured characteristics. Each difference was significant except for those caring and approachable, more communal characteristics. And then men were now having higher ratings in all the different aspects relative to time one, or the second day of class. Over time, what we see was that men’s evaluations were getting higher on all characteristics from the second day of class to the period after the first exam was returned. And then in contrast, women’s evaluations were not trending upward. So we had a couple that were staying the same, but overall, they were going down. So those are just some overview findings. Again, those more specific results, by specification, can be found in the paper.

John: We will include a link to both papers in the show notes too, so people can go back and review them. To summarize, what you found is there was relatively weak evidence of significant gender bias on the second day of class, but that gap increased fairly dramatically after the first graded exam. So what do you attribute that change to, was it because of the feedback students were getting from grades as Whitney had mentioned before?

Whitney: We were attributing, and Cassie can talk about this with more authority on the theoretical point, but we’re attributing that to backlash theory, this idea that if I expect one thing, and I don’t get it, there’s this need to back off so that things go in congruence.

Cassondra: Exactly, Whitney is spot on there. What we thought this was evidence of was women behaving in gender-incongruent ways. Women are supposed to be warm and caring and friendly. And when you get a grade that perhaps wasn’t an “A,” that feels harsh and critical, and a woman is asserting their power and dominance in the classroom, when again, they already are in a male-dominated field and profession. And those two things combined can result in this backlash.

Rebecca: So if we take these findings, and think institutionally, what are some things that institutions might want to think about moving forward?

Whitney: That’s a good question. If you remember, from the very beginning, we were saying, we’re really hoping to find this nice objective concrete solution, we anticipated finding it through timing. And that’s what I would really like to do with future research is to be able to find something concrete and objective to treat this with. We weren’t able to do that because we found bias from the beginning. And we found that it came so quickly in the semester that it’s not something that we can just move back evaluations to midterm or something like that. Since we can’t do that, we’ve talked about other ways for institutions to take this. And one takeaway really is just an awareness that these gender biases exist and that these evaluations are flawed. This is really well established in the literature, but not necessarily in the general sphere of knowledge. When we published this paper, Georgia Tech did a little feature in their daily digest, and I had two female engineering faculty email me and say, “I knew this in my gut for years, but nobody’s ever quantified it.” That to me, is just evidence that it’s not in the general sphere of knowledge, even though the literature defines it well. Some of the impact of the concrete solutions that we have seen is we’re seeing a lot of schools and accreditors, like AACSB, they’re starting to require multiple indicators of teaching effectiveness and evaluation. So evaluations and peer reviews, or maybe something else to see the observation, something to that effect to where we have more of a global and inclusive way to look at someone’s teaching effectiveness. So this is a great takeaway, hopefully that will reduce the weight of the impact of evaluation just by having other factors in there. And just one final point that I want to make. And this is just a really big sticking point to me for the paper is that all of us are researchers, we all deal with statistics and statistical significance, and robust research methods. 
And then when those of us in Chair and Dean roles go to look at evaluations, all of a sudden, all that training completely goes out the window, and we look at the difference between a 4.2 and a 4.4. And I know those differences sound really small, they are that small. And we say, “Oh, well, this person does better than this person, this person deserves to be hired over this person.” Never in our research, or in a formal presentation, would we ever compare two means that close without significance testing, number one, and without making sure they’re actually comparable, and say, “Oh, there’s a difference.” It’s just something that I think we need to recognize: we would not recognize this as good research or good methodology in any other area of our work. It’s just something that we should keep in mind as we move forward with this.

John: Now, you mentioned the use of peer evaluations as another way of providing, perhaps, more balance, but might they be subject to the same type of bias?

Whitney: Yeah, all the things that we would see for student evaluations, I can imagine how you would see with peer evaluations as well.

Jill: But there are creative ways to do peer evaluations. Here at ETSU, we have a Center for Teaching Excellence, and I’m confident Georgia Tech and Lake Forest have their own versions of that. And so there are creative ways. And again, not that SETs are necessarily bad, but knowing what we know about the flaws in them, that, coupled with an additional measure or two, can be a lot more insightful, I think, as to the teaching effectiveness, like true teaching effectiveness, of instructors.

John: And one thing I’m wondering is if the measured effect might be larger in economics, because at least at many institutions, grades in economics and STEM classes are often lower, which might magnify the effect of this difference. It would be interesting if there was to be a study that also included some classes, maybe in humanities, to see if perhaps there’s less of an effect because of that role-incongruity issue there. It may not appear to be as severe in disciplines where grades across the board tend to be higher.

Whitney: I think you’re right about that. For most people, when they take economics, it’s a required class, and certainly the grades are a big factor. The two things that showed the most significance outside of our key variable of interest were interest in economics and expected grade. Those were the things that mattered across the board… now we still found gender bias controlling for those things, but it mattered.

Rebecca: So we talked a little bit about things that institutions might want to start thinking about: institutional policy and things that might shift how we use teaching evaluations. Are there any other strategies that institutions or instructors can use, or adopt, to try to reduce this bias in the short term?

Cassondra: That’s really the million dollar question. Because this type of bias exists in a lot of different domains, whether we’re talking managers and their subordinates, teachers and their students. One thing that’s often suggested or recommended is simply making people aware that this bias exists, and providing training on how to better approach evaluations, whether that’s how to use a rating scale and ensuring that you aren’t engaging in a halo effect, for example. Another strategy is requiring that people justify the ratings they provide with qualitative comments… that if you’re just asked to fill out on a scale, how competent is this person? Well, bias may creep in more if you aren’t asking for a justification of why that particular rating was given for competence. A last recommendation that I’ll share here is making these evaluations more public. So if there are a couple of people, say peers, that are evaluating myself or Whitney or Jill in the classroom, well, they need to come together, share and publicly disseminate their evaluations that they had given to us. This social accountability can help to mitigate bias and for people to ensure that the ratings that you’re giving are, in fact, justified.

John: So we’ve got a long ways to go with this. It’s a problem that’s been recognized for quite a while with a lot of studies. But there hasn’t been that much done to address that. And those are some good suggestions that institutions may want to try. We always end with the question: “What’s next?”

Cassondra: [LAUGHTER] That’s a good question. Of course, I think that the three of us collectively would say we do hope that administration and decision makers start asking questions about their use of student evaluations of teaching and how they might seek to mitigate this bias, based on the recommendations Whitney had already shared. But we also hope that women faculty perhaps feel more empowered to advocate for themselves when it comes time for promotion and tenure decisions to be made. At my institution, a part of the promotion process is writing letters and going through interviews. So speaking to this, bringing an awareness to the people who are making the decisions that this exists, and that it is not just an opinion, that there is empirical evidence of its existence. But we are really interested in exploring more fully how providing feedback, particularly critical feedback, like in our study, where the professors are giving back grades, might impact the perceptions of men and women in other contexts as well. So is this a phenomenon we would see, for example, between a manager and their team? Do people respond differently to critical feedback from a manager because of their gender? And how much are these differences, perhaps, driven by perceptions of how communal or agentic they are in their delivery of that feedback? So in other words, are we seeing the same pattern in other contexts? Ultimately, we hope that by better understanding how perceptions of communion and agency impact interactions that women have at work, particularly women in male-dominated or gender-atypical roles, this greater understanding will allow us to also discover ways to alleviate some of that backlash through more targeted interventions and training and perhaps better timing. Because at a minimum, it’s important to highlight the various ways gender bias continues to persist in our society. Because without that awareness, nothing can be changed.

John: Whitney, Jill?

Jill: I think that was great. [LAUGHTER]

Whitney: Yeah, I think, Cassie, you did a great job. And Cassie certainly helped us out with bringing formal language and theory to things that we felt as intuitive and we felt in our gut as important. We don’t have a lot of language for that in the economic space. And so blending these two disciplines together has been very helpful for looking at the situation.

Rebecca: Well, thank you all for joining us. And the research that you’re doing is really important and impactful. So we hope our listeners will use it.

Whitney: Thank you so much.

Cassondra: Thank you.

Jill: Thank you so much.


John: If you’ve enjoyed this podcast, please subscribe and leave a review on iTunes or your favorite podcast service. To continue the conversation, join us on our Tea for Teaching Facebook page.

Rebecca: You can find show notes, transcripts and other materials on teaforteaching.com. Music by Michael Gary Brewer.

Ganesh: Editing assistance by Ganesh.