This is the 1601st post on this blog since 2007 (I regret not seeing the counter was at 1599 yesterday!). That's kinda crazy to think about. In about 2.5 years' time, the blog will reach its 20th anniversary, which I guess means I am getting old.
That aside, let's talk about teachers and their biases. Now, we know from social psychology that different political groups favor different groups of people. Teachers in Western countries are usually women and left-of-center politically. According to this tool, in the USA, teachers are 80% Democrat versus 20% Republican. From other statistics, we can then guess that teachers tend to be woke, that is, positively disposed towards non-Whites, females, sexual minorities, immigrants and other left-wing groups. Teachers nevertheless have to teach everybody's children, so we might guess they perform slightly worse for pupils who belong to demographics they don't like. More importantly, they might also bias their grade evaluations slightly against such groups, since grading necessarily has some subjective aspect to it. Based on this, we would expect grades to be slightly biased against boys, ethnic majorities and so on. A new German study tested this idea. Well, they actually set out to find bias against immigrants, but found the opposite.
Bredtmann, J., Otten, S., & Vonnahme, C. Discrimination in Grading? Evidence on Teachers’ Evaluation Bias Towards Minority Students.
We analyze whether teachers discriminate against ethnic minority students in terms of grading. Using comprehensive data on students in German primary and secondary schools, we compare students’ scores in standardized, anonymously graded achievement tests with non-anonymous teacher ratings within a difference-in-difference (DiD) framework. We find that, on average, minority students receive lower grades than majority students in both German and Math. However, these differences are not due to discrimination in grading against minority students. Instead, performance gaps between minority and majority students are significantly reduced when being graded by the teacher compared to being assessed through the standardized test. We provide supporting evidence that this finding cannot be explained solely by the fact that minority students face higher barriers on the standardized test due to language difficulties. Rather, our results suggest that teachers have a positive evaluation bias towards ethnic minority students.
The important thing is to have an objective measure, test scores graded automatically or anonymously, to which one can compare the teachers' grades for the same subjects. Using German data for about 90,000 students, they had ample sample size to try power-hungry econometric models, in this case difference-in-differences (DiD). The gaps look like this:
They don't seem to report the group gaps in test scores without controls, but the grade gaps are about 0.30 standard deviations (SD). The simplest analysis method is just regressing grades on test scores + controls:
With only a control for year, the grade gaps are about 0.30 SD (columns 1 & 4). However, controlling for test scores and classrooms, the effect unexpectedly turns slightly positive (columns 3 & 6; p values are fine). This is despite the fact that measurement error in the test scores biases this value towards negative values, since a noisy control only partly removes the underlying ability gap. Instead of trying to adjust for measurement error, they employed DiD. If we trust the method, these are the results:
I don't really understand why this DiD should work, maybe an econometrician can explain in the comments. It looks like it is just comparing the sizes of the gaps by score type (grade vs. test) and checking whether these are different. They can be different if grades are a worse measure of academic ability than test scores, which I think they are. If so, then a spurious interaction will be found, and its size will correspond to the difference in the academic ability loading of the two measures (so I guess the test score gap is about 0.60 SD; this is based on the total sample SD, so the Cohen's d is a bit larger). Here's the relevant method section:
You see that the outcome variable is the stacked dataset, so that half the values are from tests and half from grades. A dummy variable is added to indicate this, and its interaction with minority status is estimated. Thus, we just get the difference between the gap sizes for the same students on the two measures of academic ability. Please correct me if I am wrong. If I am right, and grades are worse measures of academic ability, then this estimate of teacher bias is too large. The true value will be somewhat larger than the values in the first table, since these have to be adjusted for measurement error. The authors could have just used SIMEX, I think.
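To make the worry concrete, here is a minimal simulation sketch. It is not the authors' model or data: the function name and every parameter value (`ability_gap`, `grade_loading`, `test_noise_sd`) are made-up assumptions chosen only for illustration. Grades and test scores are both generated from the same latent ability with no teacher bias at all, but grades are given a weaker loading on ability. The sketch then runs the two analyses discussed above: the simple regression of grades on test scores plus a minority dummy, and the stacked regression with a minority × teacher-graded interaction.

```python
import numpy as np

def simulate_no_bias(n=100_000, ability_gap=0.6, grade_loading=0.5,
                     test_noise_sd=0.5, seed=0):
    """Simulate grades and test scores with NO teacher bias built in.

    All parameter values are hypothetical, not estimates from the paper.
    """
    rng = np.random.default_rng(seed)
    minority = rng.integers(0, 2, n).astype(float)
    # Latent ability: the minority group is lower by `ability_gap` SD.
    ability = rng.normal(0.0, 1.0, n) - ability_gap * minority
    # Test score: ability plus measurement error.
    test = ability + rng.normal(0.0, test_noise_sd, n)
    # Grade: a noisier measure of the same ability (higher = better here).
    grade = grade_loading * ability + rng.normal(
        0.0, np.sqrt(1.0 - grade_loading**2), n)

    def z(x):
        # Standardize on the total-sample SD.
        return (x - x.mean()) / x.std()

    test_z, grade_z = z(test), z(grade)
    gap_test = test_z[minority == 1].mean() - test_z[minority == 0].mean()
    gap_grade = grade_z[minority == 1].mean() - grade_z[minority == 0].mean()

    # (a) Simple approach: grade ~ test + minority.
    X = np.column_stack([np.ones(n), test_z, minority])
    coef = np.linalg.lstsq(X, grade_z, rcond=None)[0]

    # (b) Stacked approach: outcome ~ minority + teacher + minority:teacher.
    y = np.concatenate([test_z, grade_z])
    teacher = np.concatenate([np.zeros(n), np.ones(n)])
    m = np.tile(minority, 2)
    Xs = np.column_stack([np.ones(2 * n), m, teacher, m * teacher])
    did = np.linalg.lstsq(Xs, y, rcond=None)[0]

    return {
        "gap_test": gap_test,
        "gap_grade": gap_grade,
        "minority_coef_given_test": coef[2],
        "did_interaction": did[3],
    }

if __name__ == "__main__":
    for name, value in simulate_no_bias().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings, the interaction comes out positive even though no bias was simulated, and it equals the grade gap minus the test gap. The minority coefficient in the simple regression comes out negative, which is the downward bias from measurement error in the test scores. How large either effect is in the real data depends on the actual reliabilities, which this sketch does not know.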
It is interesting how the authors report their results:
Most importantly, however, the coefficients of the interaction term indicate that minority/majority gaps in test performance are reduced by 0.26 (German) and 0.20 (Math) SD when students are evaluated by their teachers. This represents an improvement of about 0.2 grade points or 5–6% for minority students. The finding of higher teacher ratings of ethnic minority students compared to their test scores contradicts much of the previous literature, which tends to find negative biases against ethnic minority students (e.g., Botelho et al. 2015; De Benedetto and De Paola 2023; Sahlströhm and Silliman 2024). However, our results are consistent with Zhu (2024), who shows that – after correcting for measurement error in standardized test scores – teachers evaluate black students as higher achieving than white students with the same standardized test achievement.
Since Germany uses academic streaming (slotting of good and bad students into different school types), this kind of teacher bias would unfairly harm thousands of German students every year by placing them undeservedly in the lower school type. Granted, these will be the marginal cases, so it probably doesn't matter that much, but probably a little, since school type determines access to higher education later on. The authors go further. When one of them was interviewed about this in the German magazine Spiegel, she said:
SPIEGEL: Children with a migration background have a harder time in the German school system than their classmates whose families are not immigrants. Together with researchers from the University of Duisburg-Essen, you have examined a large number of assessments by teachers. Do children with a migration background receive worse grades per se?
Bredtmann: No, our study shows that there is no systematic discrimination in German schools when it comes to grading schoolchildren with a migration background. On the contrary, we found a result that was initially surprising: Children with a migration background tend to receive better grades from teachers than their performance in anonymously assessed standardized tests would suggest.
SPIEGEL: How do you explain that?
Bredtmann: We suspect that teachers are unconsciously trying to compensate for social disadvantages by giving more positive grades - both for children with a migration background and for children from so-called low-education households, for whom we find a similar result. What we were also able to show: When teachers teach in classes with an above-average number of low-performing or socially disadvantaged schoolchildren, they show a particularly pronounced tendency to give children with a migrant background a better grade.
Aside from the American study by Zhu (2024), Cremieux summarized another clever study:
In New York, seniors take the Regents Examinations in order to graduate. When teachers became the graders instead of having a centralized state-level authority doing the grading, they very clearly cheated so more students would pass. To obtain a low-level “local diploma,” students had to score 55 on the exams; to obtain the prestigious “Regents Diploma,” they had to score 65. Scores in years where teachers weren’t doing the grading were nearly normally distributed; when the teachers took the wheel, there were an excess of scores at either cutoff:
When the scores are split by race, we see this:
This cheating behavior led to the tests being biased in favor of Blacks and Hispanics over Whites and Asians, but the reason doesn’t necessarily have anything to do with racial discrimination by teachers in favor of Blacks and Hispanics. Instead, it has to do with Blacks and Hispanics scoring worse, and potentially with the fact that they tend to go to schools with more low-scorers, which might offer further incentives for teachers to round up. As a result, their mean scores clearly became incomparable to those of Whites and Asians, but the bias was because so many more of them were in need of teachers’ help cheating the grading system so they could pass.
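The mechanical point in that passage can be illustrated with a small sketch. Everything here is hypothetical: the group means, the SD, the 5-point window and the rule that every near-miss gets bumped to the cutoff are assumptions for illustration, not the Regents data or the study's method. The same group-blind rounding rule is applied to two groups that differ only in mean score.

```python
import numpy as np

def rounding_up_effect(mean_low=60.0, mean_high=75.0, sd=12.0,
                       cutoff=65, window=5, n=100_000, seed=0):
    """Apply one group-blind 'bump near-misses to the cutoff' rule to two groups.

    All numbers are made up for illustration.
    """
    rng = np.random.default_rng(seed)
    out = {}
    for name, mu in (("lower_scoring", mean_low), ("higher_scoring", mean_high)):
        # Integer scores on a 0-100 scale.
        true = np.clip(np.round(rng.normal(mu, sd, n)), 0, 100)
        # Scores within `window` points below the cutoff count as near-misses.
        near_miss = (true >= cutoff - window) & (true < cutoff)
        graded = np.where(near_miss, cutoff, true)
        out[name] = {
            "share_bumped": float(near_miss.mean()),
            "mean_gain": float((graded - true).mean()),
        }
    return out

if __name__ == "__main__":
    for group, stats in rounding_up_effect().items():
        print(group, stats)
```

With these assumed distributions, the lower-scoring group has more students just below the cutoff, so it gets the larger average boost even though the rule never looks at group membership.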
The best way to avoid this racism and other biases of the teachers is to use standardized, objectively scored tests. In fact, meritocracy demands this. It is also cheaper since scoring tests is easily automated. Maybe, in fact, we should entirely get rid of teacher-given school grades and only use test scores. Why not?
I’d bet my house the same result holds for sex.
Boys will do relatively better on tests than on classroom-based assessments, as they spend less time sucking up to the teacher.
In an ideal world, no left-wing teachers would be allowed and high school wouldn’t exist.