Racial diversity: does it matter? How do we know?
TL;DR don't do interactions without main effects
A lot of people on the right are concerned about the negative effects of ethnic and religious diversity. Sometimes, they produce plots like this one:
The plot shows racial homogeneity on the X axis and the homicide rate on the Y axis for the roughly 3,100 US counties. There's a fairly strong negative relationship of r = -.42. Similarly, if one does the same with social status, one gets this:
Social status is here an index (factor score) based on some 28 indicators, taken from my 2016 study. Again, it can be seen that less diversity is better. But is that really the causal story? It is easy to think about the negative effects of diversity, insofar as diversity is just a codeword for not-White. In fact, if we look at the correlations, that's what they show:
As a matter of fact, the two most closely related variables are the non-Hispanic White share of the population and the non-diversity index, r = .81. I've used the standard Simpson's index here because it's the most intuitive and the others don't do any better. The Simpson's index is the chance that two people picked at random belong to the same group. Since this involves squaring the group proportions, the sum of these squared terms will be dominated by the largest groups, and in the USA, that's Whites. Hence, diversity is mostly just a proxy for not-White. OK, but mostly is not entirely. What do the regressions show? Here's the one for homicide rates, using the standardized betas:
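To make the point about squared proportions concrete, here is a minimal Python sketch of the index as described above. The function name and the example county are hypothetical and chosen only for illustration; they are not taken from the county dataset.

```python
def simpson_homogeneity(shares):
    """Sum of squared group proportions: the chance that two people
    picked at random belong to the same group."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    # Normalize so the function also accepts raw counts.
    props = [s / total for s in shares]
    return sum(p * p for p in props)


# Hypothetical county: one group at 80%, three smaller groups.
shares = [0.80, 0.10, 0.05, 0.05]
h = simpson_homogeneity(shares)

# Fraction of the index contributed by the largest group's squared share.
largest_share = max(shares) / sum(shares)
largest_term = largest_share ** 2 / h

print(round(h, 3))             # 0.655
print(round(largest_term, 3))  # 0.977
```

In this made-up example, a group with 80% of the population accounts for nearly 98% of the index, which is why the index tracks the largest group's share so closely.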
(Significance levels: * p < .01, ** p < .005, *** p < .001.)
So we see that, on its own, racial homogeneity has a decent standardized beta of -0.42, the same as the correlation above. But when we control for the direct effects of the different races in model 2, that is, having more or fewer of some group, the homogeneity effect shrinks by 75% and changes sign! Black% is clearly the strongest predictor, which is not surprising considering that the Black homicide rate is some 8-10 times that of Whites. In model 3, I added a control for intelligence, proxied here by the mandatory scholastic tests that each state gives in various grades, transformed to the same national scale (SEDA). This doesn't change the results much, though we see that intelligence also has a somewhat negative effect. In fact, it is somewhat surprising how little controlling for intelligence did to shrink the effect of Black%. We can further control for all sorts of things; here I have just included some geographical controls for illustration: longitude, latitude, temperature, and precipitation. Controlling for these actually increased the effect size of racial homogeneity, which is rather mysterious.
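The model 1 versus model 2 contrast can be reproduced on toy data. The sketch below is a simulation under stated assumptions, not the county analysis: three hypothetical groups A, B, and C, an outcome that by construction depends only on the share of group B, and no true homogeneity effect at all. The `ols` helper is a bare-bones pure-Python solver written for this example; it reports raw rather than standardized coefficients.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    An intercept is added here; returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


random.seed(0)
n = 2000  # number of simulated units (assumption)

# Hypothetical composition: B and C are minorities, A is the remainder.
share_b = [random.uniform(0.0, 0.4) for _ in range(n)]
share_c = [random.uniform(0.0, 0.4) for _ in range(n)]
share_a = [1.0 - b - c for b, c in zip(share_b, share_c)]

# Simpson homogeneity is a deterministic function of the shares.
homog = [a * a + b * b + c * c
         for a, b, c in zip(share_a, share_b, share_c)]

# Outcome depends ONLY on group B's share (main effect), plus noise.
y = [10.0 * b + random.gauss(0.0, 0.5) for b in share_b]

# Model 1: homogeneity alone (main effects omitted).
m1 = ols([[h] for h in homog], y)

# Model 2: homogeneity plus main effects (group A is the reference).
m2 = ols([[h, b, c] for h, b, c in zip(homog, share_b, share_c)], y)

print("model 1 homogeneity slope:", round(m1[1], 2))
print("model 2 homogeneity slope:", round(m2[1], 2))
print("model 2 share_b slope:    ", round(m2[2], 2))
```

Under these assumptions, the homogeneity-only model should show a clearly negative slope even though homogeneity has no effect in the data-generating process, and the slope should move toward zero once the group shares are included. The simulation only illustrates the mechanism; it says nothing about the size of any effect in the real data.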
The same kind of story is seen when you repeat this for the social status index:
Racial homogeneity starts with a positive relationship with social status (beta = 0.28), but it reverses when controlling for the direct effects of race, shrinks further when controlling for intelligence, and can be similarly increased by controlling for geography.
I don't think anyone really believes that racial homogeneity causes higher homicide rates or lower social status as such. Rather, these data illustrate the dangers of confounding and of not including main effects. Racial homogeneity is a kind of complex interaction variable, as it is a function of several other variables (i.e., the racial composition variables). The more likely explanation for this kind of result is self-selection. Racially homogeneous counties in the USA are typically rural White counties. Here's the top 20:
19 of the 20 are >92% White; only Starr County in Texas is a homogeneous Hispanic county. These aren't exactly innovation hubs, but rather represent a kind of left-over population that has been brain drained over the last 50 years or so. West Virginia is the go-to example of this. The unstated assumption of these racial composition and diversity models is that people of the same race in different units (here, counties) are the same (equal in whatever traits are important). All of our knowledge of course tells us that this isn't the case. Consequently, these kinds of regressions will always produce some kind of bias in the results. If one is looking for small effects, possibly the entire result is due to some of these biases. Unfortunately, there isn't that much we can easily do about this. We have no alternative but to try these kinds of models, as well as their more causally informative cousins like the longitudinal fixed effects design when we can. Maybe some economist can find some strange natural experiment somewhere. In the meantime, the psychologists will continue to attempt to prove how great diversity is with their p-hacked lab studies.
I am not the first to call attention to this kind of issue. When I found this kind of result about 6 years ago, I found a few other studies that had noted the problem:
Kustov, A., & Pardelli, G. (2018). Ethnoracial homogeneity and public outcomes: the (non) effects of diversity. American Political Science Review, 112(4), 1096-1103.
How does ethnoracial demography relate to public goods provision? Many studies find support for the hypothesis that diversity is related to inefficient outcomes by comparing diverse and homogeneous communities. We distinguish between homogeneity of dominant and disadvantaged groups and argue that it is often impossible to identify the effects of diversity due to its collinearity with the share of disadvantaged groups. To disentangle the effects of these variables, we study new data from Brazilian municipalities. While it is possible to interpret the prima facie negative correlation between diversity and public goods as supportive of the prominent “deficit” hypothesis, a closer analysis reveals that, in fact, more homogeneous Afro-descendant communities have lower provision. While we cannot rule out that diversity is consequential in other contexts, our results cast doubt on the reliability of previous findings related to the benefits of local ethnoracial homogeneity for public outcomes.
Abascal, M., & Baldassarri, D. (2015). Love thy neighbor? Ethnoracial diversity and trust reexamined. American Journal of Sociology, 121(3), 722-782.
According to recent research, ethnoracial diversity negatively affects trust and social capital. This article challenges the current conception and measurement of “diversity” and invites scholars to rethink “social capital” in complex societies. It reproduces the analysis of Putnam and shows that the association between diversity and self-reported trust is a compositional artifact attributable to residential sorting: nonwhites report lower trust and are overrepresented in heterogeneous communities. The association between diversity and trust is better explained by differences between communities and their residents in terms of race/ethnicity, residential stability, and economic conditions; these classic indicators of inequality, not diversity, strongly and consistently predict self-reported trust. Diversity indexes also obscure the distinction between in-group and out-group contact. For whites, heterogeneity means more out-group neighbors; for nonwhites, heterogeneity means more in-group neighbors. Therefore, separate analyses were conducted by ethnoracial groups. Only for whites does living among out-group members—not in diverse communities per se—negatively predict trust.
Because of these problems, I don't think we currently know much about the causal effects of diversity per se, in the strict sense of diversity. The effects of increasing the proportions of this or that group of people, however, are rather obvious and, for the homicide rate, striking.