Predicting immigrant performance: Does inbreeding have incremental validity over IQ and Islam?
https://twitter.com/KirkegaardEmil/status/555217814488092673
So, she came up with:
https://twitter.com/hbdchick/status/555223247244242944
So I decided to try it out, since I'm taking a break from reading Lilienfeld, which I had been doing for 5 hours straight or so.
So the question is whether inbreeding measures have incremental validity over IQ and Islam, which I have previously used to examine immigrant performance in a number of studies.
So, to get the data into R, I OCR'd the PDF in Abbyy FineReader, since this program allows for easy copying of table data by row or column. I only wanted columns 1-2 and didn't want to deal with the hassle of importing the table via spreadsheet programs (which need a consistent separator, e.g. comma or space). Then I merged it with the megadataset (version 2.0d) to create a new version, 2.0e.
Then I created a subset of the data with variables of interest, and renamed them (otherwise results would be unwieldy). Intercorrelations are:

             Cousin%  CoefInbreed     IQ  Islam  S.in.DK
Cousin%         1.00         0.52  -0.59   0.78    -0.76
CoefInbreed     0.52         1.00  -0.28   0.40    -0.55
IQ             -0.59        -0.28   1.00  -0.27     0.54
Islam           0.78         0.40  -0.27   1.00    -0.71
S.in.DK        -0.76        -0.55   0.54  -0.71     1.00
Spearman correlations, which are probably better due to the non-normal data:

             Cousin%  CoefInbreed     IQ  Islam  S.in.DK
Cousin%         1.00         0.91  -0.63   0.67    -0.73
CoefInbreed     0.91         1.00  -0.55   0.61    -0.76
IQ             -0.63        -0.55   1.00  -0.23     0.72
Islam           0.67         0.61  -0.23   1.00    -0.61
S.in.DK        -0.73        -0.76   0.72  -0.61     1.00
The fairly high correlations of the inbreeding measures with IQ and Islam mean that their incremental validity will likely be modest.
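The two correlation tables above differ most for the pair of inbreeding measures (0.52 Pearson vs. 0.91 Spearman). One plausible reason for a gap like that is a relationship that is monotone but far from linear. The sketch below illustrates the general point with simulated values only; `x` and `y` are hypothetical and have nothing to do with the megadataset.

```r
# Simulated illustration only; these are not the post's data.
set.seed(1)
x <- runif(50, 0, 5)   # hypothetical predictor
y <- exp(2 * x)        # strictly increasing in x, but strongly non-linear

pearson  <- cor(x, y, method = "pearson")   # sensitive to non-linearity
spearman <- cor(x, y, method = "spearman")  # depends only on the rank order

print(round(c(pearson = pearson, spearman = spearman), 2))
```

Because `y` is a strictly increasing function of `x`, the ranks agree perfectly and the Spearman correlation is 1, while the Pearson correlation is lower.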
However, let's try modeling them. I create 7 models of interest and compile the primary measure of interest from them, R2 adjusted, into an object. Looks like this:

Row  Model                               R2 adj.
1    S.in.DK ~ IQ+Islam                  0.5472850
2    S.in.DK ~ IQ+Islam+CousinPercent    0.6701305
3    S.in.DK ~ IQ+Islam+CoefInbreed      0.7489312
4    S.in.DK ~ Islam+CousinPercent       0.6776841
5    S.in.DK ~ Islam+CoefInbreed         0.7438711
6    S.in.DK ~ IQ+CousinPercent          0.5486674
7    S.in.DK ~ IQ+CoefInbreed            0.4979552
So we see that either of them adds a fair amount of incremental validity to the base model (row 1 vs. 2-3). Both in fact do better than IQ when substituted for it (1 vs. 4-5). They can also substitute for Islam, but with no gain: CousinPercent gives about the same predictive power and CoefInbreed somewhat less (1 vs. 6-7).
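To make the comparison concrete, the change in adjusted R2 relative to the base model can be computed directly from the reported values. This is a small sketch that only reuses the numbers in the Danish results table above; the object names `adj.r2` and `delta` are mine.

```r
# Adjusted R2 values copied from the Danish results table above.
adj.r2 <- c("IQ+Islam"               = 0.5472850,
            "IQ+Islam+CousinPercent" = 0.6701305,
            "IQ+Islam+CoefInbreed"   = 0.7489312,
            "Islam+CousinPercent"    = 0.6776841,
            "Islam+CoefInbreed"      = 0.7438711,
            "IQ+CousinPercent"       = 0.5486674,
            "IQ+CoefInbreed"         = 0.4979552)

# Change in adjusted R2 relative to the base model (IQ+Islam).
delta <- adj.r2 - adj.r2["IQ+Islam"]
print(round(delta, 3))
```

Adding CousinPercent to the base model raises adjusted R2 by about 0.12 and adding CoefInbreed by about 0.20, whereas swapping Islam for CoefInbreed lowers it by about 0.05.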
Replication for Norway
Replication is important in science, so let's try the Norwegian data. The Finnish and Dutch data are not well-suited for this (too few immigrant groups and few outcome variables, i.e. only crime).
Pearson intercorrelations:

               CousinPercent  CoefInbreed     IQ  Islam  S.in.NO
CousinPercent           1.00         0.52  -0.59   0.78    -0.78
CoefInbreed             0.52         1.00  -0.28   0.40    -0.46
IQ                     -0.59        -0.28   1.00  -0.27     0.60
Islam                   0.78         0.40  -0.27   1.00    -0.72
S.in.NO                -0.78        -0.46   0.60  -0.72     1.00
Spearman:

               CousinPercent  CoefInbreed     IQ  Islam  S.in.NO
CousinPercent           1.00         0.91  -0.63   0.67    -0.77
CoefInbreed             0.91         1.00  -0.55   0.61    -0.71
IQ                     -0.63        -0.55   1.00  -0.23     0.75
Islam                   0.67         0.61  -0.23   1.00    -0.47
S.in.NO                -0.77        -0.71   0.75  -0.47     1.00
These look fairly similar to Denmark.
And the regression results:

Row  Model                               R2 adj.
1    S.in.NO ~ IQ+Islam                  0.5899682
2    S.in.NO ~ IQ+Islam+CousinPercent    0.7053999
3    S.in.NO ~ IQ+Islam+CoefInbreed      0.7077162
4    S.in.NO ~ Islam+CousinPercent       0.6826272
5    S.in.NO ~ Islam+CoefInbreed         0.6222364
6    S.in.NO ~ IQ+CousinPercent          0.6080922
7    S.in.NO ~ IQ+CoefInbreed            0.5460777
Fairly similar too. If added, they have incremental validity (row 1 vs. 2-3). They perform better than IQ if substituted, but by a smaller margin than in the Danish data (1 vs. 4-5). They can also substitute for Islam (1 vs. 6-7).
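The comparisons above rest on differences in adjusted R2 alone. One common way to check such a difference more formally, which I did not do here, is an F-test of the nested models with `anova()`. The sketch below uses simulated data only: the variable names mirror the ones above, but the coefficients, sample size and seed are arbitrary assumptions and none of the numbers come from the megadataset.

```r
# Simulated data for illustration only; nothing here is from the megadataset.
set.seed(42)
n <- 60
IQ          <- rnorm(n)
Islam       <- rnorm(n)
CoefInbreed <- 0.5 * Islam - 0.3 * IQ + rnorm(n)  # correlated with both predictors
S <- 0.5 * IQ - 0.4 * Islam - 0.4 * CoefInbreed + rnorm(n, sd = 0.5)
sim <- data.frame(S, IQ, Islam, CoefInbreed)

base.fit <- lm(S ~ IQ + Islam, data = sim)                # base model
full.fit <- lm(S ~ IQ + Islam + CoefInbreed, data = sim)  # base + inbreeding var

# Gain in adjusted R2 from adding the inbreeding variable
gain <- summary(full.fit)$adj.r.squared - summary(base.fit)$adj.r.squared
print(gain)

# F-test of the nested models: does the extra predictor improve the fit?
nested.test <- anova(base.fit, full.fit)
print(nested.test)
```

With as few cases as there are immigrant groups in these datasets, such a test would have limited power, so it is best read alongside the change in adjusted R2 rather than instead of it.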
How to interpret?
Since inbreeding does not seem to have any direct influence on behavior that is reflected in the S factor, it is not so easy to interpret these findings. Inbreeding leads to various health problems and lower g in offspring, the latter of which may have some effect. However, presumably, national IQs already reflect the lowered IQ from inbreeding, so there should be no additional effect there beyond national IQs. Perhaps inbreeding results in other psychological problems that are relevant.
Another idea is that inbreeding rates reflect non-g psychological traits that are relevant to adapting to life in Denmark. Perhaps it is a useful measure of clannishness, which would be reflected in hostility towards integration in Danish society (such as not getting an education, or a lack of sympathy for, or antipathy towards, ethnic Danes and resulting higher crime rates against them), which in turn would be reflected in the S factor.
The lack of reasonably well-established causal routes makes me somewhat cautious about how to interpret this finding.
##Code for merging cousin marriage+inbreeding data with megadataset
inbreed = read.table("clipboard", sep="\t",header=TRUE, row.names=1) #load data from clipboard
source("merger.R") #load mega functions
mega20d = read.mega("Megadataset_v2.0d.csv") #load latest megadataset
names = as.abbrev(rownames(inbreed)) #get abbreviated names
rownames(inbreed) = names #set them as rownames
#merge and save
mega20e = merge.datasets(mega20d,inbreed,1) #merge to create v. 2.0e
write.mega(mega20e,"Megadataset_v2.0e.csv") #save it
#select subset of interesting data
dk.data = subset(mega20e, select=c("Weighted.mean.consanguineous.percentage.HobenEtAl2010",
"Weighted.mean.coefficient.of.inbreeding.HobenEtAl2010",
"LV2012estimatedIQ",
"IslamPewResearch2010",
"S.factor.in.Denmark.Kirkegaard2014"))
colnames(dk.data) = c("CousinPercent","CoefInbreed","IQ","Islam","S.in.DK") #shorter var names
library(Hmisc) #for rcorr
cor.p = rcorr(as.matrix(dk.data)) #Pearson correlation object (name chosen so it does not mask rcorr())
View(round(cor.p$r,2)) #view correlations, round to 2
cor.s = rcorr(as.matrix(dk.data),type = "spearman") #Spearman correlation object
View(round(cor.s$r,2)) #view correlations, round to 2
#Multiple regression
library(QuantPsyc) #for beta coef
results = as.data.frame(matrix(data = NA, nrow=0, ncol = 1)) #empty matrix for results
colnames(results) = "R2 adj."
models = c("S.in.DK ~ IQ+Islam", #base model,
"S.in.DK ~ IQ+Islam+CousinPercent", #1. inbreeding var
"S.in.DK ~ IQ+Islam+CoefInbreed", #2. inbreeding var
"S.in.DK ~ Islam+CousinPercent", #without IQ
"S.in.DK ~ Islam+CoefInbreed", #without IQ
"S.in.DK ~ IQ+CousinPercent", #without Islam
"S.in.DK ~ IQ+CoefInbreed") #without Islam
for (model in models){ #run all the models
  fit.model = lm(as.formula(model), dk.data) #fit model
  sum.stats = summary(fit.model) #summary stats object
  print(sum.stats) #summary stats (explicit print needed inside a loop)
  print(lm.beta(fit.model)) #standardized betas
  results[model,] = sum.stats$adj.r.squared #add result to results object
}
View(results) #view results
##Let's try Norway too
no.data = subset(mega20e, select=c("Weighted.mean.consanguineous.percentage.HobenEtAl2010",
"Weighted.mean.coefficient.of.inbreeding.HobenEtAl2010",
"LV2012estimatedIQ",
"IslamPewResearch2010",
"S.factor.in.Norway.Kirkegaard2014"))
colnames(no.data) = c("CousinPercent","CoefInbreed","IQ","Islam","S.in.NO") #shorter var names
cor.p = rcorr(as.matrix(no.data)) #Pearson correlation object (name chosen so it does not mask rcorr())
View(round(cor.p$r,2)) #view correlations, round to 2
cor.s = rcorr(as.matrix(no.data),type = "spearman") #Spearman correlation object
View(round(cor.s$r,2)) #view correlations, round to 2
results = as.data.frame(matrix(data = NA, nrow=0, ncol = 1)) #empty matrix for results
colnames(results) = "R2 adj."
models = c("S.in.NO ~ IQ+Islam", #base model,
"S.in.NO ~ IQ+Islam+CousinPercent", #1. inbreeding var
"S.in.NO ~ IQ+Islam+CoefInbreed", #2. inbreeding var
"S.in.NO ~ Islam+CousinPercent", #without IQ
"S.in.NO ~ Islam+CoefInbreed", #without IQ
"S.in.NO ~ IQ+CousinPercent", #without Islam
"S.in.NO ~ IQ+CoefInbreed") #without Islam
for (model in models){ #run all the models
  fit.model = lm(as.formula(model), no.data) #fit model
  sum.stats = summary(fit.model) #summary stats object
  print(sum.stats) #summary stats (explicit print needed inside a loop)
  print(lm.beta(fit.model)) #standardized betas
  results[model,] = sum.stats$adj.r.squared #add result to results object
}
View(results) #view results
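The Danish and Norwegian blocks above are the same apart from the outcome variable, so the model-fitting step could be wrapped in one function. The following is only a sketch: `fit.models` is a hypothetical helper, not part of merger.R, and it is demonstrated on R's built-in `mtcars` data (with arbitrary stand-in variables) so that it runs without the megadataset.

```r
# Hypothetical helper (not part of merger.R): fit the seven models for a given
# outcome column and return their adjusted R2 values in one data frame.
fit.models <- function(data, outcome,
                       base  = c("IQ", "Islam"),
                       extra = c("CousinPercent", "CoefInbreed")) {
  base.rhs <- paste(base, collapse = "+")
  rhs <- c(base.rhs,                           # base model
           paste(base.rhs, extra, sep = "+"),  # base + each extra variable
           paste(base[2], extra, sep = "+"),   # without base[1] (IQ by default)
           paste(base[1], extra, sep = "+"))   # without base[2] (Islam by default)
  models <- paste(outcome, "~", rhs)
  adj <- sapply(models, function(m) {
    summary(lm(as.formula(m), data = data))$adj.r.squared
  })
  data.frame("R2 adj." = adj, row.names = models, check.names = FALSE)
}

# Demonstration on built-in data; the variables are stand-ins with no
# relation to the analysis in this post.
res <- fit.models(mtcars, "mpg", base = c("wt", "hp"), extra = c("disp", "drat"))
print(res)

# With the objects created above one would call, for example:
# fit.models(dk.data, "S.in.DK")
# fit.models(no.data, "S.in.NO")
```

The rows come out in the same order as in the results tables above (base model, the two additions, then the substitutions for IQ and for Islam).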