Ethnic heterogeneity and tail effects
Chisala has his 3rd installment up: http://www.unz.com/article/closing-the-black-white-iq-gap-debate-part-3/
One idea I had while reading it was that tail effects interact with population ethnic/racial heterogeneity. To show this, I did a simulation experiment. Population 1 is a regular population with a mean of 0 and sd of 1. Population 2 is a composite population of three sub-populations, each with an sd of 1: one with a mean of 0 (80%; "normals"), one with a mean of -1 (10%; "dullards"), and one with a mean of 1 (10%; "brights"). Population 3 is a normal population but with a slightly increased sd so that it is equal to the sd of population 2.
Descriptive stats:
> describe(df, skew = F, ranges = T)
     vars     n mean  sd median trimmed  mad   min  max range se
pop1    1 1e+06    0 1.0      0       0 1.00 -4.88 4.65  9.53  0
pop2    2 1e+06    0 1.1      0       0 1.09 -5.43 5.37 10.80  0
pop3    3 1e+06    0 1.1      0       0 1.09 -5.30 5.13 10.44  0
We see that the sd is increased a bit in the composite population (2) as expected. We also see that the range is somewhat increased, even compared to population 3 which has the same sd.
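The size of the sd increase can be worked out by hand. The following is a sketch using the standard variance decomposition for a mixture (within-group plus between-group variance), assuming, as in the simulation, that every sub-population has sd 1 and that the weighted mean of the sub-population means is 0. The symbols w_i and mu_i (share and mean of sub-population i) are introduced here for illustration.

```latex
\operatorname{Var}(X)
  = \underbrace{\sum_i w_i \sigma_i^2}_{\text{within}}
  + \underbrace{\sum_i w_i (\mu_i - \bar{\mu})^2}_{\text{between}}
  = 1 + \left(0.8 \cdot 0^2 + 0.1 \cdot 1^2 + 0.1 \cdot (-1)^2\right)
  = 1.2,
\qquad
\operatorname{sd}(X) = \sqrt{1.2} \approx 1.095
```

This agrees with the sd of about 1.1 reported for population 2.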
What do the tails look like?
> sapply(df, percent_cutoff, cutoff = 1:4)
      pop1     pop2     pop3
1 0.158830 0.179495 0.180856
2 0.022903 0.034342 0.034074
3 0.001314 0.003326 0.003126
4 0.000036 0.000160 0.000150
We are looking at the proportions of persons with scores above 1-4 (rows) in each population (cols). What do we see? Populations 2 and 3 have clear advantages over population 1. Population 2 also has a slight advantage over population 3 at cutoffs 2 through 4, though not at cutoff 1, where population 3 is marginally ahead.
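The simulated proportions can be checked against exact values, since the tail of a mixture is just the weighted sum of the sub-population tails. The following is a minimal sketch in base R; `mixture_tail` and the variable names are hypothetical helpers made up for this illustration, and the code assumes every sub-population has sd 1 and the overall mean is 0, as in the simulation.

```r
# Exact upper-tail proportion P(X > cutoff) for a mixture of normal
# sub-populations that all have sd 1 (hypothetical helper, not from any package).
mixture_tail <- function(cutoff, weights, means) {
  sapply(cutoff, function(k) {
    sum(weights * pnorm(k, mean = means, sd = 1, lower.tail = FALSE))
  })
}

cutoffs <- 1:4
w <- c(0.8, 0.1, 0.1)  # sub-population shares used in the first simulation
m <- c(0, 1, -1)       # sub-population means used in the first simulation

# sd of the composite: within-group variance (1) plus between-group variance.
# This shortcut assumes the weighted mean of m is 0.
sd_mix <- sqrt(1 + sum(w * m^2))

exact <- data.frame(
  pop1 = pnorm(cutoffs, lower.tail = FALSE),               # homogeneous, sd 1
  pop2 = mixture_tail(cutoffs, w, m),                      # composite
  pop3 = pnorm(cutoffs, sd = sd_mix, lower.tail = FALSE),  # homogeneous, matched sd
  row.names = cutoffs
)
print(round(exact, 6))
```

Passing other weights and means to `mixture_tail` gives the exact tails for other compositions without having to simulate.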
Simulation 2
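In the above, the composite population is made up of 3 sub-populations. But what if it were instead made up of 5? Here the composite consists of 70% with a mean of 0, 10% each with means of 1 and -1, and 5% each with means of 2 and -2.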
Descriptives:
> describe(df, skew = F)
     vars     n mean   sd median trimmed  mad   min  max range se
pop1    1 1e+06    0 1.00      0       0 1.00 -4.88 4.65  9.53  0
pop2    2 1e+06    0 1.27      0       0 1.21 -5.91 6.03 11.94  0
pop3    3 1e+06    0 1.27      0       0 1.26 -6.12 5.92 12.04  0
The sd is clearly increased. There is not much difference in the range between populations 2 and 3, but the range depends only on the two most extreme observations, so it is very susceptible to sampling error. What do the tails look like?
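To get a feel for how noisy the range is, one can draw several samples from the same distribution and compare their ranges. This is a rough sketch in base R; the seed, the number of replicates (20), and the smaller sample size (1e5 rather than 1e6, to keep it quick) are arbitrary choices for illustration.

```r
# Draw several independent standard-normal samples of the same size and
# record each sample's range (max - min), to see how much it varies by chance.
set.seed(1)      # arbitrary seed, for reproducibility only
n_small <- 1e5   # smaller than the 1e6 used in the post, to keep this fast
ranges <- replicate(20, diff(range(rnorm(n_small))))

summary(ranges)  # spread of the range across replicates
sd(ranges)       # sampling sd of the range
```

If the spread across replicates is of similar size to the range differences between populations 2 and 3 in the table, those differences should not be over-interpreted.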
> sapply(df, percent_cutoff, cutoff = 1:4)
      pop1     pop2     pop3
1 0.158830 0.205814 0.214353
2 0.022903 0.057077 0.056874
3 0.001314 0.011057 0.008872
4 0.000036 0.001246 0.000804
We see strong effects. At the +3 level, there are roughly 8x as many persons in the composite population as in population 1 (0.0111 vs. 0.0013). Population 3 also has more than population 1, but clearly fewer than the composite population.
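The +3 figure for the composite population can also be worked out exactly, which shows where the extra tail mass comes from. The following is a sketch assuming the composition used in the R code for the second simulation (70% at mean 0, 10% each at means 1 and -1, 5% each at means 2 and -2, all with sd 1); Q(z) is notation introduced here for the standard normal upper-tail probability.

```latex
Q(z) = 1 - \Phi(z), \qquad
P(X > 3) = \sum_i w_i \, Q(3 - \mu_i)

P(X > 3) = 0.70\,Q(3) + 0.10\,Q(2) + 0.10\,Q(4) + 0.05\,Q(1) + 0.05\,Q(5)
         \approx 0.00094 + 0.00228 + 0.000003 + 0.00793 + 0.00000001
         \approx 0.0112
```

Compared with Q(3) of about 0.00135 for population 1, this is a ratio of roughly 8. Most of the composite's +3 tail (about 0.0079 of 0.0112) comes from the 5% sub-population with a mean of 2.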
We can conclude that one must take heterogeneity of populations into account when thinking about the tails.
R code
You can re-do the experiment yourself with this code, or try out some other numbers.
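library(pacman)
# psych provides describe(); kirkegaard provides percent_cutoff()
p_load(reshape, kirkegaard, psych)
n = 1e6 # number of persons per population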
# first simulation --------------------------------------------------------
set.seed(1)
{
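pop1 = rnorm(n) # homogeneous: mean 0, sd 1
pop2 = c(rnorm(n*.8), rnorm(n*.1, 1), rnorm(n*.1, -1)) # composite: 80% mean 0, 10% mean 1, 10% mean -1
pop3 = rnorm(n, sd = sd(pop2)) # homogeneous, sd matched to pop2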
}
#df
df = data.frame(pop1, pop2, pop3)
#stats
describe(df, skew = F)
sapply(df, percent_cutoff, cutoff = 1:4)
# second simulation -------------------------------------------------------
set.seed(1)
{
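pop1 = rnorm(n) # homogeneous: mean 0, sd 1
pop2 = c(rnorm(n*.70), rnorm(n*.10, 1), rnorm(n*.10, -1), rnorm(n*.05, 2), rnorm(n*.05, -2)) # composite: 70% mean 0, 10% each at +1/-1, 5% each at +2/-2
pop3 = rnorm(n, sd = sd(pop2)) # homogeneous, sd matched to pop2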
}
#df
df = data.frame(pop1, pop2, pop3)
#stats
describe(df, skew = F)
sapply(df, percent_cutoff, cutoff = 1:4)
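If the kirkegaard package is not at hand, the tail proportions can be computed with a few lines of base R. The function below is a hypothetical stand-in written for illustration, not the package's implementation; in particular, counting values strictly above the cutoff is an assumption (for continuous data it makes no practical difference).

```r
# Hypothetical base-R stand-in for percent_cutoff(): the proportion of values
# in x that lie strictly above each cutoff (assumption: strict inequality).
prop_above <- function(x, cutoff) {
  out <- sapply(cutoff, function(k) mean(x > k, na.rm = TRUE))
  names(out) <- cutoff  # label each result with its cutoff
  out
}

# Small hand-checkable example
x <- c(0.5, 1.5, 2.5, 3.5)
res <- prop_above(x, cutoff = 1:4)
print(res)
```

With the data frame from the script above, `sapply(df, prop_above, cutoff = 1:4)` should then give a table in the same layout as the ones shown earlier.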