Crime by immigrant group by proportion of immigrants in the neighborhood in the Netherlands
Just a quick analysis. When I read the Dutch crime report that forms the basis of this paper, I noticed a table giving crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r(immigrant%, S) to be negative, and since r(S, crime) is also negative, one would predict a positive r(immigrant%, crime). Is this the case? Well, mostly. The data are split by two generations and two age groups, so there are four sub-datasets with lots of missing data and sampling error. If we simply pool all the cases as if they were independent and drop those with missing data, we get the following result. Rates are standardized within each group by dividing by that group's lowest rate, so the minimum is always 1; the statistics come from psych::describe.

Immigrant %   mean   sd     median  trimmed  mad    min  max    range  skew   kurtosis
0-5%          1.137  0.182  1.026   1.113    0.039  1    1.588  0.588  1.073  -0.148
5-15%         1.284  0.292  1.162   1.258    0.24   1    1.938  0.938  0.809  -0.641
15-50%        1.509  0.65   1.382   1.381    0.465  1    3.812  2.812  2.203   4.758
>50%          1.769  1.154  1.435   1.526    0.471  1    5.812  4.812  2.36    4.937
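The sign argument at the start of this post can be made concrete under one assumption the post does not state: that immigrant share relates to crime only through S (a simple chain). In that case the implied correlation is just the product of the two links. Without that assumption, the two correlations only bound the third, because a 3 x 3 correlation matrix must be positive semidefinite. The helper name `implied_r` and the input values below are hypothetical and illustrative, not estimates from the Dutch data.

```r
# Hypothetical helper: what r(immigrant%, S) and r(S, crime) imply for
# r(immigrant%, crime). Inputs are illustrative, not estimates.
implied_r = function(r_xs, r_sc) {
  # Under a pure chain (immigrant% -> S -> crime), r_xc = r_xs * r_sc
  chain = r_xs * r_sc
  # Without that assumption, positive semidefiniteness of the correlation
  # matrix only restricts r_xc to chain +/- this half-width
  half_width = sqrt((1 - r_xs^2) * (1 - r_sc^2))
  c(chain = chain, lower = chain - half_width, upper = chain + half_width)
}

implied_r(r_xs = -0.7, r_sc = -0.6)
```

Two negative links give a positive chain prediction (0.42 here), but the admissible range still dips slightly below zero. So the positive correlation is an expectation, not a mathematical necessity, which fits the "mostly" in the data.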
In other words, within groups (N = 28 groups), those living in areas with more immigrants tend to be more crime-prone: the mean standardized rate rises from 1.14 in the lowest immigrant-share band to 1.77 in the highest. There is, however, substantial variation, and sometimes the pattern reverses for no discernible reason. For example, 12-17-year-olds from Morocco have lower crime rates in the more immigrant-heavy areas (7.4, 7.1, 6.5, 6.1).
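The standardization behind the table (each group's rates divided by its lowest rate, as in the R code at the end of the post) can be checked by hand on the Moroccan 12-17 example. This sketch assumes the four quoted values run from the lowest to the highest immigrant-share band; the band labels are my own and hypothetical.

```r
# Crime rates for 12-17-year-olds of Moroccan origin, as quoted in the text.
# Assumption: ordered from the lowest to the highest immigrant-share band.
rates = c("0-5%" = 7.4, "5-15%" = 7.1, "15-50%" = 6.5, ">50%" = 6.1)

# Same standardization as the script: divide by the group's minimum,
# so the lowest band becomes 1 and the others are ratios to it
rates_std = rates / min(rates)
round(rates_std, 3)

# A reversed pattern shows up as ratios that fall as immigrant share rises
all(diff(rates_std) < 0)
```

This gives ratios of roughly 1.21, 1.16, 1.07 and 1, i.e. the mirror image of the typical rising pattern in the pooled means.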
The samples are too small for one to profitably dig more into it, I think.
R code & data
library(pacman)
p_load(plyr, magrittr, readODS, kirkegaard, psych)
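#load data from file (first sheet; the header row is read as data and handled below)
#read_ods() replaces the deprecated read.ods() and returns a data frame, not a list of sheets
d_orig = read_ods("Z:/code/R/dutch_crime_area.ods", sheet = 1, col_names = FALSE) %>% as.data.frame
#treat empty cells and zeros as missing (%in% avoids NA subscripts in the assignment)
d_orig[] = lapply(d_orig, function(col) replace(col, col %in% c("", "0"), NA))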
#use the first row as column names, then drop it
colnames(d_orig) = as.character(unlist(d_orig[1, ]))
d_orig = d_orig[-1, ]
#remove cases with missing
d = na.omit(d_orig)
#keep the origin labels, then drop the column
origins = d$Origin
d$Origin = NULL
#drop the Unknown and Total columns
d$Unknown = NULL
d$Total = NULL
#to numeric
d = lapply(d, as.numeric) %>% as.data.frame
#convert to standardized rates: divide each row by its own minimum,
#so the lowest-crime area type in each group gets the value 1
d_std = d / apply(d, 1, min)
describe(d_std) %>% write_clipboard
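The "mostly" can be quantified by checking, group by group, whether crime rises with immigrant share. A minimal sketch, assuming the columns of `d` are in band order from the lowest to the highest immigrant share; `group_trend` is a hypothetical helper, and the Spearman rank correlation is just one reasonable choice of trend measure.

```r
# Hypothetical helper: Spearman correlation between band order (1..k) and
# each group's rates; +1 = perfectly rising pattern, -1 = perfectly reversed
group_trend = function(rates_df) {
  band = seq_len(ncol(rates_df))
  apply(as.matrix(rates_df), 1, function(r) cor(band, r, method = "spearman"))
}

# Example with the Moroccan 12-17 row quoted in the text
example = data.frame(band1 = 7.4, band2 = 7.1, band3 = 6.5, band4 = 6.1)
group_trend(example)

# Applied to d from the script above, this would count the rising groups:
# sum(group_trend(d) > 0)
```

Rows with identical rates in every band would return NA (with a warning), so a real count should handle those explicitly.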