Who is gaining Danish citizenship?
Also France
§44
(1) No alien shall be naturalized except by statute.
(2) The extent of the right of aliens to become owners of real property shall be laid down by statute.
So there are two ways to be a Danish citizen. You are either granted it by virtue of birth to a Danish citizen, or parliament specifically makes a law to grant it to you. Since all laws are public, everybody who has been ‘naturalized’ this way has their name written in a law somewhere. This fact means that we can track all such conversions for the purpose of seeing who is ‘becoming Danish’ in this legal sense. Using Claude, I scraped all these laws and built a database. It turns out that the information available about the people also changes by year:
For the pre-2000 laws, the text stated their birth country. Here’s a snippet from a 1995 law:
1) Samir Toufik Abbas, København, født i Libanon.
2) Abdel Haq Abdel Razzag Abbas-Al-Awsi, Frederiksberg, født i Irak.
3) Farah-Legha Abdalkhani, Odense, født i Iran.
“født i” means “born in”. For the 2000-2020 laws, no such information is given. Here’s a 2010 example:
1) Manuelito Sy Abalos, Albertslund.
2) Gulalay Abbassi, Århus.
3) Farrokh Abdolkarimi Koohbanani, Aalborg.
Only their current municipality of residence is given. From 2021 to the present, people are listed under broader regions:
§ 2. Indfødsret meddeles til følgende personer, som er statsborgere i andre vestlige lande end lande i Norden:
2) Maiken Ellen Baird, bosiddende i Amerikas Forenede Stater.
3) Bente Beckmann, bosiddende i Forbundsrepublikken Tyskland.
4) Ian-Alexandru Aakmann Businschi, bosiddende i Silkeborg Kommune.
But these headings concern current citizenship, not birth country (the heading above reads “citizens of Western countries other than the Nordic countries”, and “bosiddende i” means “residing in”). As such, we have nothing but names for 2000-2020, and names plus citizenship group for 2021-present. In total, there are ~31k people with complete information which we can train on. There are laws before 2000 but they follow a different database format, are harder to get, and won’t have many people.
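To make the three entry formats concrete, here is a minimal Python sketch of how one might parse them. It is not the code from my notebook: the regular expression and the `parse_entry` function are illustrative assumptions based only on the snippets quoted above, and real law texts will contain cases it misses (for example, names that themselves contain a comma).

```python
import re
from typing import Optional

# Hypothetical pattern covering the three quoted styles:
#   "1) Name, Town, født i Country."   (pre-2000)
#   "1) Name, Town."                   (2000-2020)
#   "2) Name, bosiddende i Place."     (2021-present)
ENTRY = re.compile(
    r"^\s*\d+\)\s+(?P<name>[^,]+),\s*"
    r"(?:bosiddende i\s+)?(?P<residence>[^,]+?)"
    r"(?:,\s*født i\s+(?P<birth>.+?))?\.?\s*$"
)

def parse_entry(line: str) -> Optional[dict]:
    """Return name, residence and birth country (if stated), or None for non-entry lines."""
    match = ENTRY.match(line)
    if match is None:
        return None  # e.g. a section heading such as "§ 2. Indfødsret meddeles ..."
    return {
        "name": match.group("name").strip(),
        "residence": match.group("residence").strip(),
        "birth_country": match.group("birth"),  # None when the law gives no birth country
    }

if __name__ == "__main__":
    for line in [
        "1) Samir Toufik Abbas, København, født i Libanon.",
        "1) Manuelito Sy Abalos, Albertslund.",
        "2) Maiken Ellen Baird, bosiddende i Amerikas Forenede Stater.",
    ]:
        print(parse_entry(line))
```

Only the first format yields a non-empty `birth_country`, which is why only the pre-2000 rows can serve as labeled training data.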
Before going further, it is reasonable to look at the counts of names. First names:
There’s a variety of versions of Mohammad, and it is in fact the most common naturalized name. Merging the spelling variants:
The last names are less conspicuously foreign:
There are a lot of Danish ones, which presumably reflect foreign spouses taking a Danish name and eventually acquiring citizenship. The most common is the Vietnamese Nguyen.
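For readers who want to reproduce the name counts, here is a rough Python sketch of merging spelling variants before counting. The alias table is a hand-written assumption for illustration (it is not the list I used), and the input names in the usage example are made up, not taken from the dataset.

```python
import unicodedata
from collections import Counter

# Hypothetical, hand-curated alias table: folded spelling -> canonical form.
ALIASES = {
    "mohammad": "Mohammad",
    "mohammed": "Mohammad",
    "mohamed": "Mohammad",
    "mohamad": "Mohammad",
    "muhammad": "Mohammad",
    "muhammed": "Mohammad",
}

def fold(name: str) -> str:
    """Lowercase and strip combining accents so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def canonical_first_name(full_name: str) -> str:
    """Take the first whitespace-separated token and map known variants to one spelling."""
    parts = full_name.split()
    if not parts:
        return ""
    return ALIASES.get(fold(parts[0]), parts[0])

def count_first_names(full_names):
    """Count first names after merging the variants listed in ALIASES."""
    return Counter(canonical_first_name(n) for n in full_names)

if __name__ == "__main__":
    toy = ["Mohamed A", "Muhammad B", "Mohammad C", "Anna D"]  # made-up toy input
    print(count_first_names(toy).most_common())
```

An explicit table is deliberately conservative: a cruder rule such as dropping vowels would also merge distinct names like Mahmoud into the same bucket.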
Using the names of people, we can attempt to infer their ethnicity for the years we have no birth country data. There are a variety of such name-to-ethnicity prediction models available. A commonly used one is ethnicolr. It produces these results:
There’s a surprising number of Nordic and British names. However, checking these we see that they are often wrong:
Nicolette King - ethnicolr: British, actual: born in Denmark (likely Anglo parent)
Seyd Ahmad Maasoumivishget - ethnicolr: British, actual: Iranian
Jason Mikaeli - ethnicolr: British, actual: Iranian (Western first name, Iranian surname)
Marie Noelle Tranemose - ethnicolr: British, actual: French
Cathrine Falck - ethnicolr: British, actual: Norwegian
Of particular concern is that this prediction model has no Turkish group, but Turks are the largest foreigner group in Denmark. Of course, for people with mixed origin names, we cannot determine for sure what ethnicity they belong to, but we can at least make a reasonable guess. So I decided to build my own prediction model using the pre-2000 names and origin countries. This gives us this result:
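To give a feel for how a name-to-origin model of this kind can work, here is a minimal Python sketch of a character-trigram naive Bayes classifier. This is a swapped-in illustration, not the model from my notebook: the class name `NgramNaiveBayes`, the trigram features, and the smoothing value are all assumptions, and the tiny training set just reuses a few names quoted in this post.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(name, n=3):
    """Character n-grams of a name, with ^ and $ marking start and end."""
    padded = f"^{name.casefold()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha  # additive smoothing for unseen n-grams
        self.class_counts = Counter()
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            grams = char_ngrams(name, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict_proba(self, name):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.ngram_counts[label].values()) + self.alpha * vocab_size
            score = math.log(count / total)  # class prior
            for gram in char_ngrams(name, self.n):
                score += math.log((self.ngram_counts[label][gram] + self.alpha) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: e / z for label, e in exp.items()}

    def predict(self, name):
        probs = self.predict_proba(name)
        return max(probs, key=probs.get)

if __name__ == "__main__":
    names = ["Kenichi Nakashima", "Toshio Aoyagi", "Samir Toufik Abbas", "Farah-Legha Abdalkhani"]
    labels = ["Japan", "Japan", "Libanon", "Iran"]
    model = NgramNaiveBayes().fit(names, labels)
    print(model.predict_proba("Miyako Jensen"))
```

The per-class probabilities from `predict_proba` are what the accuracy, AUC, and probability-distribution plots are computed from; with only a handful of training names the output is of course meaningless beyond showing the mechanics.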
There are too many small groups though, so here it is simplified to 10 broader groups:
There are remaining issues. Ethiopia, for instance, is a mix of Christian, Muslim, and African names as an origin group, so its names follow 3 different linguistic patterns, and the model obviously has issues with this. We don’t have enough of them to split them into these 3 groups. South Africa has a similar issue: it is mainly Anglo names with some Dutch (Afrikaans), and a few African ones. Nevertheless, it is an improvement. In terms of model accuracy:
Accuracy tells you whether the model’s most likely predicted origin was the same as the one the data says (origin). AUC is the better metric in general, and it was extremely high. AUC measures how far apart the predicted probability distributions are for positive cases and negative cases (in a 2-way comparison), and the one above is a multi-class generalization of this idea. We can understand that when we look at the predicted probabilities:
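The idea behind AUC can be written out in a few lines of Python. The sketch below computes the 2-way AUC as the share of (positive, negative) pairs where the positive case gets the higher score, and then averages one-vs-rest AUCs across classes. That macro average is one common multi-class generalization; I am not claiming it is the exact variant reported above, and the function names are made up for illustration.

```python
def binary_auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def macro_ovr_auc(true_labels, prob_rows):
    """Average one-vs-rest AUC over classes.

    prob_rows is a list of dicts mapping each class label to its predicted probability.
    """
    aucs = []
    for cls in sorted(set(true_labels)):
        pos = [row[cls] for y, row in zip(true_labels, prob_rows) if y == cls]
        neg = [row[cls] for y, row in zip(true_labels, prob_rows) if y != cls]
        if pos and neg:  # skip classes that cannot form any pair
            aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)

if __name__ == "__main__":
    # Made-up toy scores: perfectly separated, so every pair is ranked correctly.
    print(binary_auc([0.9, 0.8], [0.1, 0.2]))
```

An AUC of 1.0 means the two score distributions do not overlap at all, 0.5 means the scores carry no ranking information, which is why a high value corresponds to the well-separated distributions in the probability plots.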
The larger groups have very distinctive probability distributions. Almost all the Vietnamese names are assigned high probabilities, and the non-Vietnamese names are rarely assigned high probabilities. For smaller groups, it doesn’t work out so well. Latin American names are Spanish, so they get confused with the south European Spanish ones. It is better to look at the 10-group version:
Most of them are now looking good, but some are not. Sub-Saharan African names can be Christian or Muslim and thus get confused with the European or MENAP clusters. The confusion matrix shows this:
The values are row-normalized (percentages), so you start with a row and read across to see how each ‘true’ origin (birth country) is classified by the model. For the main Muslim group MENAP, 98% are classified that way. For SS African, 34% are classified as MENAP due to Muslim names. The various European groups can’t be pulled apart cleanly, e.g. 16% of Romance origin names are classified as Germanic, presumably due to shared cultural origins, and the usual Frankish-Germanic mixture in France. Note that while some East Asian names are very distinctive, say, Japanese, there are only 34 Japan-born people in this dataset:
Kenichi Nakashima
Toshio Aoyagi
Miyako Jensen
Atsuko Kjær
Taeko Møller
And only some of them have full Japanese names; the others took Danish last names from spouses, confusing the model.
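For anyone unsure what row-normalized means for the confusion matrix discussed above, here is a small Python sketch. The function name is an assumption and the labels in the usage example are made-up toy data, not the real results.

```python
from collections import Counter, defaultdict

def row_normalized_confusion(true_labels, predicted_labels):
    """Return {true: {predicted: percent}}, where each true-label row sums to 100."""
    counts = defaultdict(Counter)
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    columns = sorted(set(true_labels) | set(predicted_labels))
    matrix = {}
    for t, row in counts.items():
        total = sum(row.values())
        # Dividing by the row total answers: of the people truly from t,
        # what percentage did the model assign to each group?
        matrix[t] = {c: 100.0 * row[c] / total for c in columns}
    return matrix

if __name__ == "__main__":
    # Made-up toy labels, chosen only to show the mechanics.
    true = ["MENAP", "MENAP", "SS African", "SS African", "SS African"]
    pred = ["MENAP", "MENAP", "MENAP", "SS African", "SS African"]
    for origin, row in row_normalized_confusion(true, pred).items():
        print(origin, row)
```

Because each row is divided by its own total, a small group like the Japan-born can show large percentage swings from just a handful of misclassified names.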
Autistic model details aside, this is the final best guess distribution of the people naturalizing in Denmark:
About half of them are Muslims (MENAP + Turkey), and the rest are a scattered mix of others. The most surprising is the number of East Europeans. There are 1600+ Poles alone. I guess this is the Danish version of the Poles-in-the-UK pattern.
One could take this project further using other countries. Norway and Sweden don’t appear to do this by law, but France does. Applying our 10-cluster model to France and Denmark for comparison:
France is getting more Muslims than Denmark in % terms. I only used one law from June 2025 because they are in PDF format and behind anti-scraping measures, so getting all the files may be a hassle. If some French person can get these for me, I can apply the prediction model.
Anyway, that’s it for now. R notebook is here. Code is here.














