17 Comments
Lucky Hunter and Corn Mother's avatar

The participants in the study were a minimum of 35 years old and a mean of 50 years old at the time they were sampled between 1998 and 2004. Based on this, the birth years of participants probably range from around 1933 to 1969. During this time frame, there were very rapid and dramatic increases in the literacy rate in Mexico (https://ourworldindata.org/grapher/cross-country-literacy-rates?tab=chart&country=MEX) that caused a shift from most people being illiterate to most people being literate. I haven't found data on how the timing of this varied regionally in Mexico, but I would guess that there were large regional disparities. Even today, different Mexican states vary quite a bit in average educational attainment (https://wenr.wes.org/2019/05/education-in-mexico-2).

The Mexico City Prospective Study participants were sampled in middle age in Mexico City, and a past paper about this study notes that there was lots of migration to Mexico City in the 1950s through 1970s from all over Mexico, especially from the central and southern parts of the country (https://www.nature.com/articles/s41586-023-06595-3). I'm not sure what percent of the study sample grew up in Mexico City, but based on that information, it sounds like there could be lots of participants who grew up in other parts of the country.

The secular increase in education over time in Mexico is clearly environmental, as it is too fast to be genetic. I expect regional variation is due to a combination of environmental and genetic factors (ancestry varies regionally, with northern Mexicans generally having more European ancestry and southern Mexicans having more Indigenous American ancestry).

For the population analysis, if the increases in education over time occurred later in the regions with less European ancestry (I'm guessing this is the case), then you'd expect Indigenous American ancestry to correlate with lower educational attainment, which it does. This would not inherently tell you anything either way about the role of nature vs. nurture in this association, except for the fact that nurture matters enough to cause the temporal changes we see (e.g., from a literacy rate of 39% in 1930 to 83% in 1980: https://ourworldindata.org/grapher/cross-country-literacy-rates?tab=chart&country=MEX).

For the within-family analysis, the temporal trends could be a considerable source of noise. As you note, siblings are only expected to vary by a few percent in terms of their ancestry composition. From 1940 to 1950 in Mexico, the literacy rate rose by 1.7% per year. Siblings a few years younger might have significantly higher educational attainment than their older siblings, with an effect greater than you'd expect from their genetic ancestry differences. Of course, in theory this shouldn't confound the model: younger siblings shouldn't have systematically different ancestry composition from their older siblings. (Perhaps there could be a sampling bias if initial recruits were more educated and then the recruited siblings were on average older or younger, but I doubt that would be a big effect.) However, this extra source of noise would presumably reduce the power somewhat. With access to the data, or some good simulations, perhaps it could be estimated whether this extra noise would lower the power enough to fail to find a within-family effect.
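
To make the "extra noise" point concrete, here is a rough sketch of the kind of simulation meant above. Every number in it is a made-up assumption rather than an estimate from the study: the spread of sibling ancestry differences (`ancestry_diff_sd`), the effect size, the outcome noise, the sibling age gaps, and the size of the yearly trend. It regresses sibling differences in outcome on sibling differences in ancestry, and compares how much the estimated slope bounces around with and without a cohort trend tied to the age gap.

```python
import random
import statistics


def within_pair_slope(n_pairs, effect, trend_per_year, rng,
                      ancestry_diff_sd=0.03, noise_sd=1.0, max_age_gap=6):
    """Slope from regressing sibling outcome differences on ancestry differences.

    All parameters are hypothetical. The regression is through the origin,
    which is reasonable here because both differences have mean zero.
    """
    sxy = 0.0
    sxx = 0.0
    for _ in range(n_pairs):
        dx = rng.gauss(0.0, ancestry_diff_sd)            # ancestry difference
        age_gap = rng.randint(-max_age_gap, max_age_gap)  # years, independent of dx
        dy = effect * dx + trend_per_year * age_gap + rng.gauss(0.0, noise_sd)
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


def slope_spread(trend_per_year, n_reps=200, n_pairs=500, effect=1.0, seed=1):
    """Standard deviation of the estimated slope across repeated simulated studies."""
    rng = random.Random(seed)
    slopes = [within_pair_slope(n_pairs, effect, trend_per_year, rng)
              for _ in range(n_reps)]
    return statistics.stdev(slopes)


spread_without_trend = slope_spread(0.0)
spread_with_trend = slope_spread(0.3)
print(spread_without_trend, spread_with_trend)
```

Because the age gap is drawn independently of the ancestry difference, the trend does not bias the slope in this toy model; it only widens the spread of estimates, which is the loss of power described above. Whether the real loss is large enough to matter depends on the actual parameters, which this sketch does not know.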

Gregory Connor's avatar

Nice discussion. I am not sure what you mean by the sentence "In OLS, random measurement or estimation error in a single predictor usually doesn't affect the slope (but does affect r² and thus correlations)." Estimation error in a single predictor creates attenuation bias, which biases the coefficient toward zero. The slope is biased toward zero. I agree there are weird things going on with EA in this study. It might be due to the very noisy 4-category measurement?
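
A quick way to see the attenuation point is a small simulation. This is only a sketch under textbook assumptions (one predictor, error in the predictor that is independent of everything else), and the sample size, noise levels, and function names are arbitrary choices of mine. Under those assumptions the expected slope shrinks by the factor var(x) / (var(x) + var(error)), so with equal variances it should come out near half the true value.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_attenuation(true_slope=1.0, error_sd=1.0, n=20000, seed=0):
    """Return (slope using the true predictor, slope using the noisy predictor)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0.0, 1.0) for x in x_true]
    # Observed predictor = true predictor + independent measurement error.
    x_observed = [x + rng.gauss(0.0, error_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_observed, y)


clean_slope, noisy_slope = simulate_attenuation()
null_clean, null_noisy = simulate_attenuation(true_slope=0.0)
print(clean_slope, noisy_slope)  # noisy slope should sit near half the clean one
print(null_clean, null_noisy)    # both should sit near zero
```

The second call illustrates the special case where the true slope is zero: there is nothing to attenuate, so both estimates hover around zero.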

Gregory Connor's avatar

Maybe you are obliquely referring to the theoretical result that there is no attenuation bias under the null hypothesis that the true slope is zero? That is just an artifact that you cannot bias an estimated coefficient toward zero when the true value equals zero.

Alan Perlo's avatar

An honest look at Mexican society reveals Euro descendants dominate in every top economic/intellectual category. The trend is fully seen in individuals who are 80%+ Euro vs 80%+ Native, but there might never be a point where people with those ancestry percentages have similar "nurture" and upbringing conditions in Mexico, given the levels of social inequality. I'd need to see real-world results change before I change my mind on the hereditarian explanation.

air dog's avatar

"...siblings vary very little in ancestry, a few %points at most."

Why is the variation not zero percentage points? I thought it might be half siblings and step siblings, but it looks like they used only full sibling pairs/groups.

Also, is there a graph missing? I could not find the "red line" referenced. It seemed like something was meant to follow the preceding colon.

Lucky Hunter and Corn Mother's avatar

The reason it works is that genealogical ancestry proportions and genetic ancestry proportions are not exactly the same.

Imagine a hypothetical person who is a mixture of the four types of ancestry described in this study. Let's say their father has a Yoruba father and Japanese mother, and their mother has a Spanish father and Mayan mother. Most people would look at that and say that they are a quarter of each ancestry because they have one grandparent of each. However, the DNA inherited from each grandparent will not be exactly 25% but will vary a bit due to recombination. Our person's father would inherit one copy of each chromosome from each parent and thus could be said to be 50% Yoruba and 50% Japanese (ignoring sex chromosomes and mitochondrial DNA for the moment). However, when his sperm cells are produced, these chromosomes are spliced together, but not 50/50. The chromosome 1 he passes on might be, say, 67% Yoruba and 33% Japanese. The chromosome 2 he passes on might be 71% Japanese and 29% Yoruba. Continue on down through the autosomal chromosomes, and while he'll have passed on an *average* of half of each ancestry, in practice, it will be a bit off from that. Maybe a 54%/46% split or something. The same process occurs as the mother produces her eggs, except that it mixes together her Spanish and Mayan ancestry (which is 50/50 for her) into a mixture that is close to, but not exactly, 50/50.

Thus, we could end up with our hypothetical person having 27% Yoruba ancestry, 23% Japanese ancestry, 24% Spanish ancestry, and 26% Mayan ancestry. They would be approximately a quarter of each ancestry genetically, but not exactly. Their siblings would also undergo this process, but since it is random, the results would be slightly different. A full sibling might end up being 22% Yoruba, 28% Japanese, 27% Spanish, and 23% Mayan. So the way this study works is by comparing lots of siblings and seeing what effect these slight differences in ancestry have on phenotypes. Note that I made this example easier to see by making every grandparent have different ancestry. In practice in this study, probably most people have all their grandparents having a mixture of European and Native American ancestry, with lots of them having a few percent African ancestry and occasionally some of them having a few percent East Asian ancestry. The same principles apply, except that the variation between siblings would cluster around the average of the parents' ancestry, not around 25/25/25/25.
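
If it helps, the four-grandparent example can be mimicked with a toy simulation. This is a deliberately crude sketch, not how the study estimated ancestry: it assumes 22 equal-length autosomes and exactly one crossover at a random position on each chromosome per gamete, whereas real chromosomes differ in length and in the number of crossovers. The function names are my own.

```python
import random

N_AUTOSOMES = 22  # treated as equal in length, which is a simplification


def gamete_share_from_first_grandparent(rng):
    """Fraction of one parent's gamete that traces to that parent's father.

    Toy model: each chromosome gets a single crossover at a uniform position,
    and which grandparental copy comes first is a coin flip.
    """
    total = 0.0
    for _ in range(N_AUTOSOMES):
        crossover_position = rng.random()
        starts_with_first = rng.random() < 0.5
        total += crossover_position if starts_with_first else 1.0 - crossover_position
    return total / N_AUTOSOMES


def child_ancestry(rng):
    """Autosomal ancestry shares for one child of the hypothetical couple."""
    paternal = gamete_share_from_first_grandparent(rng)  # Yoruba share of sperm
    maternal = gamete_share_from_first_grandparent(rng)  # Spanish share of egg
    return {
        "Yoruba": 0.5 * paternal,
        "Japanese": 0.5 * (1.0 - paternal),
        "Spanish": 0.5 * maternal,
        "Mayan": 0.5 * (1.0 - maternal),
    }


rng = random.Random(42)
sibling_a = child_ancestry(rng)
sibling_b = child_ancestry(rng)
for name, sibling in (("A", sibling_a), ("B", sibling_b)):
    print(name, {k: round(100 * v, 1) for k, v in sibling.items()})

# Averaged over many children, each share settles near 25%.
many_children = [child_ancestry(rng) for _ in range(2000)]
mean_yoruba = sum(child["Yoruba"] for child in many_children) / len(many_children)
print(round(100 * mean_yoruba, 1))
```

Each simulated sibling lands near 25% for every ancestry but not exactly on it, and the two siblings differ from each other, which is the variation the sibling comparison relies on.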

The other complexity I ignored in this example (and that the authors appear to have ignored in their study) is the unusual inheritance patterns of the mitochondrial and sex chromosomes. Mitochondrial DNA is inherited on the maternal line (our hypothetical siblings above would all share 100% Mayan mitochondrial DNA). X chromosomes recombine in females producing eggs, but the X and Y chromosomes of males are passed on as is with almost no recombination. Our hypothetical people from the example would have a 100% Yoruba Y-chromosome if male. If female, they would have one 100% Japanese X-chromosome and another that is a mixture of Spanish and Mayan ancestry.

Alan Perlo's avatar

Well, in the last sentence, the woman would have one Japanese X-chromosome, and the other would be EITHER Spanish OR Mayan, not a mix of both. Your explanation was great overall though!

Lucky Hunter and Corn Mother's avatar

No. Her mother has 2 X-chromosomes: one Spanish and one Mayan. In egg production, there is crossover between these X-chromosomes, so the child will inherit a chromosome with a mixture of the grandparents' ancestry, in the same manner as what happens with the autosomes. This is different than the sperm production, where the father passes on either the X from his mother or the Y from his father. (Technically, even during sperm production, there is some tiny amount of crossover between the X and Y-chromosomes, in what are called the pseudoautosomal regions: https://en.wikipedia.org/wiki/Pseudoautosomal_region).

Alan Perlo's avatar

Oh interesting. So a person's maternal haplogroup can be specifically traced to a certain region/lineage, but the X chromosome itself is in a sense "novel" in each generation?

Lucky Hunter and Corn Mother's avatar

Yeah. Since the mitochondrial and Y-chromosomes are inherited essentially as-is, they allow you to trace direct maternal and paternal lineages, respectively, back for thousands of years. These chromosomes are only changing as new mutations pop up, so with very high quality sequence data, you can see ancestors on this direct line and estimate how far back you share that ancestor based on the accumulation of new mutations.

A man's X-chromosome is essentially identical to his daughter's, but other than that, yes, it changes each generation. The sex difference in the inheritance pattern means it changes differently than chromosomes 1-22, but unlike the mitochondrial or Y-chromosomes, it is constantly changing through recombination instead of just mutations.

If you've ever done a commercial DNA test, you can see these patterns firsthand by looking at your relatives. If you're a man, you might see men with your same last name and Y-chromosome haplogroup, as your Y chromosomes are almost identical and inherited on the male line. The lack of a surname makes them harder to find, but you may be able to find relatives you share a maternal line with, and you will have the same mtDNA haplogroup. You can also see these patterns in your shared relatives: your father's sister and her daughter will have the same mtDNA group (different than yours), and you might see several cousins with your mom's maiden name and the same Y-haplogroup (different than yours). Some DNA tests let you see where on your chromosomes you share ancestry with relatives. On both the autosomes and the X-chromosome, you'll see that (except for your parents) you will not share an entire chromosome with anyone: just fragments of a chromosome. The more distant the relative, the smaller these fragments will be.

air dog's avatar

I understand you to be saying that while ancestry itself is identical between full siblings, the chromosomal characteristics of the siblings can differ somewhat with respect to the individual contributions of their (common) ancestors, and thereby also differ with respect to the contributions from the ancestors' races. I think I follow.

However, chromosomes are not labeled by nature as "x% descended from maternal grandfather (or from the Yoruba people in general)". There must be some prior analysis to attribute some of these chromosomal characteristics to various races (presumably with some probability less than 100%) in order to make any such racial allocations. Correct?

Thank you for the detailed explanation. Much appreciated!

Lucky Hunter and Corn Mother's avatar

Saying that "ancestry itself is identical between full siblings" depends on what you mean by ancestry. Genealogical ancestry, counting up the number of ancestors on the family tree from different places? Yes, identical. Genetic ancestry, the proportion of DNA from those different ancestors? No, slightly different.

(Side note: an interesting implication of this is that, once you go far enough back in your family tree, you start to have genealogical ancestors who you did not inherit any DNA from. According to https://www.thetech.org/ask-a-geneticist/articles/2011/ask445/, this chance is about half a percent for any one of your great-x-4-grandparents, and it rises to over 50% for any one of your great-x-8-grandparents.)

Yes, you are exactly correct that there is no inherent way, just by looking at the DNA, to tell which grandparent that DNA came from or what their race/ancestry is. This study, like others and like commercial DNA testing services, relies on sets of reference genomes to compare to. In this study, they compared to datasets of genomes called the 1000 Genomes Project (https://en.wikipedia.org/wiki/1000_Genomes_Project) and Human Genome Diversity Project (https://en.wikipedia.org/wiki/Human_Genome_Diversity_Project). These have people from various ethnic groups around the world, I think typically limiting participation to those who have all four grandparents from that ethnic group. The ancestry results are only as good as the reference genomes, and precision depends on the relatedness of the populations and the quantity and quality of reference genomes from them. This is why it can be very reliable at distinguishing European vs. East Asian vs. sub-Saharan African ancestry (lots of reference genomes, relatively distinct populations), less accurate at telling whether you are French vs. German, Spanish vs. Portuguese, etc. (less distinct populations), and not useful at distinguishing which Native American tribe you have ancestry from (not very many genomes in the reference populations).

air dog's avatar

Thanks again! This certainly illuminates why very large sample sizes are needed for meaningful sibling regression analysis.

And yes, by "ancestry" I simply meant who a person's ancestors were.

Alan Perlo's avatar

Full siblings can inherit slightly different amounts of ethnic/racial ancestry.

air dog's avatar

Thanks. I still don't quite get it, though.

Ancestry of full siblings is identical, since the siblings have in common each and every ancestor, of whatever ethnic and racial attributes those ancestors may have had.

So, I gather you are saying that something other than ancestry itself is "inherited" by siblings and that this inheritance may be slightly different by sibling? Maybe some genetic characteristic(s) that can be shown to differ by races or ethnic groups?

Alan Perlo's avatar

I'm not an expert, but you can literally be more related to a specific grandparent by DNA inheritance than your sibling is. I'm sure an AI model can give a more detailed answer, but I've been following DNA research for a few years. It can even be the case for people with 100% ancestry from an ethnic group, like Irish or Dutch for instance.

repsych's avatar

Thanks; was "subsetting by ancestry to makes it much smaller" meant to read "subsetting by ancestry too makes it much smaller" or possibly "subsetting by ancestry to make it much smaller"?
