# Twin control studies really are evidence of causation: reply to JayMan

I found some genetically informative studies on the benefits of marriage:

These studies control for genetic and shared environmental confounding in various ways, and they generally find some benefits of marriage in the form of reduced crime and better mental health and well-being. The benefits found in uncontrolled studies are, of course, exaggerated, because those studies do not control for the omnipresent genetic confounding. I posted one study on Twitter:

https://twitter.com/KirkegaardEmil/status/935526706638675968

https://twitter.com/JayMan471/status/935658291224596481

JayMan (J) doesn't agree. In private, he argued that twin control studies don't rule out all confounding. This is true: they fail to rule out a very small amount of genetic confounding, because MZ twins are not exactly genetically identical, though they are very close. Furthermore, there is the possibility of non-shared non-genetic confounding.

I wrote:

Given that non-shared non-genetic variance is noise, one can indeed infer causation. Within MZ associations are strong evidence of causation.

("Infer" was perhaps too strong a word here. I meant it probabilistically.)

In J's words:

No, because it's not *all* noise (obviously so in the case of sexual orientation in discordant twins). Some of it is developmental variation. Some of it is the result of pathogens/other environmental insults.

However, he is incorrect:

Evidence means that the posterior probability is larger than the prior. In this case, a within-MZ association rules out confounding through the A (additive genetic) and C (shared environmental) pathways. This is important because A confounding is probably the largest source of confounding, so ruling it out increases the probability of all remaining options, including a causal connection. Because this is a longitudinal study (with a control too), it also rules out reverse causation, further increasing the probability of forward causation.
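To see why a within-MZ association rules out A and C confounding, here is a minimal simulation sketch. It assumes a hypothetical model in which the exposure `x` and the outcome `y` share genetic and shared environmental causes but `x` has no causal effect on `y` at all; the effect sizes, the continuous exposure score and the sample size are illustrative assumptions, not taken from any of the studies. It also treats MZ twins as perfectly identical genetically, ignoring the tiny differences mentioned above. The pooled (naive) regression finds a sizable slope, while regressing co-twin differences on co-twin differences finds roughly zero, because genes and shared environment are identical within a pair and cancel out.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (model with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_mz_pairs(n_pairs, seed=1):
    """Hypothetical MZ pairs where x and y are confounded by A and C only."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        g = rng.gauss(0, 1)  # genes: identical within an MZ pair
        c = rng.gauss(0, 1)  # shared environment: identical within a pair
        twins = []
        for _ in range(2):
            x = g + c + rng.gauss(0, 1)  # exposure, plus unique noise
            y = g + c + rng.gauss(0, 1)  # outcome: x does NOT enter here
            twins.append((x, y))
        pairs.append(twins)
    return pairs


pairs = simulate_mz_pairs(20_000)

# Naive analysis: pool all individuals and ignore the twin structure.
xs = [x for pair in pairs for x, _ in pair]
ys = [y for pair in pairs for _, y in pair]
naive = ols_slope(xs, ys)

# Co-twin control: differences within each pair remove g and c entirely.
dx = [p[0][0] - p[1][0] for p in pairs]
dy = [p[0][1] - p[1][1] for p in pairs]
within = ols_slope(dx, dy)

print(f"naive slope:     {naive:.3f}")
print(f"within-MZ slope: {within:.3f}")
```

Under this model the naive slope should come out near 2/3 (shared variance 2 over exposure variance 3), while the within-MZ slope should be close to zero. If a causal term such as `b * x` were added to `y`, the within-pair slope would instead recover roughly `b`.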

Your argument ignores the change in probability. To generalize and illustrate: one cannot declare something not to be evidence just because it does not rule out every alternative interpretation. Suppose we know that x must be one of 1, 2, ..., 10. Ruling out 1 through 8 is strong evidence that x is 9, even though it could still be 10. Assuming the options are equiprobable, the probability that x is 9 increases by a factor of 5 (from 10% to 50%).
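The arithmetic of the 1 to 10 example can be checked with a short sketch that conditions a uniform prior on the evidence that some options are impossible. The `condition` helper is a hypothetical name introduced for illustration; exact fractions are used so the result is not obscured by rounding.

```python
from fractions import Fraction


def condition(prior, ruled_out):
    """Drop impossible options and renormalize the remaining probabilities."""
    remaining = {k: p for k, p in prior.items() if k not in ruled_out}
    total = sum(remaining.values())
    return {k: p / total for k, p in remaining.items()}


prior = {x: Fraction(1, 10) for x in range(1, 11)}  # equiprobable options 1..10
post = condition(prior, ruled_out=set(range(1, 9)))  # evidence rules out 1..8

print(post[9], post[9] / prior[9])
```

This prints `1/2 5`: the probability that x is 9 goes from 1/10 to 1/2, a fivefold increase, while 10 remains possible with the same posterior probability.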

To give a concrete but simplified example: suppose that whenever we find an association between two human variables like these, 60% of the time it is due to genetic confounding, 20% of the time it is due to shared environmental confounding, 10% of the time it is due to non-shared non-genetic confounding (this includes the developmental variation that J mentions) and 10% of the time it is causal. So the prior probability of true causality is only 10%. However, if we then find that the relationship holds when we control for genetic and shared environmental confounding, the posterior probability of causality rises to 50%. This is because only non-shared non-genetic confounding and true causality remain as possible options, each with a 10% prior probability and thus each with 50% of the posterior. This represents a fivefold increase in the probability. In odds terms, the prior odds of 1:9 have become posterior odds of 1:1, a Bayes factor of 9; on Jeffreys' commonly used scale, that counts as substantial evidence, just below the threshold of 10 for strong evidence.
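The same update for the simplified four-pathway example can be written out explicitly as a sketch. The base rates are the hypothetical ones from the example, and the likelihoods encode its simplifying assumption that A and C confounding can never produce a within-MZ association while the other two explanations always can. Besides the probability ratio, the sketch reports the change in odds, i.e. the Bayes factor, which is the quantity that Bayesian evidence scales are stated in.

```python
from fractions import Fraction

# Hypothetical base rates from the example above (not estimates from data).
prior = {
    "genetic confounding": Fraction(60, 100),
    "shared environmental confounding": Fraction(20, 100),
    "non-shared non-genetic confounding": Fraction(10, 100),
    "causal": Fraction(10, 100),
}

# Simplifying assumption: probability of still seeing a within-MZ association
# under each explanation. A and C confounding cancel within MZ pairs.
likelihood = {
    "genetic confounding": 0,
    "shared environmental confounding": 0,
    "non-shared non-genetic confounding": 1,
    "causal": 1,
}

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}


def odds(p):
    """Convert a probability to odds in favour."""
    return p / (1 - p)


h = "causal"
print(posterior[h])                         # posterior probability of causality
print(posterior[h] / prior[h])              # probability ratio
print(odds(posterior[h]) / odds(prior[h]))  # Bayes factor
```

Running it prints `1/2`, `5` and `9`: the fivefold probability increase and the ninefold odds increase describe the same update.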

Back in reality, a given link between two variables will be some mix of pathways (e.g. 50% genetic confounding, 30% shared environmental confounding, 20% causal), not just one. This does not change the results in any substantial way; it only makes the calculation more complicated. (Proof of this is left to the reader!)
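For the mixed case, a small worked example under an assumed linear model may help. The latent factors are independent with unit variance, and the coefficients are hypothetical, chosen so that the covariance between X and Y splits 50% genetic, 30% shared environmental and 20% causal as in the example. The naive slope picks up all three pathways, while within-MZ differences remove the genetic and shared environmental terms and leave only the causal slope `b`.

```python
# Hypothetical linear model; all latent variables independent with variance 1:
#   X = ax*G + cx*C + ex*Ex        (G: genes, C: shared environment, Ex: unique)
#   Y = b*X + ay*G + cy*C + Ey     (b: causal effect of X on Y)
# MZ co-twins share G and C exactly; Ex and Ey are drawn separately per twin.


def analyze(ax, cx, ex, b, ay, cy):
    var_x = ax**2 + cx**2 + ex**2
    # Cov(X, Y) splits into one term per pathway.
    parts = {
        "genetic": ax * ay,
        "shared environment": cx * cy,
        "causal": b * var_x,
    }
    cov_xy = sum(parts.values())
    shares = {k: v / cov_xy for k, v in parts.items()}
    naive_slope = cov_xy / var_x

    # Within an MZ pair, G and C cancel:
    #   dX = ex*(Ex1 - Ex2), so Var(dX) = 2*ex**2
    #   dY = b*dX + (Ey1 - Ey2), so Cov(dX, dY) = b*Var(dX)
    var_dx = 2 * ex**2
    cov_dx_dy = b * var_dx
    within_slope = cov_dx_dy / var_dx
    return shares, naive_slope, within_slope


shares, naive, within = analyze(ax=1, cx=1, ex=1, b=0.1, ay=0.75, cy=0.45)
print({k: round(v, 3) for k, v in shares.items()})
print(round(naive, 3), round(within, 3))
```

With these values the naive slope is 0.5, five times the causal slope of 0.1 that the within-MZ comparison recovers, mirroring the exaggerated estimates from uncontrolled studies. The sketch omits non-shared non-genetic confounding; an unique factor affecting both X and Y would bias the within-pair slope as well.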