Does genomic prediction work?

Feb 21, 2016

Comment on: http://infoproc.blogspot.dk/2016/02/missing-heritability-and-gcta-update-on.html

First, skim this paper: http://journals.plos.org/ploso...

Genomic prediction works fairly well. This recent paper does a cross-dataset, cross-method analysis of genomic prediction methods, using 10-fold cross-validation to guard against overfitting. In general, the compressed sensing/lasso/regularization methods perform well, but surprisingly an even simpler method (mRMR) comes out on top:

Maximum Relevancy Minimum Redundancy (mRMR) [39] is a feature selection method which attempts to find features that are maximally relevant to the phenotype and simultaneously the selected features are non-redundant amongst each other. After features are selected, the rrBLUP method was used for GS on the selected features. mRMR can be viewed as a somewhat simplistic univariate filter-based variable selection that ranks each features based on mutual information criterion (note, however, that this univariate ranking criterion also takes into account feature’s redundancy with respect to other features).
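To make the quoted description concrete, here is a minimal sketch of greedy mRMR selection on toy discrete data. It is not the paper's implementation: the function names (`mutual_information`, `mrmr`), the difference form of the criterion (relevance minus mean redundancy with already-selected features), and the tiny genotype-like dataset are all assumptions for illustration. The rrBLUP step that follows selection in the paper is omitted.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in nats) for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, target, k):
    """Greedily pick k feature indices: high relevance to target, low redundancy."""
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    candidates = list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: two independent binary "SNPs" and an exact copy of the first.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_a_copy = list(snp_a)          # fully redundant with snp_a
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
phenotype = [a + b for a, b in zip(snp_a, snp_b)]  # discretized trait

selected = mrmr([snp_a, snp_a_copy, snp_b], phenotype, k=2)
print(selected)  # [0, 2]: the duplicate at index 1 is skipped
```

A plain univariate ranking would treat the duplicated feature as equally good as the original; the redundancy penalty is what pushes the second pick to the independent feature. Real phenotypes are continuous, so an actual analysis would need a discretization step or a different mutual information estimator.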

Figure 1 shows that genomic prediction works. Unfortunately, the tables are given only as images, but they report r2 values for each method x trait combination. I copied the data for the best method (mRMR) across all traits and did some basic calculations. The average h2 across traits is 66%. The average r2 achieved by mRMR is 32%, or about half of the heritability. However, because it is r and not r2 that matters for practical use, we are actually more than halfway towards using all the predictive value of the genes for these traits. To see this, take the square root of each trait's h2 to get the best possible r, and the square root of each prediction r2 to get the best r achieved so far. Then calculate the ratio of the two for each trait and average the ratios. This value is 69%. Thus, because of the nonlinear relationship between r and r2, despite 'only' being able to explain about half the heritable variance, we can predict with about 70% of the maximum efficiency.
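The calculation can be sketched in a few lines. The per-trait values are not reproduced here, so this version applies the same steps to the two reported averages (h2 = 66%, r2 = 32%) rather than averaging per-trait ratios; that substitution is an assumption, which is why it lands near 70% rather than exactly on 69%. The variable names are illustrative.

```python
from math import sqrt

mean_h2 = 0.66  # average heritability across traits (from the post)
mean_r2 = 0.32  # average prediction r2 for mRMR (from the post)

share_of_variance = mean_r2 / mean_h2  # about 0.48: roughly half the heritable variance

max_r = sqrt(mean_h2)        # best possible correlation given h2
achieved_r = sqrt(mean_r2)   # correlation actually achieved
efficiency = achieved_r / max_r

print(f"share of heritable variance explained: {share_of_variance:.2f}")
print(f"share of maximum r achieved:           {efficiency:.2f}")
```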

As Hsu points out, the 16% of variance explained by the recent height GWAS corresponds to an r value of about .40. Because r is the square root of r2, the halfway point in terms of r corresponds to an r2 of only 25%. I think that there is a good chance (80%) that the next height GWAS will reach this value, even without using imputed variants. http://www.ncbi.nlm.nih.gov/pu...
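The same square-root relationship in code, as a small sketch. Only the 16% figure comes from the text; the helper name `r2_needed_for` is made up for illustration.

```python
from math import sqrt

height_r2 = 0.16            # variance explained by the height GWAS (from the post)
height_r = sqrt(height_r2)  # about .40


def r2_needed_for(r):
    """Variance explained that corresponds to a given correlation r."""
    return r ** 2


halfway_r2 = r2_needed_for(0.5)  # 0.25: an r of .50 needs only 25% of the variance

print(f"r for r2 = {height_r2}: {height_r:.2f}")
print(f"r2 needed for r = 0.5: {halfway_r2}")
```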

The still unpublished GWAS of cognitive ability will probably yield an r2 value (using all SNPs; polygenic score) in the area of 10%, thus yielding an r of about .32 to .33. If we assume an additive (narrow-sense) heritability of 50% for cognitive ability and a sibling SD of 12 (often reported), then we get a within-sibling-pair additive genetic SD of about 8.5 (sqrt(12^2 * .5)). If we select the best of 20 embryos with r = .33, that's an average gain of about 4.6 IQ points per generation. This is not enough for transhumanism to kick in, but it is enough to keep regression towards the mean in check for a generation. This has important societal implications for caste-like systems à la https://en.wikipedia.org/wiki/... .
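A rough sketch of the embryo selection arithmetic follows. The text does not say how 'best of 20' was turned into a selection intensity, so both conversions below are assumptions: using the 95th-percentile z-score (top 1 in 20) reproduces the 4.6 figure, while using the expected maximum of 20 standard normal draws gives a somewhat larger number. The function name `expected_max_z` and the integration settings are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sibling_sd = 12.0   # within-sibling-pair SD (from the post)
h2 = 0.5            # assumed additive heritability (from the post)
r = 0.33            # predictor correlation (from the post)
n_embryos = 20

additive_sd = sqrt(sibling_sd ** 2 * h2)  # about 8.5

# Assumption 1: 'best of 20' treated as the 95th percentile of a standard normal.
z_quantile = NormalDist().inv_cdf(1 - 1 / n_embryos)
gain_quantile = z_quantile * additive_sd * r  # about 4.6


def expected_max_z(n, lo=-8.0, hi=8.0, steps=16000):
    """Expected maximum of n standard normal draws, by trapezoidal integration
    of x * n * pdf(x) * cdf(x)**(n - 1)."""
    nd = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * nd.pdf(x) * nd.cdf(x) ** (n - 1)
    return total * h


# Assumption 2: 'best of 20' treated as the expected maximum of 20 draws.
gain_expected_max = expected_max_z(n_embryos) * additive_sd * r

print(f"additive SD:                {additive_sd:.2f}")
print(f"gain (95th percentile z):   {gain_quantile:.1f}")
print(f"gain (expected max of 20):  {gain_expected_max:.1f}")
```

Either way the gain scales linearly with r, so a better polygenic score raises the expected gain proportionally.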

Just Emil Kirkegaard Things
