You will often find intelligence researchers and enjoyers making claims like the following:
g can be said to be the most powerful single predictor of overall job performance. First, no other measured trait, except perhaps conscientiousness (Landy et al., 1994, pp. 271, 273), has such general utility across the sweep of jobs in the U.S. economy. More specific personality traits and aptitudes, such as extraversion or spatial aptitude, sometimes seem essential above and beyond g, but across a more limited range of jobs (e.g., Barrick & Mount, 1991; Gottfredson, 1986a). [Gottfredson 1997]
IQ is strongly related, probably more so than any other single measurable human trait, to many important educational, occupational, economic, and social outcomes. Its relation to the welfare and performance of individuals is very strong in some arenas in life (education, military training), moderate but robust in others (social competence), and modest but consistent in others (law-abidingness). Whatever IQ tests measure, it is of great practical and social importance. [Gottfredson 1994/1997]
For example, it has been shown that there is no other single psychological trait in psychology that contributes more than g to the understanding as well as to the prediction of human achievements. [Neubauer & Opriessnig 2014]
These older statements were mainly based on narrative reviews, not on direct comparisons between personality traits (or other non-cognitive traits) and intelligence or IQ. However, there are a few such recent direct comparisons as well. First, Charles Murray in Human Diversity (2020) provides this comparison table based on large studies:
A more recent direct comparison mega-study was by Zisman and Ganzach (2022):
We conduct a replication of Borghans, Golsteyn, Heckman and Humphries (PNAS, 2016) who suggested that personality is more important than intelligence in predicting important life outcomes. We focus on the prediction of educational (educational attainment, GPA) and occupational (pay) success, and analyze two of the databases that BGHH used (the NLSY79, n = 5594 and the MIDUS, n = 2240) as well as four additional databases, (the NLSY97, n = 2962, the WLS, n = 7646, the PIAAC, n = 3605 and the ADD health, n = 3553; all databases are American except of the PIAAC which is German). We found that for educational attainment the average R2 of intelligence was .232 whereas for personality it was .053. For GPA it was .229 and .024, respectively and for pay it was .080 and .040, respectively.
But here I want to take a step back and think about it a little harder. I think that the case is less clear-cut than it appears. First, consider the general issue of measurement error in the broad sense:
Here we distinguish between two sources of error in measurement:
Random error from noise in the measurement.
Construct invalidity error, meaning that you aren't measuring the intended construct perfectly.
Since the statistical path between the observed variables goes through each of these sources of error, to estimate the true relationship we must statistically remove them insofar as they are present. These errors can be very large if measurement is poor. Usually, researchers take care to ensure that reliability isn't too low. The method for doing this is simply to make the tests longer or, for outcomes, to average over more observations. For instance, for job performance, the outcome may be an average of ratings from multiple supervisors, or of several years of 'objectively' measured performance (e.g. sales, lines of code written, crimes solved). Because of such efforts, reliabilities are usually decently high, often about 0.90 for psychological constructs with good scales (typical of IQ batteries). However, the construct validity issue is murkier and less often discussed. Consider an example where we are interested in general intelligence (g) and its relationship to lifetime income, but where the study only measured vocabulary. Even if we adjust for the unreliability of the vocabulary measure, we still have the issue that vocabulary ability isn't g, though it may correlate well with it. In a typical study, a good vocabulary test has a g-loading of about 0.80 without adjusting for unreliability. This roughly means that the product of the two paths is about 0.80. If we also know that the reliability of the vocabulary test is about 0.90, then we can infer the construct validity (true g-loading): 0.80 = X × 0.90, so X ≈ 0.89.
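To make this concrete, here is a minimal sketch of that arithmetic in Python, using the simple two-path model just described; the numbers (0.80, 0.90) are the illustrative values above, not estimates from any particular dataset.

```python
# Construct-validity arithmetic from the vocabulary/g example above.
# Model: observed g-loading = construct validity * reliability path.

observed_g_loading = 0.80  # correlation of the vocabulary score with g
reliability = 0.90         # reliability of the vocabulary test

# Solve observed = construct_validity * reliability for construct validity.
construct_validity = observed_g_loading / reliability
print(f"Implied true g-loading of vocabulary: {construct_validity:.2f}")  # ~0.89
```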
Here comes the issue for personality research. In 99%+ of studies, self-reported personality scales are used, and though we can estimate their reliability (e.g. how well does extroversion correlate with extroversion if the same subject fills out the survey 3 days apart?), the question of construct validity is usually ignored. Do we really think that, random errors aside, people have perfect perception of their own personalities? Of course not. In 2017, I found one of the very few studies that examined this question by comparing the validity of self- versus other-rated personality for various outcomes. The key table is this:
Or in their own words:
The bulk of personality research has been built from self-report measures of personality. However, collecting personality ratings from other-raters, such as family, friends, and even strangers, is a dramatically underutilized method that allows better explanation and prediction of personality’s role in many domains of psychology. Drawing hypotheses from D. C. Funder’s (1995) realistic accuracy model about trait and information moderators of accuracy, we offer 3 meta-analyses to help researchers and applied psychologists understand and interpret both consistencies and unique insights afforded by other-ratings of personality. These meta-analyses integrate findings based on 44,178 target individuals rated across 263 independent samples. Each meta-analysis assessed the accuracy of observer ratings, as indexed by interrater consensus/reliability (Study 1), self–other correlations (Study 2), and predictions of behavior (Study 3). The results show that although increased frequency of interacting with targets does improve accuracy in rating personality, informants’ interpersonal intimacy with the target is necessary for substantial increases in other-rating accuracy. Interpersonal intimacy improved accuracy especially for traits low in visibility (e.g., Emotional Stability) but only minimally for traits high in evaluativeness (e.g., Agreeableness). In addition, observer ratings were strong predictors of behaviors. When the criterion was academic achievement or job performance, other-ratings yielded predictive validities substantially greater than and incremental to self-ratings. These findings indicate that extraordinary value can be gained by using other-reports to measure personality, and these findings provide guidelines toward enriching personality theory. Various subfields of psychology in which personality variables are systematically assessed and utilized in research and practice can benefit tremendously from use of others’ ratings to measure personality variables.
Some of the values they found are very high indeed, higher than the g relationship plausibly is. We can also make this kind of finding plausible from another perspective. If g predicts a given outcome with r = 0.50 after we remove measurement issues, this means that 0.50^2 = 25% of the variance in that outcome can be explained, and thus that 75% cannot be explained by g. And while variance-based reasoning can be deceptive, the conclusion is correct here. Thus, if the outcome is not substantially due to literally random factors, and the other causes are independent and of similar size, each would explain another 25% of the variance, so we would need to find another 2-3 such causes to explain the entire variance in that outcome. Clearly, something else must be important too, unless you think the other causes are 100 small things.
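A quick back-of-the-envelope check of this variance accounting, under the (strong) simplifying assumption that the other causes are mutually independent and each as large as g:

```python
# Variance accounting for the r = 0.50 example above (illustrative only).

r_g = 0.50
explained_by_g = r_g**2           # 0.25, i.e. 25% of outcome variance
unexplained = 1 - explained_by_g  # 0.75

# If each additional independent cause were also r = 0.50, each would
# explain another 25% of the variance, so the number needed is:
n_additional = unexplained / r_g**2
print(f"Variance unexplained by g: {unexplained:.0%}")               # 75%
print(f"Comparable independent causes needed: {n_additional:.0f}")  # 3
```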
If you know a lot of smart people, you will also know that there is massive variation in their life outcomes left unexplained by intelligence itself. Some people with perfect SATs and IQs of 130+ are not doing well in life, failing to build relationships, or bouncing between jobs forever. Something else explains this consistent failure. I like to think of it this way: among your smart peers, the variation you see has little to do with intelligence (there is too little variation in it among them), but they usually show very large variation in work ethic (some are lazy, some are workaholics) and mental health. Short of making very smart investment decisions, laziness generally means you will be relatively poor. I know multiple people with IQs around 130 who just refuse to get or keep steady jobs and consequently live in relative squalor (Bryan Caplan would say they just have a high preference for free time). There are also obvious cases where mental issues explain the problems: some people who are smart and not lazy are nevertheless always in trouble, and the trouble is mainly of their own making through consistently poor life decisions.
If we were somehow able to measure these three causes of life success well, I think they might have around equal validity. The reason that IQ wins so easily in studies like Zisman and Ganzach's above is that intelligence is just much easier to measure than the other traits. We know from decades of research in behavioral genetics that when traits are measured using self-report, they show much weaker heritabilities, and when they are measured using other people's opinions, especially those of neutral third parties, they show much higher heritabilities (this is true for big 5 traits as well, which show heritabilities around 75% when other ratings are used). I take this as indirect evidence that self-ratings are generally bad. If we take the self vs. other rated personality study as our estimator of the construct validity of self-rated personality scales, we reach the conclusion that self-rated personality correlates with true personality (the average of a hypothetical infinite set of raters familiar with the subject) at around 0.33 to 0.50. In other words, to compute the validity of a typical study using self-rated personality, we would have to multiply the effect size we see by 2-3. A relatively small extroversion effect of 0.15 thus becomes 0.30 or even 0.45.
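The implied correction looks like this; the construct validities (0.50 and 0.33) and the observed effect (0.15) are the illustrative figures above, not estimates from any specific study.

```python
# Correcting an observed self-report validity for construct invalidity.
# Corrected r = observed r / construct validity of the self-rating.

observed_validity = 0.15  # e.g. self-rated extroversion vs. some outcome
for construct_validity in (0.50, 0.33):
    corrected = observed_validity / construct_validity
    print(f"construct validity {construct_validity:.2f} -> corrected r = {corrected:.2f}")
# -> 0.30 and 0.45, the 2-3x multiplier described above
```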
Now there are other issues with other ratings, especially halo effects, which probably lead to overestimated validities. One of the findings with other ratings is that the big 5 traits become substantially more correlated with each other, which also means that their validities cannot be added up in a simple fashion; one must take their covariance into account. Nevertheless, I would not be surprised if taking all the big 5 traits together, with proper error corrections, would make personality more predictive than g for many life outcomes. So for practical purposes, IQ is still king, but that is because it is easy to measure, not because it is the most important psychological trait causally speaking.
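To see why correlated traits' validities don't simply add, here is a small sketch using the standard multiple-correlation formula R² = r′ R_xx⁻¹ r; the two validities and the intercorrelations are made-up numbers, chosen only to illustrate the point.

```python
# Why correlated predictors' validities don't simply add.
# Multiple R^2 = r' @ inv(Rxx) @ r, where r holds the predictor-outcome
# correlations and Rxx the predictor intercorrelation matrix.

import numpy as np

r = np.array([0.30, 0.25])  # hypothetical validities of two personality traits
for rho in (0.0, 0.6):      # trait intercorrelation: independent vs. halo-inflated
    Rxx = np.array([[1.0, rho],
                    [rho, 1.0]])
    R2 = r @ np.linalg.inv(Rxx) @ r
    print(f"intercorrelation {rho:.1f}: R^2 = {R2:.3f}")
# Independent traits: R^2 = 0.152 (the simple sum of squared validities);
# correlated traits: R^2 = 0.098, so summing validities overstates prediction.
```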
I don't think the predictive power is what has ever made IQ so compelling. It is how we think about choice and possibility.
If you predict that someone will fail to become an engineer because they aren't conscientious enough, people treat that as something the person is choosing. It feels like they could just put in more effort, focus more on details, or whatever. If you tell them they will fail because they lack the IQ to be an engineer, it feels like you are saying they aren't good enough, or that they are inferior in some way.
Basically, we care about IQ because we'd all like to be smarter, but we don't all want to have different big 5 traits. We may vaguely say we want to be less lazy, but each time we choose to be lazy it feels like a choice we make, while not being as quick to get to a solution or see a possibility feels like something we'd like to do but can't.