Book review: The Intelligence Men: Makers of the I.Q. Controversy (Fancher 1985)
A little history
I've been reading old writings recently, which is why I was posting about Terman's views. Some years ago, I heard about the book The Intelligence Men: Makers of the I.Q. Controversy (1985) by Raymond Fancher, and Gwern provides us with a PDF. I've been tweeting some excerpts with comments, and this post collects those excerpts into something more permanent that's easier to find later.
This short book provides a narrative history of the field through a series of biographies in mostly chronological order:
1. The Nature-Nurture Controversy: John Stuart Mill, Francis Galton
2. The Invention of Intelligence Tests: James Cattell, Alfred Binet
3. Intelligence Redefined: Charles Spearman, William Stern, Henry Goddard
4. The Rise of Intelligence Testing: Robert Yerkes, Lewis Terman
5. Twins and the Genetics of IQ: David Wechsler, Cyril Burt, Arthur Jensen, Leon Kamin
6. Conclusion
Back in 1971, Richard Herrnstein (coauthor of The Bell Curve) wrote a 19-page summary of the evidence concerning intelligence in The Atlantic (PDF thanks to Gwern again). The reception back then wasn't much better than what you would expect in modern times:
Herrnstein's article aroused a large but generally decorous response, both pro and con, from fellow scientists who flooded the Atlantic with letters to the editor. In general, these represented the sort of thoughtful response he had hoped to stimulate. By raising the IQ issue in the popular press, however, Herrnstein also touched off a more irrational and disturbing public reaction.
In Boston, radical college students promptly distributed leaflets entitled "Fight Harvard Prof's Fascist Lies," and staged a demonstration outside the Atlantic's editorial offices to protest the publication of Herrnstein's article. Soon thereafter Harvard radicals initiated a "fall offensive" against Herrnstein by planting non-enrolled demonstrators in his lecture classes. "Wanted for Racism" posters bearing his photograph appeared on campus, accusing him of "misusing science" in support of "racial superiority, male supremacy, and unemployment." When he tried to deny these unwarranted and irresponsible charges at a public meeting, he was heckled from the floor as a "political reactionary," and as he left the hall a member of the audience threatened to stab him some night in the Harvard Yard.
Herrnstein's reputation with student radicals spread throughout the country, and even rose to plague him when he visited the University of Iowa to lecture on animal psychology, a topic far removed from the issue of intelligence testing. Activist groups there circulated slanderous leaflets like those in Boston, and packed the auditorium with demonstrators before his scheduled talk. As Herrnstein approached the lecture hall he heard some three hundred people inside rhythmically shouting "We want Herrnstein," and decided to follow police advice by departing in an unmarked car. Two weeks later, he felt compelled to cancel another scheduled talk on animal psychology, at Princeton University.
Nevertheless, he soldiered on, and two years later produced a book along the same lines (guess who has a PDF of that too).
The book takes J. S. Mill as the starting point:
Advocate of an associationistic psychology which emphasized the experiential basis of all human knowledge, Mill was openly contemptuous of the nativist argument, writing, "Of all vulgar modes of escaping from the consideration of the social and moral influences on the human mind, the most vulgar is that attributing the diversities of human conduct and character to inherent original natural differences." Consistent with these beliefs, Mill argued that the greatest differences between races and sexes, as well as between individuals, are due to environmental and circumstantial factors. Sometimes called "the patron saint of liberty," Mill and his arguments are still widely cited by supporters of the nurture side in the nature-nurture controversy.
Mill didn't really do any science on the matter, but he provides an example of someone arguing from one side of the empiricist vs. rationalist debate in philosophy. This question concerns whether knowledge comes chiefly from experience (Mill's position) or whether some of it can be arrived at by pure reason (Kant, Plato, etc.). There is no particularly logical connection to behavioral genetics, but the concept of the blank slate (tabula rasa) was invoked by John Locke, who was in the empiricist camp, as was J. S. Mill. To our modern minds, the connection is strange because the focus on empirical data is what brought us to realize that evolution exists, yet evolution implies heritable variation in the mind, thus undermining any strict notion of a truly blank slate. Maybe no one is born with specific ideas about truth, but certainly people differ in their tendencies and traits at birth. Recalling his own childhood regimen of early education:
Thus John Mill came to see his father's educational experiment as a great success, at least in the intellectual realm. He believed he was living proof that virtually anyone could be taught virtually anything, given ideal conditions. Even as an eminent adult he argued, with complete sincerity, that natural gifts had had little to do with his success, because he had few such gifts: "If I had been by nature extremely quick of apprehension, or had possessed a very accurate and retentive memory, the trial would not have been conclusive; but in all these natural gifts I am rather below than above par. What I could do, could assuredly be done by any boy or girl of average capacity and healthy physical constitution." Similarly, however, Mill attributed his weaknesses to failures in his education and upbringing; he blamed his clumsiness on a lack of practical exercise, for example, and his early social ineptitude (which he later took pains to correct) on a lack of outside company. When he experienced a painful "mental crisis" in young manhood, and worked his own recovery partly through his discovery of Romantic poetry, he blamed his breakdown on the fact that he had never been trained to cultivate the feelings along with the intellect.
From an outside perspective, of course, one may question the accuracy of John Mill's self-assessment—noting that his very lack of experience with other young people made it impossible for him to judge whether he was comparatively "below par" or not. One may also note, like Norbert Wiener, that Mill's younger sisters and brothers, who received educations similar in many ways to his own, never reached his own level of accomplishment. The important point, however, was that Mill himself believed he was primarily the product of his environment and training, and he generalized this into a view of all people as creatures of circumstance, potentially almost indefinitely malleable to early environmental influences. This view was clearly echoed in the associationistic psychology Mill espoused as an adult, and which underlay his many influential social and political writings.
It is curious how often these 'argument from my own childhood experience' takes are invoked in debates. Mill seems to have been a generally depressed fellow with lots of self-doubts, so it is not too surprising that he considered himself below par. A very faulty generalization can result from this severe misjudgment: J. S. Mill was not below par, but a child prodigy! Somehow he failed to see that his own siblings, who enjoyed a quite similar environment, didn't turn into prodigies. If the mind was a blank slate, why did they not? Curious oversight.
Mill was perhaps also the first to invoke the moral argument for giving preference to environmental sources of variation:
As his reference to sexual equality implies, Mill believed that environmental explanations ought to take precedence on moral as well as logical grounds, a view echoed by environmentalist social critics to the present day. Mill expressed this view most energetically in his Autobiography:
The prevailing tendency to regard all the marked distinctions of human nature as innate, and in the main indelible, and to ignore the irresistible proofs that by far the greater part of those differences, whether between individuals, races, or sexes, are such as not only might, but naturally would be produced by differences in circumstances, is one of the chief hindrances to the rational treatment of great social questions, and one of the greatest stumbling blocks to human improvement.
If people in power believe that the poor and disadvantaged occupy their lowly positions in life because of an innate and "natural" inferiority, they will see little reason for even trying to improve the environments of the poor. Thus politicians have a moral obligation to accept the environmentalist explanation, at least as a working hypothesis.
In sum, Mill did not deny that individuals and groups vary considerably in the quality of their character and intellect, nor did he altogether deny the possibility that some part of the variation was innate. His own upbringing had impressed upon him the great power and pervasiveness of environmental factors, however, and so he argued that it was the logical scientific duty of the psychologist or ethologist, and the ethical obligation of the politician, to thoroughly test out environmental hypotheses before anything else. He had little doubt that many of these hypotheses would prove true, and that the final "residuum" left over to be attributed to innate differences would be relatively small.
So when you hear Turkheimer or whoever make this same argument nowadays, you will know that it isn't new at all. In fact, the chief benefit of reading old stuff is how often you realize that not much in this line of thinking has been new for the last 200 years.
Moving on to Galton a few decades later:
In Hereditary Genius [1869], Galton presented the prototype for what has since been called the adoptive family method. He noted that it was once common for Roman Catholic popes to "adopt" young boys and bring them up in their own households as "nephews," who thus shared the environmental but not the genetic advantages of eminent families. Galton tried to determine if these boys went on to attain eminence themselves in anything like the proportion that would be expected of the natural sons of eminent fathers:
I do not profess to have worked out the kinships of the Italians with any special care, but I have seen amply enough of them, to justify me in saying that . . . the very common combination of an able son and an eminent parent, is not matched, in the case of high Romish ecclesiastics, by an eminent nephew and an eminent uncle. The social helps are the same, but hereditary gifts are wanting in the latter case.45
Galton clearly did not lavish the same statistical care on this analysis that he did on his compilation of positive hereditary relationships, and a critic could rightly argue that his test sample was small and highly unrepresentative. Few objective observers would agree with Galton that this study conclusively ruled out any major influence for environment in the production of eminence. Nevertheless, the basic method underlying the study was sound. Adopted children do provide a potentially useful comparison group in studies of familial similarity. As we shall see in Chapter 5, later generations of researchers have employed the adoptive family method with increasing degrees of sophistication, if still with somewhat inconclusive results.
So Galton invented the concept of the adoption study. As far as I know, no details of this inquiry were published. Likewise for twins:
Another technique for separating the effects of heredity and environment on mental development occurred to Galton in the early 1870s, when he became interested in twins. He learned that, biologically speaking, there are two different kinds of twins: those who develop from the separate (though nearly simultaneous) fertilization of two ova by two sperm; and those who result after a single fertilized ovum splits in two, and the two halves develop into separate individuals. The first type, now referred to as fraternal or dizygotic twins, bear the same genetic similarity to each other as ordinary siblings; the second type, identical or monozygotic twins, are genetically identical. Galton's attention may originally have been drawn to the issue because he himself had a pair of nephews who were identical twins, and an aunt and uncle who were a fraternal pair. In any case, he believed that a comparison of the similarities between co-twins of the two types could throw light on the nature-nurture question, because while both types share similar environments, only the identical twins have exactly the same heredity. Here was the basic idea for the twin-study method, which Galton introduced in his 1875 paper, "The History of Twins, as a Criterion of the Relative Powers of Nature and Nurture."
In this original study, Galton solicited case histories of as many twins as he could locate, and discerned two striking categories. Some twins, including his nephews, went through life showing remarkable similarity to each other in both physical and psychological qualities, sometimes in spite of having experienced quite different life circumstances. Others, in contrast, went through life very differently from each other, showing markedly different characters, sometimes in spite of having been deliberately treated as similarly as possible by their families. Galton lacked direct evidence on the matter, but reasoned that twins with highly similar character must have been monozygotic, their psychological similarity deriving from their genetic identity. The dissimilar twins he presumed to be dizygotic, differing in the same degree that ordinary siblings are known to do. He confidently summarized: "There is no escape from the conclusion that nature prevails enormously over nurture when the differences of nurture do not exceed what is commonly to be found among persons of the same rank in society and in the same country."
Gavan Tredoux dug into the Galton archives and notes in a reply:
Galton had more twin data than he published and took great care over it, as I discovered from the extensive materials that still survive today. I cover this in a substantial chapter of Galton's Genius (2023), which includes many new details. Raymond Fancher is not a good guide to his work, though he worked for many years on a Galton biography that he was never able to finish (his organizing theme being that Galton had an Adlerian inferiority complex, which is pretty funny really).
I haven't read his book yet, so I don't know the details. However, it is clear that Galton was thinking along the right lines. He had good priors and came up with methods that could investigate them. Good priors go a long way!
Since Galton realized the power of genetics because of evolution, he realized also that we must ensure that evolution is taking a favorable route, that is, eugenics:
The founding parents of a eugenic society, Galton believed, should be people like those he studied in Hereditary Genius: talented individuals who became eminent because of their positive contributions to society. A major problem arises, however, because such eminence customarily does not arrive until middle age. Galton wanted a means of identifying potentially eminent people earlier, while they were still at prime childbearing age. Thus he imagined the development of a series of examinations of young adults' "natural ability," capable of predicting which among them were likely to make eminent contributions later on. High-scoring men and women would be encouraged to intermarry, somewhat as in the following whimsical scene from "Hereditary Talent and Character":
Let us then, give reins to our fancy, and imagine a Utopia ... in which a system of competitive examinations ... had been so developed as to embrace every important quality of mind and body, and where a considerable sum was allotted to the endowment of such marriages as promised to yield children who would grow into eminent servants of the State. We may picture to ourselves an annual ceremony in that Utopia, in which the Senior Trustee of the Endowment Fund would address ten deeply-blushing young men, all of twenty-five years old, in the following terms:—"Gentlemen, I have to announce the results of a public examination, conducted on established principles; which show that you occupy the foremost places in your year, in respect to those qualities of talent, character, and bodily vigour which are proved, on the whole, to do most honour and best service to our race. An examination has also been conducted on established principles among all the young ladies of this country who are now of the age of twenty-one, and I need hardly remind you, that this examination takes note of grace, beauty, health, good-temper, accomplished housewifery, and disengaged affections, in addition to the noble qualities of heart and brain. By a careful investigation of the marks you have severally obtained, ... we have been enabled to select ten of [the young ladies'] names with special reference to your individual qualities. It appears that marriages between you and these ten ladies, according to the list I hold in my hand, would offer the probability of unusual happiness to yourselves, and, what is of paramount interest to the State, would probably result in an extraordinarily talented issue. Under these circumstances, if any or all of these marriages should be agreed upon, the Sovereign herself will give away the brides, at a high and solemn festival, six months hence, in Westminster Abbey. 
We, on our part, are prepared, in each case, to assign 5,000£ as a wedding-present, and to defray the cost of maintaining and educating your children, out of the ample funds entrusted to our disposal by the State."
In this fancifully stated but seriously intended passage, Galton introduced the idea (though not the name) of the intelligence test to the world. Thus the intelligence test was seen as a measure of people's differing hereditary worth from its very inception; it is no mere coincidence that questions of genetics and intelligence testing have been inextricably intertwined ever since.
His eugenics program was designed for Victorian England, but something not so unlike this was eventually tried in ... Victorian Asia under Lee Kuan Yew in Singapore. It didn't go so well though.
Galton tried to design tests of intelligence, but here he had a somewhat bad prior because he was, we might say, too empiricist. His model of intelligence started with the senses. He thought that more intelligent people had better and more ideas in part because they had better sense perception, which made their data quality better, with trickle-down effects. While this is true to some extent, it doesn't mean that sensory tests are the best way to measure intelligence for practical purposes, which it turns out they are not. Alfred Binet was paying attention to these ideas, however:
Even as Binet was winding up his affairs at the Salpêtrière, he was conducting a small series of experiments at home which markedly influenced his later career. He had developed the habit of trying out all sorts of tests and puzzles on his young daughters Madeleine and Alice, born in 1885 and 1887, respectively. These early home experiments culminated in three short articles published in 1890. While belonging chronologically to the end of his Salpêtrière period, these papers marked the logical beginnings of his new career as an experimental child psychologist and "individual psychologist."
Several of the tests and tasks in these early experiments were derived from the Galton and Cattell series, assessing reaction time and various forms of sensory acuity. Binet found that his daughters and their small friends had average reaction times about three times longer than typical adults, but with much greater variability. On some trials the children responded just as quickly as adults, but on others they were much slower. Since the children could sometimes match the adult speed, Binet concluded that the crucial factor differentiating children from adults was not reaction time per se, but rather the ability to sustain attention to the task. When children paid attention they responded like adults, but on those frequent occasions when their attention wandered, their reaction times increased drastically. This finding reinforced Binet's conviction of the importance of attention in mental life, and he would continue throughout his career to emphasize its importance in the development of adult intelligence.
Binet's investigations of sensory acuity showed that children's senses were often much sharper than commonly believed. For example, Madeleine's ability to judge the relative lengths of parallel lines, or the relative sizes of pairs of angles, actually exceeded that of many adults.
These early findings were later corroborated by Jensen and others' work on reaction times from the 1980s onward (read Clocking the Mind). Indeed, it was confirmed that smarter people have faster and less variable reaction times.
Tests of "color sense" like those in Cattell's battery, which required subjects to name color patches as quickly as possible, generally revealed a large superiority of adults over children. Binet discovered, however, that tests requiring subjects to match colors showed very much smaller differences. This indicated that the children's perceptual and sensory abilities of color discrimination were really very good. The major inferiority to adults was linguistic, residing in their slowness to assign proper names to their color perceptions.
On another test requiring language use—this one very different from anything in the Galton or Cattell batteries—Binet found even more striking differences between children and adults. He simply asked his young subjects to define a series of everyday objects, and discovered that their thoughts immediately leapt to the uses of the objects inquired about or to the actions habitually taken with or toward them. Thus a knife was simply "to cut meat"; a box "means put candies inside"; and a snail was, emphatically, "Squash it!" The young girls did not and indeed could not "define" the concepts as an adult would:
It is clear that a little girl is incapable of defining. When you say "definition" you imply a certain work of reflection, of comparison, of elimination, etc. The little child that we studied responded immediately without thinking, and their replies express very simply the first images which were evoked by the name of a certain well-known object.
Binet's discovery of this "functional" or "utilitarian" nature of young children's thought, as compared to the much greater abstraction of adults, led him to recognize the increasing capacity for abstraction as one of the hallmarks of increasing intelligence.
And in fact, vocabulary remains one of the best ways to measure intelligence if the right conditions are fulfilled. These are increasingly hard to fulfill, however, because people's cultural backgrounds now vary more due to immigration and multiple language exposure. For instance, I have been using English professionally for decades and rarely use my native Danish for anything serious, hence my Danish vocabulary is depressed compared to what it would have been, and my English is also depressed because of non-native language bias. There is no vocabulary test I can take that yields accurate results as a measure of g.
Moving on to Charles Spearman:
Sometime around 1890, a young English army officer named Charles Spearman (1863–1945) started reading psychology textbooks in his spare time. The product of an English upper-class school, he had even as a boy masked a secret propensity for philosophical speculation beneath an aggressive and competitive exterior. Following school he had self-consciously decided to follow the example of the philosopher René Descartes by joining the army to see something of the practical world. As a member of the Royal Engineers, Spearman served and was decorated in the Burmese Wars of the 1880s, but even as his military career was thriving he continued to seek intellectual stimulation for his hidden philosophical side. Thus he began reading psychology books. By chance, the first texts he found were by John Stuart Mill and other associationists, who argued that most if not all mental experiences could be explained through the various forms of the laws of association. As Spearman later recalled, he responded to these works very strongly:
My reaction to all this view was intensely negative. The ideas and arguments appeared to me astonishingly crude, equivocal, and erroneous. But even so, my conviction was accompanied by an emotional heat which cannot . . . be explained on purely intellectual grounds.
The source of this heat I take to have been—little as I admitted this to myself at the time—of an ethical nature. Sensualism and associationism tend strongly to go with hedonism; and this latter was (and is) to me an abomination.
He read psychology textbooks in his spare time as an army officer, and he had an emotional revulsion to Mill's associationism. Surely an odd reaction by modern standards, but ultimately it doesn't matter in science what the original motivation was. Science is a process, and if the process is followed correctly, it leads to truth, no matter people's intentions or desires.
In the village school, Spearman estimated the "intelligence" of twenty-four children in three ways: by having their teacher rank them for their "cleverness in school," and by having the two oldest children rank the members of their class for "sharpness and common sense out of school." Spearman also ranked the children's performances on three sensory tasks involving pitch, light, and weight discrimination. When he calculated the intercorrelations among these six measures, he found that the three "intellectual" variables correlated with each other at an average of +.55, while the corresponding average within the three sensory measures was +.25. For the crucial correlations between intellectual and sensory measures, the average was +.38. This last value, of course, seemed to provide modest support for Galton's hypothesis, and was considerably higher than what Clark Wissler had obtained in his somewhat parallel study of "mental tests" with Columbia University students.* Spearman had not known of the Wissler study when he conducted his own, however, and when he finally did learn of it he was much troubled until an important insight occurred to him:
Had I seen [Wissler's] work earlier, I should certainly have thought the matter disposed of and should never have started my own work in this direction. Since the conflicting results were there, however, they had at least to be explained. After much pondering over them, I had at last a happy thought which embodied itself in the concept of "attenuation."³
This "happy thought," which would have great consequences for Spearman's subsequent work, was his realization that any empirically observed correlation between two variables will underestimate the "true" degree of relationship, to the extent that there is inaccuracy or unreliability in the measurement of those two variables. Further, if the amount of unreliability is precisely known, it is possible to "correct" the attenuated observed correlation according to the formula (where r stands for the correlation coefficient)
r_true = r_observed / √(reliability of variable₁ × reliability of variable₂)
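The correction is simple enough to sketch in code (a minimal illustration; the observed correlation and reliabilities below are made-up numbers, not taken from any study):

```python
import math

def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: estimate the 'true'
    correlation between two variables, given the observed correlation
    and the reliability of each measure."""
    return r_observed / math.sqrt(rel_x * rel_y)

# An observed correlation of .38 between two measures with
# (hypothetical) reliabilities of .70 and .60 is corrected upward:
print(round(disattenuate(0.38, 0.70, 0.60), 2))  # 0.59
```

Note that the less reliable the measures, the larger the upward correction; with very unreliable measures the formula can even exceed 1.0, which is a warning sign that the reliability estimates are off.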
Spearman read the Wissler study (on reaction time tests), which seemingly disproved some of Galton's ideas about the correlations of reaction time and basic sensory discrimination with intelligence. Spearman figured out that this had in part to do with low reliability. Nowadays, we might say that the main reasons are that simple reaction time tests are not very g-loaded even when reliably measured, and that the small sample of elite university students further reduced any expected correlation due to restriction of range (variance reduction).
In fact, Spearman's own procedures were not beyond reproach, and were almost immediately questioned by some of his contemporaries. For the moment we shall defer criticism, however, and follow his theoretical arguments to their conclusion. After dismissing Wissler with the aid of the attenuation concept, he went on to apply the correction formula to his own data. Though Spearman did not doubt that these data were considerably better than Wissler's had been, he modestly allowed that they too had been less than perfectly reliable. He knew, for example, that his three "intelligence" ratings did not perfectly agree with each other, but intercorrelated at an average level of .55. The three sensory measures were even less consistent, intercorrelating at an average of only .25.
Spearman now put his correction formula to use, suggesting that one think of all three of the intellective measures considered together as an index of a hypothetical "General Intelligence," and the three sensory measures as an index of "General Sensory Discrimination." He proposed to take his observed Intelligence × Sensory Discrimination average of .38 as the observed correlation between "General Intelligence and General Discrimination," and then to "correct" it by taking .55 and .25 as the "reliabilities" of the two general variables. Thus he calculated the "true" or theoretical relationship between General Intelligence and General Discrimination in the following equation⁵:
.38 / √(.55 × .25) = 1.01.
Correlations greater than 1.0 (which represents a perfect correspondence between the variables) are theoretically impossible, but Spearman attributed the slight excess in his calculated figure to random errors, and assumed that the "true" value was really the perfect 1.0.
Spearman thus attempted what was essentially a latent variable analysis to get the disattenuated correlation between composite variables, but did it wrong (those "reliabilities" are incorrect: the average intercorrelation among items is not the reliability of their composite). We don't have the correlation matrix here, so we can't fit a structural equation model to the variables. However, if we apply Cronbach's alpha to the average intercorrelations reported, we can surmise that the composite reliabilities were about 0.79 and 0.50. Thus, the disattenuated correlation between his general factors should be around 0.38/sqrt(0.79 × 0.50) = 0.61. This is far from the 1.00 he thought he had obtained. However, the result is in line with modern findings (Deary et al 2004):
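The recalculation can be checked with the standardized Cronbach's alpha formula for a composite of k items with average intercorrelation r̄ (a sketch using only the figures quoted above):

```python
import math

def alpha(k: int, r_bar: float) -> float:
    """Standardized Cronbach's alpha for a composite of k items
    whose average intercorrelation is r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

rel_intelligence = alpha(3, 0.55)  # three "intelligence" ratings
rel_sensory = alpha(3, 0.25)       # three sensory discrimination measures

# Disattenuate Spearman's observed .38 using the composite
# reliabilities, rather than the raw average intercorrelations he used:
r_true = 0.38 / math.sqrt(rel_intelligence * rel_sensory)

print(round(rel_intelligence, 2))  # 0.79
print(round(rel_sensory, 2))       # 0.5
print(round(r_true, 2))            # 0.61
```

Using the raw intercorrelations (.55 and .25) in the denominator, as Spearman did, is what pushed his corrected value past 1.0.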
At the centenary of Spearman's seminal 1904 article, his general intelligence hypothesis remains one of the most influential in psychology. Less well known is the article's other hypothesis that there is "a correspondence between what may provisionally be called 'General Discrimination' and 'General Intelligence' which works out with great approximation to one or absoluteness" (Spearman, 1904, p. 284). Studies that do not find high correlations between psychometric intelligence and single sensory discrimination tests do not falsify this hypothesis. This study is the first directly to address Spearman's general intelligence-general sensory discrimination hypothesis. It attempts to replicate his findings with a similar sample of schoolchildren. In a well-fitting structural equation model of the data, general intelligence and general discrimination correlated .92. In a reanalysis of data published by Acton and Schroeder (2001), general intelligence and general sensory ability correlated .68 in men and women. One hundred years after its conception, Spearman's other hypothesis achieves some confirmation. The association between general intelligence and general sensory ability remains to be replicated and explained.
Much work remains to be done on this topic. Galton's intuition was right: low-level sensory perception skills do have nontrivial associations with g, but this cannot be so easily found due to a number of measurement issues that he was not aware of.
Moving on to William Stern:
The second study [by Stern] investigated the gaps between the mental and chronological ages of children who had been clearly diagnosed (by means other than the Binet tests) as imbeciles, morons, borderline retardates, or as individuals of low normal intelligence. The average gap for each group turned out to depend on the chronological ages of the children involved, as well as on the severity of their diagnoses. For example, eight-year-old imbeciles had an average mental age of 5.7, or 2.3 years behind their chronological age, while twelve-year-old imbeciles scored at 7.3, fully 4.7 years behind. The morons were retarded by 1.9 years at age eight, and by 3.3 years at twelve; thus the older morons were farther behind than the younger imbeciles. Twelve-year-olds in the low normal group were retarded 2.0 years, almost as much as the young imbeciles. In sum, this study showed that the absolute difference between mental and chronological age, considered by itself, was not an accurate index of the severity of retardation; its relationship to the chronological age had also to be taken into account.
As Stern summarized these findings, he made a simple but highly influential suggestion:
Since feeble-mindedness consists essentially in a condition of development that is below the normal condition, the rate of development will also be a slower one, and thus every added year of age must magnify the difference in question, at least as long as there is anything present that could be called mental development at all. With this in mind it is but a step to the idea of measuring backwardness by the relative difference; i.e., by the ratio between mental and chronological age, instead of by the absolute difference.
Stern named this ratio—the mental age divided by the chronological age—the intelligence quotient; in 1916 Lewis Terman, whom we shall meet in the next chapter, suggested multiplying the quotient by 100 to remove fractions, and abbreviated the term as "IQ." Thus was introduced one of the most popular terms in the modern psychological vocabulary.
Note that modern IQ scores are not calculated this way. The mental age approach fails for adults, since mental maturation hits a ceiling in the late teens. Modern IQ scores are instead based on comparisons with same-aged peers and deviations from the mean (hence the name deviation IQ). Ideally, you would find a large group of people of the same age and approximate the distribution of scores in that group. Using this distribution, a given score can be converted to a centile, say the 70th, meaning that the subject scored higher than 70% of similarly aged test takers. Then it's just a matter of converting that to a z score based on an assumed distribution, in this case 0.52 z (assuming normality). Multiply that by the desired standard deviation and add the desired mean to get the IQ (in this case 108), or any other metric you want (e.g. SAT 1000/200, PISA 500/100). Thus, beware when reading older studies: the IQ scores reported may be in mental age units and not comparable to deviation IQs. This is how the sky-high IQ estimates for various eminent figures (including J. S. Mill) were produced, not because these people were 5+ standard deviations above the mean.
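The deviation-score procedure can be sketched with Python's standard library; the 70th-percentile example matches the numbers above:

```python
from statistics import NormalDist

def percentile_to_score(percentile, mean, sd):
    """Convert a percentile rank (0-100) among same-aged peers to a score
    on a chosen scale, assuming the trait is normally distributed."""
    z = NormalDist().inv_cdf(percentile / 100)  # 70th percentile -> z ≈ 0.52
    return mean + z * sd

print(round(percentile_to_score(70, 100, 15)))    # IQ scale (100/15): 108
print(round(percentile_to_score(70, 1000, 200)))  # SAT-style scale (1000/200): 1105
print(round(percentile_to_score(70, 500, 100)))   # PISA-style scale (500/100): 552
```

In practice, test makers norm within narrow age bands and smooth the score distributions, but the percentile-to-z-to-scale conversion is the core of the idea.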
Moving on to Terman (like in my prior post):
Terman wrote long papers at Indiana on "degeneracy" and the "great man theory," as well as a master's thesis on leadership in children. Though these were not major scholarly contributions in themselves, their preparation exposed Terman "to almost everything I could find in the library, in English, German, or French, on the psychology of mental deficiency, criminality, and genius."24 Figuring prominently in this reading were the works of two men who became his intellectual ideals:
Of the founders of modern psychology, my greatest admiration is for Galton. My favorite of all psychologists is Binet, not because of his intelligence test, which was only a by-product of his life-work [and which had not yet been developed when Terman first read him in 1901 and 1902], but because of his originality, insight, and open-mindedness, and because of the rare charm of personality that shines through all his writings.25
Inspired by Galton and Binet as well as his Indiana teachers, Terman "became fired with the ambition to become a professor of psychology and to contribute something myself to the science."26 This ambition seemed thwarted when funds ran short, however, and Terman prepared to return to high school teaching. Then in the midst of his job search he received an unexpected offer of a graduate fellowship for Ph.D. study at Clark University. With a substantial loan from his family, Terman was able to accept, and to study under the eminent G. Stanley Hall as Henry Goddard had just finished doing.
Terman was at first awed by Hall, from whose seminars "I always went home dazed and intoxicated, took a hot bath to quiet my nerves, then lay awake for hours rehearsing the clever things I should have said and did not."27 Under Hall's "hypnotic sway," Terman wrote a literature survey on precocity, and a follow-up questionnaire study to his master's research on leadership in children. Hall found these good enough to publish in his journals, and they became Terman's first professional publications.28
Terman was also an admirer of Binet, not just Galton. Histories of psychology typically present rather simple hierarchical structures of how ideas spread (usually from old bad person to modern hereditarian in a straight line), whereas in reality there was often cross-fertilization across egalitarian-hereditarian lines (Binet to Terman in this case).
Regarding the relationship between IQs and grades:
On the other hand, even the most enthusiastic supporters of IQ tests must acknowledge that the scores are far from perfect predictors. Assuming with Jensen that the true general correlation between IQ and school grades is about .60, children scoring at the 90th percentile on IQ tests will perform academically at an average of only the 77th percentile, with half falling below that figure. As was also shown by Terman's study of gifted children, a high IQ by itself offers no firm guarantee of success. Organizations such as "Mensa," which admit members solely on the basis of their high intelligence test scores, inevitably contain many people whose actual intellectual achievements fall substantially short of their IQ levels.†
Thus Terman's legacy of the IQ test has been a useful but imperfect gift. IQ scores can be important bits of information, but they must be interpreted and used with great caution. First, of course, it must be certain that the test was appropriate for the subject—that his or her cultural and environmental background is similar to those for which the test was developed and standardized. Granting this, it must be further recognized that the test's predictive value is only approximate, so that important decisions about an individual's life should always be supported by other sorts of information besides IQ. Finally, it must be recognized that a single IQ score can never be more than a global assessment of an "intelligence" that may well have many individually varying facets and complexities particularly if one accepts Binet's as opposed to Spearman's basic conception of intelligence.
*Of course, this suggests that some part of the unusual success achieved by Terman's gifted children may not have been due to their high IQs per se, but simply to their having been labeled as gifted. Such are typical of the ambiguities of research on the nature-nurture question.
†In a spoof of Mensa pretentiousness, an organization called "Densa" has recently been established in Toronto, with membership open to anyone of self-professed low intelligence willing to pay the $10 membership fee. Members receive lapel buttons with a turkey insignia, and promote the philosophy that since intelligent people have made such a mess of the world already, it is time to give stupid people a chance.
New online insult just dropped: Densa members.
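The regression-to-the-mean arithmetic in the quoted passage is easy to verify under an assumption of bivariate normality: with r = .60, someone at the 90th percentile on IQ is expected to land at roughly the 77th-78th percentile on grades.

```python
from statistics import NormalDist

nd = NormalDist()

def expected_criterion_percentile(predictor_percentile, r):
    """Expected percentile on the criterion (e.g. grades) for a person at a
    given percentile on the predictor (e.g. IQ), assuming bivariate normality."""
    z_pred = nd.inv_cdf(predictor_percentile / 100)
    z_crit = r * z_pred  # regression toward the mean shrinks the expected z
    return 100 * nd.cdf(z_crit)

print(round(expected_criterion_percentile(90, 0.60), 1))  # ≈ 77.9
```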
Regarding twins, this amusing case study is given:
Ed and Fred had been born identical twins in a New England town, but had been adopted by different families when six months old. The two adopting families were of the same middle-class status, but did not know each other and each boy was raised as an only child. They attended the same school for a while, where their remarkable similarity of appearance was sometimes noted, but they did not become friends. While they were still very young, one family moved to Iowa and the other to Michigan, so they completely lost contact until Ed tracked down Fred.
Once reunited, they discovered that they had led very similar lives. Both had been mediocre students and had dropped out of high school; both had become electricians and worked for the telephone company; both had married and had a son at about the same time; and both even had a pet fox terrier named Trixie. Shortly after their reunion, the twins learned that three scientists at the University of Chicago were widely advertising for early-separated pairs of identical twins to visit Chicago and be studied, all expenses paid and at the time of the extremely popular 1933 World's Fair. Since their funds were scarce, the scientists required some advance assurance that applicants actually were identical twins. Only too happy to volunteer, Ed and Fred sent photographs proving their similarity of appearance, and were accepted for the study.
There are some other books that have compiled many of these cases. Of course, there is a degree of cherry-picking in this approach, but the similarities are absurdly specific in ways that would be essentially impossible to find among people who are not identical twins.
Regarding Jensen's method (correlated vectors) and Cyril Burt:
As in Spearman's study, all of the tests and ratings intercorrelated positively and arranged themselves in a hierarchy, though a much less perfect one than Spearman's. Thus Burt interpreted his findings as generally though not perfectly supportive of Spearman's theory of general intelligence.
*See Chapter 3, page 87 ff., for a description of Spearman's original study.
In one important way, Burt went far beyond Spearman in interpreting his data. Although he had no direct way of comparing the "general intelligence" of his two groups of subjects (since both had been ranked on intelligence only within themselves), he noted that the exclusive prep school boys scored higher than their ordinary counterparts on those tests which had achieved the highest average correlations, and were therefore presumably most highly saturated with general intelligence. From this, Burt concluded that the prep school boys had more general intelligence, and the question now arose as to why.
Jensen got the idea from a close reading of Spearman's 1927 book (The Abilities of Man), which noted (p. 379):
As typical of the research done along this path may be taken that of S. L. Pressey and Teter, who applied ten tests to 120 coloured American children of ages 10-14 and compared the results with those obtained from 2,000 white American children. On the average of all the tests, the coloured were about two years behind the white; their inferiority extended through all ten tests, but it was most marked in just those which are known to be most saturated with g.
So Burt didn't really go beyond Spearman as such, since Spearman made the same observation, just about different groups.
Speaking of sex and ethnic differences:
The hard evidence for this position was slight, of course, since Burt had studied only forty-three individuals, and had never had a direct comparative measure of the general intelligence of his two groups. His faith in the major role of heredity here was in some ways surprising, because in other contexts the Burt of this period was quite sensitive to environmental factors. In 1912, for example, he surveyed the literature on sex and race differences in mental capacity. He concluded that sex differences in innate mental constitution were "astonishingly small—far smaller than common belief and common practice would lead us to expect."11 On race, he observed, "the differences . . . in innate mental capacities between civilised and uncivilised races, though characteristic, appear astonishingly slight. . . . In the case of the individual, we found the influence of heredity large and indisputable; in the case of race, small and controverted."12 This peculiar predisposition to insist upon the hereditary determination of intellectual differences among individuals, while accepting environmentalistic explanations for other important questions, persisted throughout Burt's life.
Burt was ahead of his time. In fact, so far ahead that he was even one of the first to adopt the modern mantra regarding heritability: yes within groups, but not between groups, and deemphasize the gaps. Maybe Harden etc. can take note of their eminent predecessor.
Burt is a problematic figure. Many people know about his suspected fraud, but this is all the more probable given his known personality issues:
Somewhat later Burt asked H. J. Eysenck, one of his best graduate students, to help with an article on factor analysis by calculating the statistics, while Burt wrote the text. Eysenck has reported:
Burt . . . showed me the paper he had written under our joint names, and I thought it was very good. I was rather surprised when it finally appeared in the British Journal of Educational Psychology in 1939 with only my name at the top, and with many changes in the text praising Cyril Burt.15
Following Spearman's death in 1945, Burt's campaign of self-aggrandizement intensified. He took great advantage of the fact that he was editor of the British Journal of Statistical Psychology, publishing many of his own unrefereed papers there, which inflated his own role in the history of factor analysis and minimized Spearman's. He also filled the journal with articles actually written by himself, but signed with fictitious names—such as a 1954 paean of praise to Cyril Burt by one "Jaques Lafitte," purportedly a French psychologist minutely familiar with the details of Burt's previous work.
Roger Pearson did the same thing in Mankind Quarterly, publishing papers under other names ("Jamieson"). In his case, I assume it was because of a lack of submissions: the pages had to be filled with something without the journal looking too much like a one-man operation. Pearson did not seem to do it for ego reasons.
By now, a few other workers in the field began to entertain some private doubts about certain aspects of Burt's studies. He had never presented detailed case studies of his subjects, as other investigators such as Newman, Freeman, and Holzinger had.
"George and Llewellen" were the only twin pair Burt or Conway ever described specifically; for all others, basic information regarding age, sex, or specific IQ scores was completely lacking. When other psychologists wrote to Burt asking for his raw data, they were usually politely but effectively put off with references to obscure documents from the 1910s and 1920s, or excuses regarding the unavailability or uncodeability of data. Finally, when the American sociologist Christopher Jencks requested simply a list of the fifty-three pairs of IQ scores, and the occupational ratings for the adoptive parents, Burt provided this bare-bones information—but only after a delay of several weeks. This represented the maximum detail with which he ever described his basic data.
At this point, a few British psychologists evidently realized that Burt had sometimes used fictitious authors' names for his own papers; "Jaques Lafitte" had seemed an improbable personage to some, and "J. Conway" was totally unknown to psychologists at University College London, the institutional affiliation given for her in her article. This did not seem a major sin, however, and since those investigators who had had difficulty obtaining raw data had not communicated among themselves to spread suspicion, no one during Burt's lifetime publicly voiced serious question about the legitimacy of his work. Thus when he died in 1971, a few private questions were being asked, but Sir Cyril Burt was still one of the most highly respected psychologists in the world.
You can see how it looks:
Improbably similar correlations across changing sample sizes
No study details
Authors and coauthors that don't exist (or maybe they do)
Fantastical results (from the view of egalitarians)
So maybe it wasn't real. However, modern studies have largely replicated his findings, so if he was cheating, he apparently had a very good sense of what the results would be. Rushton notes that the average correlation for monozygotic twins reared apart is r = .75 in the non-Burt data, versus .77 in Burt's data. There is also modern scholarship, published after this book, that throws some doubt on the accusations (Mackintosh 1995, 2013, Joynson 1989, Fletcher 1991, Rushton 1997, 2002). One of the reasons we can't get to the truth of the matter is that one of Burt's critics convinced his secretary to burn his papers after his death!
Many of the details of the case are fascinating and disturbing. For example, there is the truly ‘‘flabbergasting’’ fact (Jensen’s, 1992, p. 105, term) that many of Burt’s papers were destroyed by Miss Gretl Archer, Burt’s private secretary for over 20 years, almost immediately after his death on the advice of Liam Hudson, professor of educational psychology at Edinburgh University, one of Burt’s most ardent opponents. Jensen’s account was corroborated by Hudson himself in an interview with Science staff writer Nicholas Wade (1976).
Regarding Arthur Jensen, multiple commentators have noted his open-mindedness and lack of allegiance to ideas. They think that if he had been presented with good evidence against hereditarianism, he would have accepted it. There is some historical evidence of this attitude:
A turning point came during Jensen's final student year when he read The Scientific Study of Personality by H. J. Eysenck, Burt's former student and by then a well-known British psychologist in his own right. Already on his way to becoming one of psychology's most prolific, iconoclastic, and controversial figures, Eysenck had written papers documenting the apparent ineffectiveness of psychoanalytically oriented psychotherapies, and had attacked psychoanalysis in his popular book Uses and Abuses of Psychology. Now advocating a quantitative and experimental approach to personality measurement which relied heavily on the factor analysis of test scores, Eysenck was generally contemptuous of "unscientific" psychoanalytic approaches, and had some critical comments to make about Symonds's work in The Scientific Study of Personality. Indeed, Jensen originally read the book because Symonds had asked him how he might respond to these criticisms. Ironically, Jensen found himself won over as "the quantitative and experimental approach to personality research espoused by Eysenck had much greater appeal to me, and seemed a much sounder basis for investigating human behavior than the more literary and speculative psychoanalytic variety."29 Jensen went on to read Eysenck's other works, and was so impressed that he applied to work in the Englishman's laboratory. He was accepted, and immediately after receiving his Ph.D. went to London on a two-year postdoctoral fellowship.
In addition, Jensen changed his mind on Burt (to thinking he was guilty), and maybe back again; he was an evidence enjoyer.
Moving on to Leon Kamin, who was the first to publicly question Burt's results:
Accordingly, when McCarthy came to Boston to hold new hearings in 1954, Kamin adopted a new and legally risky strategy. He now specifically waived the Fifth Amendment and testified fully under oath about his own involvement in the party. He declined to name the other people he had associated with, however, asserting, "I do not think that my duty to my country requires me to become a political informer." He added that he would be willing to name names only if convinced that espionage, sabotage, or treason had been involved. Predictably, McCarthy loudly doubted at the publicly broadcast hearing whether Kamin had ever really left the party, and demagogically blamed him and his "co-conspirators" for "the deaths of thousands of American boys" in the Korean War.51 Even more important, he filed charges of criminal contempt of Congress against Kamin for his refusal to answer all questions. The subsequent trial extended sporadically over many months, and concluded with the judge's decision that McCarthy's questions had exceeded his subcommittee's mandate from the Senate. Acquitted on these rather narrow technical grounds, Kamin was by now something of a national figure whose picture had appeared in the New York Times and other leading newspapers.52
I knew Kamin was a communist, prior member of the communist party even, but I didn't know he was literally a McCarthy target too. Here's Kamin on why he thought Burt's results were fishy:
Kamin saw at once upon reading Herrnstein's article that Burt's studies represented the crown jewels of the hereditarian case, and decided that any knowledgeable opinion would have to be based on a reading of Burt's original papers. He started with Burt's last and largest study of 1966, which reported the perfectly uncorrelated environments for fifty-three separated-twin pairs. Kamin was highly skeptical of this study at once, as he animatedly recalled some ten years after the event:
I think it is true to say that within ten minutes of starting to read Burt, I knew in my gut that something was so fishy here that it just had to be fake. He anticipates every possible objection to the hereditarian case, and comes out with a definitive empirical rebuttal to the objection. The work was so incredibly patly perfect and beyond cavil, and beyond challenge, that I just couldn't believe it. My experience of the messy nature of the real world was such that I just could not believe that what this guy was writing was true.
At the same time there was a kind of vagueness and ambiguity, and underdescription and underpresentation of method and detail. He didn't even name the IQ test used, no case histories, no information about the sex composition of the samples, or the times they were tested. So I was profoundly suspicious at once, and then started reading other Burt articles.
Finally, Arthur Jensen held (in 1980 at least) some surprisingly naive policy views on the use of intelligence testing:
The common practice of "streaming" schoolchildren into homogeneous IQ groups was convenient for teachers and administrators, Jensen adds, but his review of the relevant research indicated that this practice produced negative or at best ambiguous overall results for the children. Pupils in the fast streams occasionally fared slightly better on achievement tests than comparable children left in ordinary, "mixed" classes, but this gain was more than offset by unfavorable effects on children in the slower streams. Jensen concludes: "There is no compelling evidence that would justify ability grouping in the elementary grades. I believe that schools should aim to keep pupils of as wide a range of abilities as is feasible in regular classes."3 Grouping by ability or expertise is of course necessary in advanced subjects taught in high school and beyond, but here entrance can be determined by performance in prerequisite courses, achievement tests, or other concrete demonstrations of adequate preparation. IQ scores per se are once again beside the point.
The same basic points hold true for choosing job applicants. Jensen now urges that IQ tests be used cautiously, and only when there is a high demonstrated correlation between IQ and the post-training performance on the job in question. The list of such jobs is smaller than might be expected, because while IQ frequently correlates with quickness to learn, or with performance during a training period, it often bears little relationship at all to how well the job is performed after training. For occupations which require substantial prior academic training, performance in that training itself will be a more valid predictor than IQ.4
Since the book was published in 1985, it doesn't include any of the later figures, which is why there's no Richard Lynn, no Phil Rushton, no S. J. Gould, etc. For those interested in a slightly later history, there's Defenders of the Truth: The Sociobiology Debate (2000) by Ullica Segerstråle, which has a broader scope since it covers 'sociobiology', which is about the same thing as evolutionary psychology and behavioral genetics. As far as I know, no one has written a detailed history since then. There are a bunch of dumbed-down woke histories, but these are generally of little interest (look up the books by Angela Saini, Adam Rutherford, Gavin Evans, and so on).