Discover more from Just Emil Kirkegaard Things
How to do an S factor analysis
John Fuerst suggested that I write a meta-analysis, review and methodology paper on the S factor. That seems like a decent idea once I get some more studies done (data are known to exist on France (another level), Japan (analysis done, writing pending), Denmark, Sweden and Turkey (reanalysis of Lynn's data done, but there is much more data).
However, before doing that it seems okay to post my check list here in case someone else is planning on doing a study.
A methodology paper is perhaps not too bad an idea. Here's a quick check list of what I usually do:
Find some country for which there exist administrative divisions that number preferably at least 10 and as many as possible.
Find cognitive data for these divisions. Usually this is only available for fairly large divisions, like states but may sometimes be available for smaller divisions. One can sometimes find real IQ test data, but usually one will have to rely on scholastic ability tests such as PISA. Often one will have to use a regional or national variant of this.
Find socioeconomic outcome data for these divisions. This can usually be found at some kind of official statistics bureau's website. These websites often have English language editions for non-English speaker countries. Sometimes they don't and one has to rely on clever use of guessing and Google Translate. If the country has a diverse ethnoracial demographic, obtain data for this as well. If possible, try to obtain data for multiple levels of administrative divisions and time periods so one can see changes over levels or time. Sometimes data will be available for a variety of years, so one can do a longitudinal study. Other times one will have to average all the years for each variable.
If there are lots of variables to choose from, then choose a diverse mix of variables. Avoid variables that are overly dependent on local natural environment, such as the presence of a large body of water.
Use the redundancy algorithm to remove the most redundant variables. I usually use a threshold of |.90|, such that if a pair of variables in the dataset correlate >= that level, then remove one of them. One can also average them if they are e.g. gendered versions, such as life expectancy or mean income by gender.
Use the mixedness algorithms to detect if any cases are structural outliers, i.e. that they don't fit the factor structure of the remaining cases. Create parallel datasets without the problematic cases.
Factor analyze the dataset with outliers with ordinary factor analysis (FA), rank order and robust FA. Use ordinary FA on the dataset without the structural outliers. Plot all the FA loading sets using the loadings plotter function. Make note of variables that change their loadings between analyses, and variables that load in unexpected ways.
Extract the S factors and examine their relationship to the ethnoracial variables and cognitive scores.
If the country has seen substantial immigration over the recent decades, it may be a good idea to regress out the effect of this demographic and examine the loadings.
Write up the results. Use lots of loading plots and scatter plots with names.
After you have written a draft, contact natives to get their opinion. Maybe you missed something important about the country. People who speak the local language are also useful when gathering data, but generally, you will have to do things yourself.
If I missed something, let me know.