I finally managed to join the cool kids' retraction club. The Scandinavian Journal of Psychology (SJoP) published a paper of ours about wokeness and mental health. I wrote about it last year.
Dutton, E., & Kirkegaard, E. (2025). Do conservatives really have an advantage in mental health? An examination of measurement invariance. Scandinavian Journal of Psychology, 66, 76–84. https://doi.org/10.1111/sjop.13065
Many studies have found that conservatives show an advantage in mental health and happiness, and various causes of this have been debated (e.g., religiousness, ideology, or genetics). However, not much attention has been given to examining whether this advantage is psychometrically real, or whether it is due to test bias. We analyzed data from two large Finnish surveys of adults (Ns = 848 and 4,978) from Lahtinen (2024), which measured general anxiety and depression symptoms, as well as a new wokeness scale. Using differential item functioning tests, we found no evidence for measurement bias in these scales. The correlation between index scores of wokeness and mental health (internalizing) was −0.36, which increased to −0.41 when measurement error was removed. The association between wokeness and anxiety (r = −0.33, adjusted r = −0.37) was stronger than that between wokeness and depression (r = −0.20, adjusted r = −0.22).
Our study was a reanalysis of another paper published the same year in the same journal by Lahtinen. He made a new scale of social justice attitudes (wokeness) based on questions like "If white people have on average a higher level of income than black people, it is because of racism." The original study computed scores for the traits of interest (depression, anxiety, wokeness) and their correlations, but did not examine measurement invariance. Measurement invariance has been contested here, as some people propose that measures of mental health work differently for those high and low in wokeness. It is also possible there is a reverse problem, namely, that people high and low in mental health fill out political questions differently in a way that violates measurement properties (that is, they understand the same words differently). We tested for both of these possibilities, but found that neither held up. Lahtinen had a large general-population sample of Finns, as well as a university-affiliated sample: in total, nearly 6,000 people.
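As an aside, "removing measurement error" here presumably refers to something like the classic Spearman disattenuation (the paper may equally well have done this via latent IRT scores). A minimal R illustration, with the reliabilities being made-up placeholders purely for the arithmetic:

```r
# Spearman's correction for attenuation: r_true = r_obs / sqrt(rel_x * rel_y)
# The reliabilities below are illustrative placeholders, not values from the paper
r_obs <- -0.36
rel_wokeness <- 0.90
rel_internalizing <- 0.85
r_obs / sqrt(rel_wokeness * rel_internalizing)  # about -0.41
```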
Overall, the main result looks like this:
However, the journal said it had received some negative emails and decided to do another round of ad hoc, post-publication reviews. Their email to us:
On Fri, 21 Feb at 11:32 AM, Wiley Research Integrity <researchintegrity@wiley.com> wrote:
Dear Dr. Dutton and Dr. Kirkegaard,
I am with Wiley’s Integrity Assurance and Case Resolution (IACR) office. I am contacting you because of concerns raised by third parties regarding your article “Do conservatives really have an advantage in mental health? An examination of measurement invariance,” Scandinavian Journal of Psychology 66 (76-84), 2025 (epub 24 August 2024), https://doi.org/10.1111/sjop.13065 . Wiley and the SJOP editors therefore investigated these concerns (case IACR-Inquiry-ID-1268), in accordance with Wiley’s Best Practice Guidelines and the guidelines of the Committee on Publication Ethics (COPE).
Our initial investigation concluded that the third-party concerns had merit. In cases such as this, it is appropriate to conduct post-publication peer review of the article. Two anonymous expert reviewers with no conflicts of interest were identified by the editors, and they provided independent feedback (see reviews pasted below).
Based on the results of the post-publication peer review, the editors have concluded that there are major errors in the article and that the article cannot stand in its current form. Because of the seriousness and extent of the errors, a simple correction of the published article would not be possible. Therefore, I am sorry to report that the article must be retracted, in accordance with Wiley's and COPE's standards.
However, the editors have also decided to offer you the option to revise and resubmit a new manuscript based on this study. (Please see the comments from the editors, below.) Such a new manuscript would have to address thoroughly the comments from the post-publication reviews below, and there can be no promise of acceptance prior to evaluation of a revised, resubmitted manuscript. Should you choose this option, you must include a list of point-by-point responses to the post-publication peer reviews, in order to facilitate rapid and fair evaluation.
The draft of the retraction statement that we will publish is as follows:
-=-=-
Retraction: Dutton E., Kirkeggard E. 2025. Do conservatives really have an advantage in mental health? An examination of measurement invariance. Scandinavian Journal of Psychology 66 (76-84). https://doi.org/10.1111/sjop.13065.
The above article, published online on 24 August 2024 in Wiley Online Library (wileyonlinelibrary.com), has been retracted by agreement between the journal Editor-in-Chief, Leif Edward Ottesen Kennair; The Scandinavian Psychological Associations; and John Wiley & Sons, Ltd. Following publication of this article, concerns were raised by third parties about the conclusions drawn by the authors based on the data provided. The publisher and the journal have investigated these concerns and have concluded that the article contains major errors involving methods, theory, and normatively biased language. These errors bring into doubt the conclusions drawn by the authors. Therefore, the parties agree that the article must be retracted.
-=-=-
We ask that you please respond by 04 March 2025, noting whether you agree or disagree with the retraction. We will note your agreement or disagreement at the end of the retraction statement when it is published.
I know that you will be disappointed by this news. However, while the retraction decision is final, the editors were clear about providing you with the opportunity to revise your work and re-submit it for competitive review.
We look forward to your reply regarding the journal’s proposal. In the meantime, we will proceed with the retraction process. If you do not reply by 04 March 2025, we will publish the retraction soon thereafter, noting that "the authors did not respond."
Sincerely,
Mark Paalman
Mark H. Paalman, PhD
Senior Manager
Integrity Assurance & Case Resolution | Wiley
Hoboken, NJ USA (GMT -5.0)
Editor Comments:
After having received critical letters from third parties concerning your article entitled “Do conservatives really have an advantage in mental health? An examination of measurement invariance” we decided, after consulting with Wiley, that post-publication independent reviews were needed to evaluate the critique.
Two independent, well qualified reviewers agreed to perform these reviews. After taking the reviews into consideration, we decided that the article should be retracted, due to serious problems concerning methods, theory and normatively biased language. However, we find that it could be possible to rectify the problems and hence we decided to offer you the possibility to resubmit a revised manuscript where you respond to the two post-publication reviews.
Reviewer 1:
Review of “Do conservatives really have an advantage in mental health? An examination of measurement invariance” in Scandinavian Journal of Psychology
I have been asked to review a paper that has already been published in the Scandinavian Journal of Psychology titled “Do conservatives really have an advantage in mental health? An examination of measurement invariance”. The study addresses the association between political orientation and mental health outcomes, and in particular attempts to check for measurement invariance in psychopathology items across ideology. They find that “wokeness” is correlated at .33 with anxiety, and .20 with depression. The authors did not report any biases when using anxiety or depression scales in isolation, but they did find it when using a pooled internalizing scale perhaps because anxiety has a stronger relationship to “wokeness” than depression.
The research questions and methodological strategies strike me as interesting and sound. And the paper manages to avoid falling into the typical left-leaning political biases that can be problematic in political psychology (see Duarte et al., 2015), such as the view that conservatism is irrational or “motivated cognition”, or that authoritarianism does not exist on the left (“loch ness monster”), and so on. However, this paper falls into the inverse trap, where conservatives are presumed to be symmetric, pro-social and physically and mentally strong individuals, while left-leaning individuals are pathologized with unnecessarily normative language. Even if it is the case that progressivism, or “leftism”, is associated with symptoms of internalizing disorders (and this relationship has indeed been replicated), this warrants a balanced consideration of potential explanations of why this is so. This paper fails at this task. Another key problem with the paper is the sloppy use of the items in the original dataset and that they do not actually test the critical social justice scale. Lahtinen's original items are clearly not meant to measure “leftism” generally. In fact, the original scale is explicitly an attempt to investigate if wokeness can be measured separately from the left-conservative axis. As Lahtinen (2024) writes (p. 703):
“The scale was strongly correlated with self-reported “wokeness,” indicating convergent validity. The scale also explained variance in self-reported “wokeness” unexplained by related concepts, left–right and liberal–conservative axes, indicating divergent validity.”
Either the authors are unaware of this, or they deliberately (and misleadingly) did not use the scale that Lahtinen recommends. Either way, the authors should have known better or at the very least given a proper explanation of their odd choices.
Specific points:
The intro does a good job of referring to empirical findings that back up the general positive association between political conservatism and mental health outcomes, such as depression. However, the authors claim that “Others have averred that conservatism seems to be part of a general evolved fitness factor which includes religiosity, facial symmetry, a strong immune system, pro-social personality, fertility, height and physical strength among other markers, meaning that we would expect deviation from it to be associated with partly genetically mediated mental illnesses such as depression and anxiety (Dutton, 2023; Sarraf, Woodley of Menie & Feltham, 2019)”. It is extremely unclear what «evolved fitness factor» is referring to in this context. Is it meant to suggest a positive manifold of genetic correlations between all these variables? If so, there should at the very least be references to studies showing this.
This quote is misleading because it seems to suggest that this conjecture is based on confirmed and substantive relationships between these variables, but then refers to two books. One is Dutton, 2023 — a book by the first author — and the other a book on «Modernity and cultural decline» that emphasizes dysgenic explanations of “western decline”.
The authors use publicly available data from Lahtinen 2024 and claim to use 32 items that measure “leftism” from Lahtinen's paper. However, the item pool in Lahtinen's paper (Table 1) shows 26 items. This divergence is unaccounted for. Moreover, in the current paper they use all the items and call it a “wokeness scale”, or sometimes say that it measures “leftism”. Why not use the scale with the 7 items that Lahtinen recommends in his paper? Dutton and Kirkegaard do not provide a justification for these odd psychometric choices.
Sometimes the Discussion reads as if it was discussing completely different results from the ones in question (p.7):
“A number of researchers (e.g., Dutton, 2023; Sarraf et al., 2019) aver that there is a general fitness factor of traits that were under positive selection under the harsh Darwinian conditions
prior to the Industrial Revolution and that these traits, thus, became pleiotropically related. Accordingly, with the heavy reduction of purifying selection since the Industrial Revolution, any deviation from this bundle – which includes conservatism, group-orientation, and traditional religiosity – should be associated with mutational load and, thus, poor fitness of which poor mental health would be an example. This model does not explain why anxiety is more strongly related to wokeness than depression, however. It can further be countered that religiosity is weakly negatively associated with intelligence and that we were under selection for intelligence (see Dutton & Woodley of Menie, 2018). However, this contradiction may be explained by evidence that intelligence is associated with environmental sensitivity and, thus, an ability to rise above cognitive biases, including religiosity (Dutton & Van der Linden, 2017), especially in an ecology of low mortality salience, as evidence indicates that mortality salience is a key factor in inducing religiosity (Norenzayan & Shariff, 2008). Moreover, there is evidence that intelligence is associated with social conformity – with superior norm-mapping and the effortful control necessary to force one’s self to conform to the dominant way of thinking to obtain the social benefits of so doing (Woodley of Menie & Dunkel, 2015) – meaning that potentially religious yet intelligent people may force themselves to become atheists in an increasingly secular society. Consistent with this environmental dimension, in the most recent samples there is no relationship between religiosity and IQ, a trickle effect has seemingly spread atheism to people of lower and lower IQ in Western countries (see Dutton & Van der Linden, 2017).”
It is not clear what intelligence and self-forced atheism have got to do with anything, given the data that has been analyzed in this paper (except perhaps for referring to papers by the lead author). Where are the comparisons and discussions of the current results to previous ones?
If behavioral genetic mechanisms are to be considered, what about linkage disequilibrium caused by assortative mating? Why is pleiotropy the only mechanism? What about phenotypic causality? Are there any gene-environment correlations that are relevant? If cultural and technological development are important (e.g., the industrial revolution), what cross-cultural differences are predicted? I am not saying that any of the above are more or less plausible explanations (indeed, I think they are all distractions in this paper), but the authors do not justify why they favor only a particular explanation, and the Discussion as a whole is profoundly unbalanced.
Conclusion
Although I applaud the interesting research questions and methods (politics and mental health, and testing for measurement invariance), I would not have recommended the paper for publication. This is primarily for two reasons: 1) they are not using the items that make up the original scale, and instead used all (or more?) items from the original pool, and arbitrarily rebrand it as measuring leftism generally; 2) the discussion completely fails to provide a balanced view of possible explanations of the conservatism-internalizing relationship and instead focuses on behavioral genetic concepts (very sloppily) and historical speculation, such as the evolution of pleiotropic genetic effects between conservatism and positive mental health before the industrial revolution. The data and results do not allow for any light to be shed on these interpretations whatsoever.
References
Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (2015). Political diversity will improve social psychological science. Behavioral and Brain Sciences, 38, e130, Article e130. https://doi.org/10.1017/S0140525X14000430
Lahtinen, O. (2024). Construction and validation of a scale for assessing critical social justice attitudes. Scandinavian Journal of Psychology, 65(4), 693-705. https://doi.org/10.1111/sjop.13018
Reviewer 2
Review of Manuscript by Dutton and Kierkegaard
Thank you for inviting me to review this manuscript. After carefully analyzing the work, I have identified substantial issues spanning theoretical, methodological, and ethical domains, as well as the use of language in ways that undermine the scholarly neutrality required for publication. Below, I offer a detailed critique of the paper, addressing theoretical/conceptual shortcomings, methodological errors, and inappropriate or biased statements. These concerns raise serious questions about the validity and ethical standing of the manuscript.
Theoretical and Conceptual Problems
Conflation of Religiosity and Spirituality
The manuscript does not distinguish between the concepts of religiosity and spirituality. Religiosity typically refers to adherence to organized doctrines, institutions, and communal practices, whereas spirituality involves a personal and often non-institutional pursuit of meaning, purpose, or transcendence. These constructs are not interchangeable and have distinct relationships to political ideologies. While religiosity aligns with conservatism due to shared emphasis on tradition, social order, and adherence to structured norms (Graham, Haidt, & Nosek, 2009), spirituality is often associated with liberal ideologies, emphasizing individuality, autonomy, and openness to diverse experiences (Emmons, 1999, 2005; Emmons et al., 1998).
The manuscript attributes mental health advantages solely to religiosity, failing to consider how spirituality, common among liberal individuals, may contribute to well-being through practices such as mindfulness, meditation, or meaning-making. This oversight creates a misleading theoretical foundation, simplifying complex relationships between belief systems, political ideologies, and well-being.
Lack of Nuance in Describing Liberalism
The manuscript frames liberalism in reductive and biased terms, associating it primarily with traits such as “resentment, jealousy, fear, and disempowerment”. This depiction ignores core aspects of liberal ideologies, such as empathy, social justice, and advocacy for equality and human rights (Haidt, 2012). By failing to recognize these dimensions, the paper perpetuates stereotypes rather than presenting a balanced analysis of ideological orientations. This not only weakens the theoretical framework but also risks alienating readers and undermining the journal's reputation for objectivity.
Unsupported Biological Determinism
The manuscript posits that “mutational load” leads to liberalism via pleiotropic effects, linking it to traits such as neuroticism and reduced attractiveness. These claims are speculative and rely on untested assumptions about genetic influences on political ideology. Current literature on the genetics of ideology (e.g., Hatemi et al., 2014) suggests complex interactions between biology, environment, and personality, none of which supports the deterministic model presented in this manuscript. By presenting these assertions without evidence, the authors misrepresent the current state of knowledge in behavioral genetics and political psychology.
Methodological Issues
Misuse of Lahtinen's Scale
The manuscript purports to reanalyze data from Lahtinen (2024) but does not adhere to the validated scale structure. Lahtinen developed and validated a 7-item Critical Social Justice Attitudes Scale (CSJAS). However, Dutton and Kierkegaard claim to analyze a “32-item wokeness scale”, which is not recognized in Lahtinen’s final scale. Even if the items are part of the original pool of items, the manuscript does not provide any evidence that the additional items were psychometrically tested or relevant to the construct being measured (something Lahtinen did, which led him to omit some items and keep 7 of them). This is problematic, because the inclusion of unvalidated items violates the construct validity of the scale (Cronbach & Meehl, 1955).
This misalignment undermines the validity of the reanalysis and renders any conclusions about bias or ideological tendencies highly questionable.
Insufficient Reporting of Psychometric Analysis
The manuscript fails to provide critical details about the statistical analyses used, particularly the implementation of Item Response Theory (IRT) and Differential Item Functioning (DIF) tests. The paper, for example, lacks essential model fit indices (e.g., chi-square, RMSEA, CFI), which are necessary to evaluate the adequacy of the IRT model (Brown, 2006), and there is no discussion of anchor item selection, model assumptions, or robustness checks for DIF testing, raising doubts about the reliability of the results (Meade, 2010).
Without this information, it is impossible to verify whether the statistical methods were appropriately applied or if the findings are robust. That being said, the description of which analyses were conducted is extremely vague and ambiguous.
Causal Overreach
The manuscript repeatedly makes causal claims that are unsupported by the data. For example: The authors assert that individuals with higher “mutational load” adopt liberal ideologies due to “feelings of disempowerment” and engage in behaviors such as virtue-signaling. This leap from correlation to causation ignores the multitude of environmental, cultural, and psychological factors that influence political orientation (Jost, Glaser, Kruglanski, & Sulloway, 2003).
Correlation-based analyses cannot substantiate these claims, and the authors fail to acknowledge these limitations. This oversight misrepresents the scope and implications of their findings.
Inappropriate and Biased Statements
Judgmental Language
The manuscript uses language that is inappropriate for academic writing, including:
- “Less attractive and higher in neuroticism” to describe liberals.
- “Creating a narcissistic morally superior self” to explain liberal coping mechanisms.
- “Playing for status covertly via virtue-signaling” as a behavior attributed to liberals.
These phrases are subjective, pejorative, and lack empirical support. Such language reflects ideological bias rather than objective scientific analysis, damaging the credibility of the work.
Imbalanced Framing
The manuscript portrays conservatism in exclusively positive terms (e.g., as aligned with sound mental health and religiosity) while casting liberalism as a result of personal dysfunction or maladaptation. For instance, the statement that liberals “oppose the structures of society” because they “symbolize power over them” oversimplifies the motivations behind progressive movements, reducing them to emotional reactions rather than principled advocacy for change.
This imbalance undermines the manuscript’s scholarly neutrality and risks framing it as ideological propaganda rather than scientific research.
Ethical Concerns
The authors seem to cite sources linked to white supremacy advocacy. This raises ethical concerns about the rigor of the manuscript’s literature review and its alignment with the journal’s standards for responsible scholarship. Such citations compromise the legitimacy of the work and reflect poorly on the journal’s vetting process.
Conclusion and Recommendations
This manuscript suffers from fundamental issues in its theoretical framework, methodological rigor, and scholarly objectivity. Specifically:
Theoretical Refinement: The authors must address the conflation of religiosity and spirituality and adopt a more balanced perspective on liberal and conservative ideologies.
Methodological Transparency: The authors must rectify the misuse of Lahtinen’s scale, provide robust psychometric analyses, and avoid unwarranted causal claims.
Objective and Ethical Scholarship: The authors must revise judgmental language, ensure balanced framing, and adhere to ethical standards in their citations.
Given these issues, the manuscript, in its current form, is not suitable for publication. Significant revisions, including collaboration with experts in psychometrics and political psychology, are necessary to address these concerns. Should the authors fail to address these issues, the journal may need to consider a retraction to maintain its scholarly integrity.
References
Brown, T. A. (2006). Confirmatory factor analysis for applied research. Guilford Press.
Emmons, R. A. (1999). The Psychology of Ultimate Concerns: Motivation and Spirituality in Personality. New York: Guilford Press.
Emmons, R. A. (2005). Striving for the Sacred: Personal Goals, Life Meaning, and Religion. Journal of Social Issues, 61(4), 731–745.
Emmons, R. A., Cheung, C., & Tehrani, K. (1998). Assessing Spirituality Through Personal Goals: Implications for Research on Religion and Subjective Well-Being. Social Indicators Research, 45(1-3), 391–422.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Graham, J., Haidt, J., & Nosek, B. A. (2009). Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5), 1029–1046.
Haidt, J. (2012). The righteous mind: Why good people are divided by politics and religion. Pantheon Books.
Hatemi, P. K., Medland, S. E., Klemmensen, R., Oskarsson, S., Littvay, L., Dawes, C. T., Verhulst, B., McDermott, R., Nørgaard, A. S., Klofstad, C. A., Christensen, K., Johannesson, M., Magnusson, P. K., Eaves, L. J., & Martin, N. G. (2014). Genetic influences on political ideologies: Twin analyses of 19 measures of political ideologies from five democracies and genome-wide findings from three populations. Behavior Genetics, 44(3), 282–294. https://doi.org/10.1007/s10519-014-9648-8
Jost, J. T., Glaser, J., Kruglanski, A. W., & Sulloway, F. J. (2003). Political conservatism as motivated social cognition. Psychological Bulletin, 129(3), 339–375. https://doi.org/10.1037/0033-2909.129.3.339
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966
Reading the reviews, I see three complaints:
They dislike the discussion section because it portrays conservatives too positively and leftists too negatively.
They noted a problem with the coding of the wokeness items: we used 32 items, but the original item pool only has 26.
The description of the measurement invariance testing (differential item functioning) was not clear enough.
Let's begin with the last one. Our paper said:
In terms of measurement invariance, there are two ways this could manifest. First, it could be that wokeness affects the measurement of mental health. In other words, the mental health items do not function the same for subjects who are high vs. low in wokeness. Second, it could be that mental health affects the measurement of wokeness, such that the wokeness items do not function the same for people high vs. low in mental health. We refer to these two possibilities as mental health measurement invariance and wokeness measurement invariance, respectively. To test for both of these, we carried out an initial scoring of both scales employing all available items using the mirt package (Chalmers et al., 2020). The items were modeled as ordinal type using the item type = “graded.” The resulting scores were then saved, standardized, and a median split was made (high vs. low for mental health, and the same for wokeness). Based on these splits, we carried out differential item functioning (DIF) tests. Specifically, we used the scheme = “drop” argument in the DIF() function. In this approach, each item is tested in turn for bias (differences in all item parameters) using all of the other items as anchors.
After this is completed, the p values for bias are adjusted for multiple testing (Bonferroni), and the items with adjusted p < 0.05 are considered biased. To estimate the effect size of the bias, the score gaps between the groups computed under the common, invariant model and under the partially invariant model are compared. In the partially invariant model, the items found to have biased parameters are freed from the equality constraint and estimated within each group. The difference between the gaps is the bias effect size (Meade, 2010). The bias at the test or scale level is the sum of the bias of each item. Calculation of these effect sizes is done using the empirical_ES() function.
So our method section already noted how anchors were chosen (leave-one-out approach, one at a time). We in fact used the approach advocated by the creator of the mirt package (see here and here).
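To make the procedure concrete, here is a minimal sketch in R of what this workflow looks like with mirt. The function names (mirt, fscores, multipleGroup, DIF, empirical_ES) are the ones referenced in the paper; the data frame `items`, the group labels, and the exact parameter names are illustrative assumptions, not our actual analysis script:

```r
library(mirt)

# Hypothetical input: `items`, a data.frame of ordinal item responses
# (e.g., the anxiety/depression items, or the wokeness items)

# 1. Initial scoring of the scale with a graded response model
fit <- mirt(items, model = 1, itemtype = "graded")
theta <- as.numeric(fscores(fit))

# 2. Standardize the scores and median-split to define the two groups
z <- as.numeric(scale(theta))
grp <- ifelse(z > median(z), "high", "low")

# 3. Fully constrained multiple-group model: all item parameters equal
#    across groups, with group means/variances freed for identification
mg <- multipleGroup(items, model = 1, group = grp, itemtype = "graded",
                    invariance = c("slopes", "intercepts",
                                   "free_means", "free_var"))

# 4. scheme = "drop": each item's constraints are freed in turn, with all
#    remaining items serving as anchors; p values are Bonferroni-adjusted
#    (which.par here assumes 4-category items: slope a1, intercepts d1-d3)
dif <- DIF(mg, which.par = c("a1", "d1", "d2", "d3"),
           scheme = "drop", p.adjust = "bonferroni")

# 5. Refit with any flagged items freed (partial invariance) and compute
#    scale-level bias effect sizes (Meade, 2010); `flagged` stands for the
#    names of items with adjusted p < .05
flagged <- character(0)  # none were flagged in our data
anchors <- setdiff(colnames(items), flagged)
mg_partial <- multipleGroup(items, model = 1, group = grp,
                            itemtype = "graded",
                            invariance = c(anchors, "free_means", "free_var"))
empirical_ES(mg_partial)
```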
There was a problem with the coding of the wokeness items. The item pool has 26 items; however, the two datasets (2 csv files) don't have 26 overlapping item names. The first dataset has 26 items and the second has 20, but some of the items in dataset 2 have different names, as some have an "n" added to them (4n, 5n, 7n, 11n, 14n, 15n, 18n, 20n). If one looks at the codebook for the Lahtinen study, the descriptions of a few items are missing for the first dataset (3, 7, 10, 11, 12). The "n" is not explained anywhere, nor are the omitted item descriptions. For this reason, I matched on exact item names, assuming the "n" items differed in wording. This produced a joint set of 32 items instead of 26. After this problem was pointed out, I checked the wording of all the overlapping items in the codebook and found it identical, so I recoded the data to rename the items with the "n" added, for a total of 26 items, and reran the code. This is the result afterwards:
Notice much? Well, a few points have moved around, and the correlation improved from 0.36 to 0.37. However, this was in fact due to a different problem, not noticed by the reviewers or myself the first time. The items in dataset 1 used a 1-4 scale, but those in dataset 2 used a 1-5 scale. This means that people who said they slightly agreed with a statement in dataset 1 would be incorrectly modeled as saying they had no opinion either way when the datasets are merged and analyzed together. To resolve this problem, I recoded values 3-4 to 4-5 so that they match up. This is why a few points on the left side of the plot have moved further to the left, and why the correlation got a little stronger. To note, the problems with the 32 vs. 26 items only arise in the pooled data; the separate analyses of each dataset were unaffected. In other words, the coding issue and its resolution changed a correlation by 0.01. Furthermore, since the issue stems from the original study's codebook and data files, it is hard to blame us for it, however minor it is. (A few other results in the paper also used the pooled dataset and had minor changes too.) The interested reader can see the full R notebook here.
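In code, the two fixes amount to something like the following sketch; `d1` and `d2` stand for the two CSV files, and the name patterns are hypothetical stand-ins for the actual variable names:

```r
# Fix 1: strip the unexplained trailing "n" from dataset 2's item names
# (e.g., "4n" -> "4") so that identically worded items merge as one column
names(d2) <- sub("^(\\d+)n$", "\\1", names(d2))

# Fix 2: dataset 1 used a 1-4 agreement scale with no middle option, while
# dataset 2 used a 1-5 scale; shift dataset 1's agree options up (3 -> 4,
# 4 -> 5) so the response categories line up before pooling
woke_items <- intersect(names(d1), names(d2))  # the 26 shared wokeness items
d1[woke_items] <- lapply(d1[woke_items],
                         function(x) ifelse(x >= 3, x + 1, x))
```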
Finally, with regards to the wording, surely our paper could have been more balanced in the discussion section. However, I believe that science should be evaluated mainly on the data and methods, and less on whatever the authors put in the discussion and introduction sections. My guess is that this is the offending paragraph in our discussion section:
It can further be countered that religiosity is weakly negatively associated with intelligence and that we were under selection for intelligence (see Dutton & Woodley of Menie, 2018). However, this contradiction may be explained by evidence that intelligence is associated with environmental sensitivity and, thus, an ability to rise above cognitive biases, including religiosity (Dutton & Van der Linden, 2017), especially in an ecology of low mortality salience, as evidence indicates that mortality salience is a key factor in inducing religiosity (Norenzayan & Shariff, 2008). Moreover, there is evidence that intelligence is associated with social conformity – with superior norm-mapping and the effortful control necessary to force one’s self to conform to the dominant way of thinking to obtain the social benefits of so doing (Woodley of Menie & Dunkel, 2015) – meaning that potentially religious yet intelligent people may force themselves to become atheists in an increasingly secular society. Consistent with this environmental dimension, in the most recent samples there is no relationship between religiosity and IQ, a trickle effect has seemingly spread atheism to people of lower and lower IQ in Western countries (see Dutton & Van der Linden, 2017)
Frankly, I don't think these sections are important, and I generally don't read them (or write them; Dutton wrote this one). I think they should provide a space for authors to speculate on their interpretation of the findings as they see fit. After all, the results are there to see, and the reader is free to interpret the findings in some other way. Surely, if our discussion section had emphasized left-wing friendly perspectives on the findings (say, how Trump's fascism is making leftists feel worried and depressed), no one would have ever objected except for right-wingers on Twitter (and only maybe). As such, this presents an example of the bias in psychology whereby studies are scrutinized harder when they are more friendly to right-wing views.
With regards to the retraction, it makes the journal look bad. Our paper is the 9th most read paper in their journal of all time according to altmetrics (#9 of 832), and ranks in the 98th centile for attention compared to other research published at the same time anywhere. The same is true of the study's page on ResearchGate. So by retracting it, they are lowering the impact of their own journal for non-scientific reasons, and giving another round of attention to the study. Naturally, I am happy they chose to give our work another round of publicity, even though it is regrettable they chose to engage in such unscientific practices.
For the record, this was our (my) reply to the editor:
Dear Mr. Paalman,
Thank you for letting us know of your intentions. It is regrettable that you have chosen this course of action. We note that this study is the 9th most popular study in your journal according to altmetrics (#9 of 832, https://wiley.altmetric.com/details/166670972). It is in fact sitting in the 98th centile for altmetrics compared to other research published the same year, something which is also true for the paper's page on ResearchGate (https://www.researchgate.net/publication/383420384_Do_conservatives_really_have_an_advantage_in_mental_health_An_examination_of_measurement_invariance/stats).
However, it did present us with an opportunity to look into the coding issues. After checking, we found that the issues with coding the woke (social justice) items originate in the codebook for the original study (Lahtinen 2024). The two datasets contain 26 and 20 items. However, the codebook lacks documentation for items 3, 7, 10, 11, and 12 in dataset 1. No items are missing from the documentation for dataset 2. We put all the questions in a spreadsheet and found that the questions are exactly identical across datasets, and thus we can tentatively conclude that the items without double documentation are also identical in formulation. However, in dataset 2, some items have an extra "n" added to the end of the variable names (4n, 5n, 7n, 11n, 14n, 15n, 18n, 20n), and this made the datasets not merge entirely. This is why there were 32 items instead of 26 in the pooled dataset. There doesn't seem to be any explanation for these names in the codebook, so we had originally assumed that these items used slightly different questions between the datasets and thus should not be merged. We also discovered an additional issue with the original study, namely that dataset 1 used a 1-4 scale, but dataset 2 used a 1-5 scale. That is, dataset 1's items lacked a middle point ("not agree, not disagree"). This causes problems when the items are pooled, but not when the data are analyzed separately, which is also true for the first issue. To fix the second issue, we rescaled the items from dataset 1 so that option 3 (somewhat agree) is moved to 4, and 4 is moved to 5. After these changes, we reran all the code. These very minor changes were not expected to change the results much, and indeed they did not. Of the changes we found:
Figure 1 changes from r = 0.36 to 0.37.
Top right value in Table 1 changed from -0.12 to -0.16.
Very minor changes in the regression model (Figure 5), e.g. beta of wokeness from 0.28 to 0.29 in the full model.
So in general, the results were slightly stronger after the coding issues were fixed.
Our study already underwent regular peer review, and we object to being subjected to an additional round of hostile peer review. To our eyes, this comes off as an attempt to deliberately recruit hostile reviewers to axe the paper. Since the coding issues they considered important were in fact due to issues with the original documentation and files, it is hard to justify grave criticism of our study based on this.
With regards to the description of the methods that reviewer 2 mentions, we used the standard DIF practice suggested by the author of the IRT package we used (mirt). Contrary to what the reviewer said, the specific method used was mentioned (scheme = "drop"). Anchor items are chosen as described in our study and in further documentation from the author of mirt (Chalmers 2015a, 2015b). To be clear, in this approach, each item from the scale is set free (allowed to vary in slope and intercept) one at a time, and any differences between the two groups' item parameters are tested for significance. After this is done for all items, the model is refitted allowing these items to vary between groups. The effect of this partial invariance is calculated on the scale level using the effect size approach described by Meade (2010), also cited by reviewer 2. It is curious that the grave methodological issues reviewer 2 alleges were not detected by reviewer 1, who instead said that the methods were "interesting and sound".
Reviewer 1 is unhappy that we conflated wokeness (social justicism) and general left-rightism. However, other research shows that these scales measure about the same thing, so there was no need to distinguish them statistically speaking, even if they are conceptually distinct. Reviewer 1 is right that this fact is not discussed as clearly in the discussion section as it could have been.
Studies like the present one suffer from a Catch-22 problem. If a discussion section is too short, reviewers will object and demand a longer one. But if the authors write a discussion of the findings from their own point of view, the discussion section is attacked for not being sufficiently left-wing (too right-wing). Our discussion could have been more balanced in its political tone; however, we note that a similarly left-wing friendly discussion section would not have resulted in any complaints. This is an example of the extra scrutiny that non-left-wing research is exposed to.
We generally believe that studies should be reviewed based on their methods, not based on their discussion sections. From our perspective, data and methods should be sound, but authors should have relatively free hands to discuss their own findings as they see fit. Since the methodological issues were minor and due to incomplete documentation and issues with the datasets that we had nothing to do with, and furthermore, when the method issues were addressed, this did not change the results in any notable way, we reject the proposal to retract the study. We think retracting it will put the journal in a poor light. Our only recourse will be to bring more attention to this unfair, politically motivated double peer-review.
Sincerely,
Emil Kirkegaard and Ed Dutton
And their reply:
Dear Drs Kirkegaard and Dutton,
We completely understand that you are upset about this decision. Our original intent behind re-evaluating the article was in keeping with COPE guidelines regarding third-party concerns raised which are considered appropriate to investigate. Neither the Journal nor Wiley pursue editorial evaluations and publication ethics investigations based on perceived political bias. The original peer review process was not as robust as it should have been, which is regrettable. However, for the re-evaluation of your published article, the editors made every effort to obtain two unbiased researchers. It was clear to us that their evaluations were robust and complementary.
I reiterate that the editors, out of fairness, are willing to consider for competitive peer review a new submission which addresses point-by-point all of the reviewers' comments, and revises the manuscript accordingly. Providing a point-by-point response to the comments would help to expedite evaluation. If you choose not to resubmit, we encourage you to pursue publication elsewhere.
We will publish the retraction shortly, noting that you disagree with our decision. We do hope to see you submit a reworked manuscript.
Sincerely,
Mark Paalman
This situation also gives us an opportunity to present a new study. As a replication, we carried out a survey of 1,000 fairly representative Americans on Prolific. These were given a large number of questions about politics and mental health, covering both positive aspects (life satisfaction, happiness) and negative aspects ("I feel stressed"), as well as diagnoses ("I have a diagnosis for ADHD", etc.). Stay tuned! I can reveal that our general results from prior studies replicate, in that unusual physical presentation (unnatural hair color, tattoos), mental illness/neuroticism, and leftism all correlate as expected. Here's a sneak peek:
The second reviewer clearly used AI. I pasted the "Conflation of Religiosity and Spirituality" section into https://undetectable.ai/ and it said the text was 1% human.
> COPE guidelines
The jokes write themselves.