20 Comments
AP's avatar

I said "True, 50%" for all items where I had no idea and got 70% of those correct and therefore scored "-10.3 underconfident" on the whole, which suggests a bias in the test toward questions with true answers. A correct estimation would likely have scored me slightly overconfident.

AP's avatar

Actual numbers:

There were 56 questions where I thought "no idea" and selected True with 50% confidence. 68% of those were marked as correct.

If I had instead used False, 50% as my "no idea" answer then I would have gotten 32% correct and would likely have a different overall profile.

But it could have happened by chance.
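A rough way to check the "by chance" possibility is an exact binomial tail. This is only a sketch under stated assumptions: 38 correct out of 56 (68% of 56, rounded), guesses treated as independent, and two candidate rates for "True" being the right answer, 50% and the 65/120 split reported elsewhere in this thread. The function and variable names are illustrative.

```python
from math import comb

def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_guesses = 56        # "no idea" items answered True at 50%
n_correct = 38        # assumption: 68% of 56, rounded to a whole question

p_fair = 0.5          # a test with as many true statements as false ones
p_test = 65 / 120     # assumption: the 65 true / 55 false split from the thread

tail_fair = binom_tail(n_correct, n_guesses, p_fair)
tail_test = binom_tail(n_correct, n_guesses, p_test)

print(f"P(>= {n_correct} correct | p = 0.50)   = {tail_fair:.4f}")
print(f"P(>= {n_correct} correct | p = 65/120) = {tail_test:.4f}")
```

A small tail probability under either rate would suggest that chance alone is an unlikely explanation. It would not say what the explanation is.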

Emil O. W. Kirkegaard's avatar

There are 65 true statements and 55 false, so the bias is slight. It was originally 50-50, but I replaced a lot of questions and forgot to maintain the balance.

Michael Watts's avatar

> There are 65 true statements and 55 false, so the bias is slight

Well, answers are 20% more likely to be true than they are to be false. That sounds like a fairly large bias to me.

Emil O. W. Kirkegaard's avatar

Clearly, a 2nd version of this test would swap a lot of questions (poor loadings, indeterminate, duplicate topic), but I doubt I would be able to get it to go viral again any time soon. If you want to try your hand at making a v2.0, let me know, and we can post it in a month's time or so.

AP's avatar

Thanks for checking. Oddly, I didn't get an email when you replied, so I didn't see this until now.

Rohan's avatar

Whenever I had no idea, I put true 50% if the question number was odd and false 50% if it was even, to make it random. I got 44% of those guesses right, which seems good.

Michael Watts's avatar

I didn't assign an automatic true to 50% confidence answers. I answered true or false according to what I thought the best guess was, and gave 50% confidence to any question where I didn't think I had any argument pointing toward one side or the other. This made 50% by far my most commonly used bin (37/120), with 62% of the bin correct.

Boris Bartlog's avatar

I find both your and AP's high use of the 50% bin really alien.

You have no heuristic, knowledge, or basis for biasing a guess even slightly, to the point where 60% would be a reasonable estimate? I mean, as it happens: you do! Both you and AP got significantly more than 50% of these correct!

You just ... don't know that you do, or something.

I ended up using 50% as an estimate all of 7 times. And I still ended up scoring as 'underconfident'.

Michael Watts's avatar

> I mean, as it happens: you do! Both you and AP got significantly more than 50% of these correct!

> You just ... don't know that you do, or something.

Yes, I am unsurprised that I got significantly more than 50% of the 50% bin correct. I put answers there if I felt unable to defend them against a hypothetical contrary viewpoint.

That was the only bin that got a special interpretation from me, and it was my lowest-accuracy bin.

There is an issue in discussion of confidence-in-probabilistic-statements over the difference between "I am uncertain of this question, because the amount of information I have about it is low" versus "I believe that the likelihood of this question is near 50%, because I know a lot of information about it and that information tells me that the likelihood is near 50%". Both of those push your confidence toward 50% (for a question with two possible outcomes), but they mean very different things. What you're seeing is people using the 50% bin for questions where they are uncertain, which isn't a weird thing to do.

AP's avatar

68% of 56 is 38, or 10 more than 28, which is half of 56. Add that 10 back to my confidence score and I'm 0.3 questions underconfident. Dead-on, in other words. I was also close to dead-on with my confidence level in the other categories.

https://cdn.discordapp.com/attachments/1104847707123749057/1491419696891695104/Screenshot_20260405_081008_Chrome.jpg?ex=69d7a054&is=69d64ed4&hm=f26325c808808259961c99199ca16aa6e747560c682585c459814b89808138e3&

However, you're correct that I'm weird.

TonyZa's avatar

Our attention spans are shot, I feel 120 trivia questions are too many. And having 3 questions about the number of bones in the body is definitely too many.

Kveldred's avatar

>having 3 questions about the number of bones in the body is definitely too many.

Yeah, I thought the same, heh—though I may be a bit biased (have 0 interest in anatomy, so these sorts of questions are among those upon which I'm most likely to have no idea; only, like, TV or pop-culture questions would be worse–).

Tim's avatar

It would be interesting if you could capture more info about test takers. E.g., see whether men really are more overconfident than women, or the young more than the old.

Kveldred's avatar

>Sweden and Norway share the longest land border in Europe (0.08, 80.0%)

I think this actually *is* False—depending on interpretation, maybe. At least, from quick searches:

• "longest land border within Europe" → "The longest land border entirely within Europe is Norway–Sweden."

• "Russia–Ukraine border" → ~1,900km

• "Norway–Sweden border" → ~1,600km

So... the question appears to turn upon the question of "what counts as 'within Europe'", perhaps? (Although that's weird to me, because surely if we count Ukraine as European, then all of its border(s) ought count too, right? 🤷‍♂️)

Alternative explanations I have considered but tentatively rejected:

• "Coastline paradox"–type deal: seems unlikely to be significant enough to cough up an extra ~300km no matter at *what* resolution we measure (plus, legal borders probably aren't usually going to descend to an arbitrarily fine resolution, right?)...

• New Russian–Ukrainian border now shorter than previous: maybe, but... what *is* the new border?—the most plausible *de jure* border is probably still just the old one, and the *de facto* border is sort of in flux, innit?

William Bell's avatar

FYI, when I clicked the taketest link I got the following warning from safebrowse.io: "This site might compromise your device or contain high-risk content. To avoid these risks, we recommend avoiding this site."

Stonebatoni's avatar

My accuracy generally ascended with confidence from 60% to 100% (from 42% accurate at 60% confidence up to 80% accurate at 100% confidence).

Except my highest accuracy (by a decent margin!) was 50% confidence with 86% accuracy. Overall accuracy was only 66%.

Michael Watts's avatar

I found the histogram pretty interesting. It looks like sample sizes are too small for the histogram to be very informative.

My results are (50%: 37 answers / 62% correct) (60%: 21 / 71%) (70%: 15 / 100%) (80%: 10 / 90%) (90%: 21 / 95%) (100%: 16 / 100%).

Casting those percentages into absolute quantities, we see that in the 50% bin, I had 23 correct answers when I "should" have had 18 or 19, and this bin is something of a special case.

In the 60% bin, I had 15 right answers where I should have had 13.

In 70%, I had 15 right answers where I should have had 10 or 11.

In 80%, I had 9 right answers where I should have had 8.

In 90%, I had 20 right answers where I should have had 19.

And in 100%, I had 16 right answers where I should have had 16.

This looks like a monotonic increase from 62% to 100%, except that I got everything at the 70% level right, which really looks like it's a coincidence.

What bothers me more is that while "you estimated 80% confidence, but you were actually 90% accurate, one full bin underconfident" looks like a lot of underestimation, that's a difference of one question, the smallest difference you can have without being perfectly calibrated. I'm off by one question in the 80% and 90% bins and two questions in the 60% bin.

It is true that the error goes in the same direction in every bin, placing me at a reported 3rd percentile for confidence, but it just doesn't seem like the figures per bin can be very accurate.
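The per-bin arithmetic above can be reproduced in a few lines, along with a rough sense of how noisy each bin is. This is a sketch, not the test's own scoring method. It assumes a perfectly calibrated test taker whose answers in each bin are independent, and uses the binomial standard deviation as a yardstick. The names are illustrative.

```python
from math import sqrt

# (stated confidence, answers in bin, correct answers), taken from the figures above
bins = [
    (0.5, 37, 23),
    (0.6, 21, 15),
    (0.7, 15, 15),
    (0.8, 10, 9),
    (0.9, 21, 20),
    (1.0, 16, 16),
]

def bin_summary(conf, n, correct):
    """Return (expected correct, excess over expected, binomial std dev) for one bin."""
    expected = conf * n
    # Spread of the correct count for a perfectly calibrated taker (assumption)
    sd = sqrt(n * conf * (1 - conf))
    return expected, correct - expected, sd

total_excess = 0.0
for conf, n, correct in bins:
    expected, excess, sd = bin_summary(conf, n, correct)
    total_excess += excess
    print(f"{conf:.0%}: n={n:2d} correct={correct:2d} "
          f"expected={expected:5.1f} excess={excess:+5.1f} sd={sd:4.2f}")

print(f"total excess over all bins: {total_excess:+.1f} questions")
```

Comparing each bin's excess with its standard deviation gives a quick read on whether a one- or two-question gap is distinguishable from sampling noise.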

As a matter of test design, you can sidestep the "what's the difference between 'true 50%' and 'false 50%'?" issue by presenting the items as statements and asking the testee to rate them from 0% to 100%. It would probably make sense to lead with an item that everyone knows is false, or an example demonstrating a rating of 0% for a false statement.
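To make the equivalence concrete, here is a minimal sketch of how a (true/false answer, confidence) pair maps onto a single 0% to 100% rating of the statement. The function name and the 50% to 100% confidence range are assumptions for illustration, not the test's actual implementation.

```python
def statement_rating(answer_is_true: bool, confidence: float) -> float:
    """Map an answer plus a confidence in [0.5, 1.0] to P(statement is true)."""
    if not 0.5 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.5 and 1.0")
    # Confidence in "False" is the complement of belief that the statement is true
    return confidence if answer_is_true else 1.0 - confidence

# "True, 50%" and "False, 50%" collapse to the same rating on the single scale
print(statement_rating(True, 0.5), statement_rating(False, 0.5))

# "False, 80%" becomes a 20% rating for the statement itself
print(round(statement_rating(False, 0.8), 2))
```

On the single scale there is only one way to say "no idea", so the choice of default answer no longer affects the score.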

AP's avatar

Would love to see this correlated with autism quotient.

Henry Smith's avatar

You can improve the UX and reduce time to completion by adding keyboard shortcuts: A/B for answers, numpad for confidence.