Scoring my forecasts for 2021
Unofficial Metaculus superforecaster?
It is a test of true theories not only to account for but to predict phenomena. (William Whewell, 1840)
No one has done more in recent years than Phil Tetlock to spread the idea that predictive accuracy is the most important trait of expertise, and thus of scientific expertise. Tetlock has written 2 major works on the topic. The first is the very much kick-ass 2006 book Expert Political Judgment, which in general showed that people with fancy degrees and high opinions of themselves did not do much better than chance at predicting many things. Rather, people with eclectic interests did the best, the so-called foxes. Still, these people were not that good. Tetlock highlights the various problems with taking people at their word about their forecasting ability. These problems include 1) faulty memory, especially remembering successes and forgetting errors, 2) vague resolution criteria that leave room for self-serving bias, and 3) hindsight bias, where you retroactively 'predict' things with confidence.
Later, Tetlock would team up with Dan Gardner to write the 2015 bestseller Superforecasting: The Art and Science of Prediction. This is a popular science book that introduces the general topic and presents a series of case studies about people who are particularly good at forecasting. It's pretty light on the scientific details, but the curious reader will easily find these by looking at Tetlock's Google Scholar profile. The research behind it found the same as the prior work, namely that people with varied interests did the best in forecasting. Importantly, it also found that many non-experts and non-scientists did very well on such tasks, and that some training on the task also helped.
All these things then raise a question for the reader: so, am I actually good at predicting the future, or am I one of those people who merely think so, big ego and all? I was wondering about this too, so I started forecasting online like the subjects in Tetlock's book. However, I found the GJ Open questions very dull, so I later switched to an alternative site, Metaculus, which allows for user-generated questions. These can be either public (others can freely participate) or private (invitation only). Both are interesting. The first allows one to pose many questions of interest to oneself that others will also try their hand at. From a commercial perspective, this essentially functions as a way to do free business intelligence, on the condition that the questions are interesting enough that others will give them a try. The second allows one to try forecasting more personal things: Will I get a girlfriend? Will my mom remarry? Will I finish the education I am enrolled in? I find it interesting to make a lot of these questions about myself, family and friends. What is forecasting ability worth if one cannot apply it to real-life situations of personal importance?
In January last year I blogged my predictions for the year 2021 concerning my own productivity in the broad sense. Now we can score all of them and get some objective data on my forecasting skills. The table below shows the 25th, 50th, and 75th centiles of the probability distributions of the forecasts, and the true values.
Points won refers to the Metaculus point system, which you can find an explanation for here. Higher is better. My average is 36 from these questions. Is that good? We can look at the averages of the top forecasters on the site here. It shows the current top 50. I am in 45th place with 38.97 points per question on average. I am not sure if this includes points from only the public questions or also the private ones. In either case, it seems the difference is at most quite small for me. There are 185 users on this page's top list; I can't tell how many users there are in general, including the ones that don't make many predictions. The inclusion criteria for this list are not obvious, but they look like either a cutoff of at least 3500 points or >=66 questions answered (I have relatively few, at 93). Whatever the exact case, I seem to now be an unofficial superforecaster on this site.
Going back to the specific forecasts above, we see that of the 7 questions, the 50% confidence interval (the range between the 25th and 75th centiles) included the true value 6/7 of the time. It is supposed to include the true value only 50% of the time. This means that my predictions are under-confident (or underprecise, to use the strict terminology): my intervals were wider than they needed to be. My predictions were actually better than I thought they would be. I also obtained this result on the various online calibration tests I took back in 2015. It would seem I should apply some kind of reduction to my uncertainty after coming up with plausible values for this year's forecasts. (And then hope that I don't subconsciously counteract this!) There was once an interesting test that used the distances between US cities to gauge your ability to produce correct confidence intervals. I can't find this test, though it is mentioned in my prior post. If anyone knows where this test is, please write it in the comments.
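The calibration check above (6 of 7 true values inside a 50% interval) can be sketched in a few lines of Python. This is only an illustration: the function names `coverage` and `prob_at_least` are made up for this example, and treating the 7 questions as independent draws from a binomial distribution is an assumption, since several of them concern the same person's productivity and are probably correlated.

```python
from math import comb


def coverage(hits: int, total: int) -> float:
    """Share of questions whose true value fell inside the interval."""
    return hits / total


def prob_at_least(hits: int, total: int, p: float = 0.5) -> float:
    """P(X >= hits) for X ~ Binomial(total, p).

    Assumes the questions are independent (hypothetical simplification).
    """
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(hits, total + 1)
    )


# Figures taken from the text: 7 questions, 6 inside the 25th-75th range.
observed = coverage(6, 7)
tail = prob_at_least(6, 7, p=0.5)

print(f"Observed coverage of the 50% interval: {observed:.2f}")
print(f"Chance of 6+ hits out of 7 if perfectly calibrated: {tail:.4f}")
```

Under these assumptions, a perfectly calibrated forecaster would land 6 or more of 7 true values inside a 50% interval about 6% of the time (8/128), so the result points towards under-confidence, but 7 questions is a small sample to conclude much from.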