Recency bias in IMDB ratings
New movies aren't as great as they appear; they are probably worse
In 2024 I wrote a post showing this plot (well, a similar one):
The pattern is that newer movies are rated progressively lower. There are several possible reasons for this:
Movies are just getting worse over time, because of a decline in movie production skills or a broader cultural decline, and ratings accurately reflect this.
Making movies used to be very expensive and most people couldn’t do it; that is, the barrier to entry was much higher. Because of this, only the relatively more plausible movie projects got filmed, and the ratings reflect this.
Same as the above, but between countries: a lot of bad movies are also produced in Bollywood, Nollywood and other places.
Coverage of older years is lower, since few people bother adding some C-tier movie from 1930 to the IMDB database, so the old movies that are listed skew towards the better ones.
The above pattern is not due to short movies or porn: both exist in IMDB’s database, but I have filtered them out. Here’s the decline pattern by major regions (the IMDB dataset does not have a main production country or studio country, but there are various indirect ways to infer it):
There is a slope of roughly -0.01 in most places, but no decline is seen in Japan and South Korea. Whatever is responsible for this trend, it is nearly global. The decline cannot be attributed entirely to the infusion of low quality 3rd world films. However, non-US movies pull the overall slope down by ~23% relative to the US-only slope, so that explanation is partially true. Insofar as Big Woke is destroying American cinema, it seems the exact opposite is the case in the global trends: America is making the decline weaker.
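For readers who want to try the regional comparison themselves, here is a minimal sketch of how per-region slopes and the "pulls the slope down by X%" figure could be computed. It assumes the data has already been reduced to `(region, year, rating)` rows; the function names and the toy numbers are made up for illustration and are not the IMDB data or the code behind the plots.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_region(rows):
    """rows: iterable of (region, year, rating). Returns {region: slope}."""
    groups = {}
    for region, year, rating in rows:
        xs, ys = groups.setdefault(region, ([], []))
        xs.append(year)
        ys.append(rating)
    return {region: ols_slope(xs, ys) for region, (xs, ys) in groups.items()}


def relative_change(pooled_slope, reference_slope):
    """How much steeper (positive) or flatter (negative) the pooled slope is,
    as a fraction of the reference slope."""
    return (pooled_slope - reference_slope) / reference_slope


# Toy rows, purely illustrative (hypothetical values, not real ratings).
toy_rows = [
    ("US", 2000, 6.0), ("US", 2010, 5.9), ("US", 2020, 5.8),
    ("Other", 2000, 6.0), ("Other", 2010, 5.8), ("Other", 2020, 5.6),
]

per_region = slopes_by_region(toy_rows)
pooled = ols_slope([r[1] for r in toy_rows], [r[2] for r in toy_rows])
print(per_region, pooled, relative_change(pooled, per_region["US"]))
```

A positive `relative_change` means the pooled slope is steeper than the US-only one, which is the sense in which non-US movies pull the overall slope down.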
Anyway, the point of this post was to look more closely at the apparent recent uptick in movie ratings. Again, this could reflect a genuine sudden increase in quality, but it is more likely a kind of rater selection bias, where people who are really excited about new movie X go to the cinema or otherwise see it before other people do. Call it the fanboi effect. If we assume this uptick just reflects bias, we can estimate its size from cross-sectional data (1 time point) using linear regression with dummies for recent years. However, if we want to be sure it really is a temporary effect, we need data for the same movies over time.
Sadly, the public IMDB datasets do not provide such longitudinal data, nor even a distribution of the ratings for a given movie (% of 10/10 etc.). What we can do instead is to ... download their rating file every day for some years and then analyze that data. This way, we can see how ratings change as movies get more ratings or age in general. Specifically, we can see this for all movie ages, as a movie made in 2000 was already 20+ years old and just got X years older in the period under study.
Now maybe you don’t have the patience to remember to download these files every month for years. But a computer does. So I had a script running on a server to download these files every day for the last 2 years or so. But it turns out I didn’t need to bother with this, because the Internet Archive has covered the page since 2018. There isn’t data for every day, but there are some ~1200 files, of which I downloaded about 2 per month (sadly, this means my 2 year scraping project was in vain!). Using the within-movie variation, we can estimate the recency bias quite accurately:
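To make "within-movie variation" concrete, here is a minimal sketch of a within (fixed effects) estimator. It assumes the rating snapshots have been stacked into `(movie_id, age_years, rating)` tuples and that the age effect is linear over the window considered. Both are simplifying assumptions of mine, and the function name is hypothetical; the actual analysis may well use a different model.

```python
from collections import defaultdict
from statistics import mean


def within_movie_slope(observations):
    """Slope of rating on movie age, using only variation inside each movie.

    observations: iterable of (movie_id, age_years, rating), one tuple per
    movie per snapshot. Each movie's own mean age and mean rating are
    subtracted first, so stable quality differences between movies cancel out.
    """
    by_movie = defaultdict(list)
    for movie_id, age, rating in observations:
        by_movie[movie_id].append((age, rating))

    sxx = 0.0
    sxy = 0.0
    for points in by_movie.values():
        if len(points) < 2:
            continue  # a single snapshot carries no within-movie information
        mean_age = mean(a for a, _ in points)
        mean_rating = mean(r for _, r in points)
        for age, rating in points:
            sxx += (age - mean_age) ** 2
            sxy += (age - mean_age) * (rating - mean_rating)

    if sxx == 0:
        raise ValueError("no within-movie variation in age")
    return sxy / sxx


# Toy snapshots, purely illustrative (hypothetical IDs and values).
toy_snapshots = [
    ("movie_a", 0.0, 8.0), ("movie_a", 1.0, 7.8),
    ("movie_b", 10.0, 5.0), ("movie_b", 11.0, 4.8),
]
print(within_movie_slope(toy_snapshots))
```

Because each movie is compared only with itself, a good movie and a bad movie contribute the same kind of information: how much the rating moved as the movie aged. Restricting the input to movies below some age cutoff would give a rough estimate of the drop specific to new releases.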



