Gender distribution of comedians over time
I did this project a long time ago. I never wrote about it here, which is a pity because the results are therefore not 'out there'. I put the project page here in 2012 (!). In short, I wrote Python code to crawl Wikipedia lists. I figured out a way to decide whether a person was male or female, using the gendered pronouns that exist in English. That is, the crawler fetches the full text of the article and counts "he", "his", "him", "she", and "her". It assigns the gender with the higher pronoun count. This method seemed rather reliable in my informal testing. I specifically wrote it to look at
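
The pronoun-counting step can be sketched roughly as follows. This is a minimal illustration and not the original crawler code: the function name `guess_gender`, the tokenization, and the handling of ties (returning `None`) are all assumptions; only the five pronouns come from the description above. Fetching the article text from Wikipedia is left out.

```python
import re
from collections import Counter

# The five pronouns come from the post; everything else is an illustrative assumption.
MALE_PRONOUNS = {"he", "his", "him"}
FEMALE_PRONOUNS = {"she", "her"}


def guess_gender(article_text):
    """Return 'male', 'female', or None (tie or no pronouns) for one article's text."""
    # Lowercase and keep whole alphabetic words, so "He" and "he," both count
    # while words such as "the" or "other" do not.
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(words)
    male = sum(counts[p] for p in MALE_PRONOUNS)
    female = sum(counts[p] for p in FEMALE_PRONOUNS)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return None  # assumption: the post does not say how ties are resolved


print(guess_gender("She began her career in 1990."))  # female
```

Matching whole words rather than substrings matters here, since a plain substring count of "he" would also hit "the", "she", and "her".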