Hacker News metrics (first rough approach)
I'm not a huge fan of Hacker News. My impression continues to be that it ends up promoting stories that align with the Silicon Valley narrative (meritocracy, technology will fix everything, regulation is the cancer killing agile startups) and discouraging stories that suggest that the world of technology is, broadly speaking, awful and we should all be ashamed of ourselves.
But as a good data-driven person, wouldn't it be nice to have numbers rather than just handwaving? In the absence of a good public dataset, I scraped Hacker Slide to get just over two months of data in the form of hourly snapshots of stories, their age, their score and their position. I then applied a trivial test:
- If the story is younger than some other story
- and the story has a higher score than that other story
- and the story has a worse ranking than that other story
- and at least one of these two stories is on the front page
then I count the story as having been penalised.
(note: "penalised" can have several meanings. It may be due to explicit flagging, or it may be due to an automated system deciding that the story is controversial or appears to be supported by a voting ring. There may be other reasons. I haven't attempted to separate them, because for my purposes it doesn't matter. The algorithm is discussed here.)
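The test above can be sketched in a few lines of Python. This is a minimal illustration rather than the script I actually ran: the `Story` fields, the function names and the front page size of 30 are all assumptions made for the example, and the snapshot values are invented.

```python
from dataclasses import dataclass
from itertools import permutations

# Assumption for this sketch: the front page holds ranks 1 to 30.
FRONT_PAGE_SIZE = 30


@dataclass(frozen=True)
class Story:
    """One story as seen in a single hourly snapshot (hypothetical layout)."""
    story_id: int
    age_hours: float
    score: int
    rank: int  # 1 is the top of the front page


def is_penalised_relative_to(story, other, front_page_size=FRONT_PAGE_SIZE):
    """Apply the four conditions of the test to an ordered pair of stories."""
    return (
        story.age_hours < other.age_hours      # younger than the other story
        and story.score > other.score          # higher score
        and story.rank > other.rank            # but ranked worse
        and min(story.rank, other.rank) <= front_page_size  # one is on the front page
    )


def penalised_ids(snapshot, front_page_size=FRONT_PAGE_SIZE):
    """Return the ids of stories that fail the test against at least one other story."""
    return {
        story.story_id
        for story, other in permutations(snapshot, 2)
        if is_penalised_relative_to(story, other, front_page_size)
    }


# Invented snapshot, purely for illustration.
snapshot = [
    Story(1, age_hours=5.0, score=40, rank=1),
    Story(2, age_hours=2.0, score=90, rank=12),
    Story(3, age_hours=1.0, score=10, rank=3),
]
print(penalised_ids(snapshot))
```

Here story 2 is younger and higher scoring than story 1 but sits eleven places below it, so it is the only one flagged. Comparing every pair is quadratic in the number of stories per snapshot, which is fine for a few hundred stories an hour.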
Now, ideally I'd classify the stories in my dataset by hand, but I'm lazy and so just tried some keyword analysis.
A few things to note:
- Lots of stories are penalised. Of the front page stories in my dataset, I count 3240 stories that have some kind of penalty applied, against 2848 that don't, so roughly 53% are penalised. The default seems to be that some kind of detection will kick in.
- Stories containing keywords that suggest they refer to issues around social justice appear more likely to be penalised than stories that refer to technical matters.
- There are other topics that are also disproportionately likely to be penalised. That's interesting, but not really relevant - I'm not necessarily arguing that social issues are penalised out of an active desire to make them go away, merely that the existing ranking system tends to result in it happening anyway.
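For anyone wanting to repeat the comparison, the keyword tally can be sketched roughly as follows. The keyword groups, the function names and the sample titles are all hypothetical and chosen only to show the shape of the calculation; they are not the lists or the data I used.

```python
from collections import Counter

# Hypothetical keyword groups, for illustration only.
KEYWORDS = {
    "social": ("diversity", "harassment", "sexism"),
    "technical": ("compiler", "kernel", "database"),
}


def categories_for(title):
    """Return every category with at least one keyword appearing in the title."""
    lowered = title.lower()
    return {
        name
        for name, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    }


def penalty_rates(stories):
    """Map each category to the fraction of its stories that were penalised.

    `stories` is an iterable of (title, was_penalised) pairs.
    """
    totals, penalised = Counter(), Counter()
    for title, was_penalised in stories:
        for category in categories_for(title):
            totals[category] += 1
            if was_penalised:
                penalised[category] += 1
    return {category: penalised[category] / totals[category] for category in totals}


# Made-up titles and flags, not taken from the dataset.
sample = [
    ("Sexism in open source communities", True),
    ("Writing a compiler in 500 lines", False),
    ("A new kernel scheduler", True),
    ("Diversity numbers at large companies", True),
]
rates = penalty_rates(sample)
print(rates)
```

Plain substring matching like this is crude: it will miscount titles that use a keyword in an unrelated sense, which is one reason the results should be read as suggestive rather than rigorous.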
This clearly isn't an especially rigorous analysis, and in future I hope to do a better job. But for now the evidence appears consistent with my innate prejudice - the Hacker News ranking algorithm tends to penalise stories that address social issues. An interesting next step would be to attempt to infer whether the reasons for the penalties are similar between different categories of penalised stories, but I'm not sure how practical that is with the publicly available data.
(Raw data is here, penalised stories are here, unpenalised stories are here)
Moving to San Francisco has resulted in it making more sense, but really that just makes me even more depressed.
Ha ha like fuck my PhD's in biology
Perhaps stories about startups tend to get penalised because of voting ring detection from people trying to promote their startup, while stories about social issues tend to get penalised because of controversy detection?