Algorithmic Ineptitude

When I sat down to write this yesterday afternoon, two of the top five articles on Hacker News were not actually articles at all: the first pointed to a Microsoft page extolling the virtues of their new CEO Satya Nadella, and the fifth “story” linked to Firefox 27’s release notes. In no universe would any person categorize either of these pages as something that “gratifies one’s intellectual curiosity.” In fact, I would say we can look forward to watching the former appear “on TV news” ad nauseam in the coming days and, perhaps, weeks as well. Nevertheless, both climbed out of obscurity and onto the front page of this popular site, in violation of the Hacker News submission guidelines, not because someone decided Satya needed more publicity, or that Mozilla releasing the twenty-seventh iteration of the new Internet Explorer was in any way noteworthy, but because an algorithm put them there.

Once an article gets submitted to Hacker News, it goes to the “New” page where everything either fades to obscurity after as little as half an hour or gains popularity in the form of upvotes, comments, and clicks until eventually reaching the venerable front page. We may deduce this third criterion, clicks, by looking at Patrick McKenzie’s article “Don’t End The Week With Nothing” and comparing it to another, “DEA redacts tactic that’s more secret than parallel construction”: whereas the former came in at #2 with 99 points (upvotes) and 26 comments, the latter, despite ranking lower at #3, had 149 points and 54 comments. This seemingly off-kilter disparity has but one explanation: points and comments are not the only metrics Hacker News uses to determine the rankings of its stories; even here, pageviews are king.
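The deduction above can be sketched in a few lines of Python. This is only an illustration under one assumption of mine: that a ranking built solely from points and comments would never place a story below another that beats it on both counts. The names Story, dominates, and inconsistent_pairs are hypothetical, and nothing here reflects Hacker News’s actual ranking code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    rank: int       # front-page position, 1 = top
    points: int
    comments: int


def dominates(a: Story, b: Story) -> bool:
    """True if a has at least as many points and comments as b, and more of one."""
    return (
        a.points >= b.points
        and a.comments >= b.comments
        and (a.points > b.points or a.comments > b.comments)
    )


def inconsistent_pairs(stories):
    """Return (lower, higher) pairs where the lower-ranked story dominates the higher one."""
    return [
        (lower, higher)
        for higher in stories
        for lower in stories
        if lower.rank > higher.rank and dominates(lower, higher)
    ]


# The two front-page stories cited above.
observed = [
    Story("Don't End The Week With Nothing", rank=2, points=99, comments=26),
    Story("DEA redacts tactic that's more secret than parallel construction",
          rank=3, points=149, comments=54),
]

for lower, higher in inconsistent_pairs(observed):
    print(f"#{lower.rank} beats #{higher.rank} on points and comments alike")
```

A non-empty result means points and comments alone cannot account for the observed order. The check does not by itself say what the missing signal is; reading it as clicks is the argument made above.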

By engaging in this brief thought exercise, we are able to explain why websites antithetical to Hacker News’s professed target genres nevertheless go on to gain such popularity: users ignorant of the site’s guidelines submit such links, and then curious readers click them. As these items gain momentum and start climbing the ranking ladder purely by virtue of their traffic, readers who feel in some way positively disposed towards Microsoft’s latest announcement upvote the links in question and compound the problem. At its peak, this perfect storm puts the company profile of Microsoft’s latest CEO on the front page of a high-traffic internet fire hose. Admittedly, this case is the exception rather than the rule: more often than not an actual article, something that truly deserves to be ranked #1 on Hacker News, occupies that spot; however, this sort of mishap happens often enough to prove that the current system by which links rise to fame is, quite frankly, broken.

Having knocked Hacker News down a few pegs on the notional “greatest site on the internet” chart and successfully removed some of the mystique surrounding the service in the process, let’s shift our focus to other, similar websites and take a critical look at them as well.

In many ways, Digg is very similar to Hacker News. In fact, remove the latter’s “New” page where submissions get a brief period of heightened discoverability, change the layout and color scheme a bit, and you have Digg. The site accepts user submissions and then, according to the Digg FAQ, ranks the most popular articles on its front page by Digg votes, Twitter links, and Facebook shares. Similarly, Reddit takes articles and stories submitted by its users and, through a combination of upvotes and comments, ranks them on the Reddit home page. Like Hacker News, both also put great stock in pageviews. For proof just look at the Reddit front page, where a story with just 131 comments and 2810 upvotes comes in ahead of another with 2444 comments and 3411 upvotes. A similar process of deduction can be applied to Digg’s homepage. On these sites as well, pageviews remain king.

At this point it’s worth noting that the only metric I have not discussed yet is time since submission. Each of the aforementioned portals tacks this number on to every story, so it stands to reason that this measure could also, in some way, affect rankings. However, with the same steps we used to find the elusive third criterion for popularity, we can also deduce that time since submission does not influence ranking directly. Indirectly these links obviously benefit from greater exposure and thus potentially more upvotes, comments, and clicks, but this number does not appear to play a role in rankings beyond that.

So we have three services with the same general purpose, going about it with roughly the same process. What makes Reddit a place for memes, Digg one geared towards primarily original work, and Hacker News the catch-all for anything remotely related to technology? Readerships. 2013 saw Reddit rack up approximately 60,916,667 pageviews a month, according to this year-in-review blog post. Although stats for Digg proved much harder to come by, even if I were to use numbers taken from peak popularity in January of 2012, when it attracted some 29.1 million monthly pageviews, Reddit’s incredible volume would still dwarf Digg’s relatively mediocre performance. Given its past success though, one might be tempted to place it just below Reddit on this three-site continuum; however, to do so would be to discount the significant traffic Hacker News receives every day.

According to a TechCrunch article from May of last year, Hacker News sees around 1.6 million pageviews every weekday. That number would only increase on weekends, but for simplicity’s sake assume it stayed the same; extrapolated to a full month of thirty or thirty-one days, that comes to roughly forty-eight to fifty million pageviews in the same span of time in which Digg, in its heyday as one of the largest, most popular websites on the internet, managed a “mere” twenty-nine million. Especially when considering Digg’s fall from grace over the last few years, even in light of its recent resurgence under Betaworks, and the fact that Hacker News has undoubtedly grown in the last year, this puts Digg in third place with Hacker News between it and Reddit.
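For anyone who wants to check the back-of-the-envelope math, here is a minimal Python sketch of the extrapolation. It assumes, as the paragraph above does, that the weekday figure holds for every day of the month; the constant and function names are mine, and the figures are simply the ones quoted in this post.

```python
# Figures quoted in this post; treat them as rough estimates.
HN_DAILY_PAGEVIEWS = 1_600_000      # Hacker News, per weekday
DIGG_PEAK_MONTHLY = 29_100_000      # Digg, January 2012
REDDIT_MONTHLY = 60_916_667         # Reddit, 2013 monthly figure


def monthly_pageviews(daily: int, days: int) -> int:
    """Extrapolate a flat daily figure to a month of the given length."""
    return daily * days


# Assumption: weekend traffic equals weekday traffic.
hn_low = monthly_pageviews(HN_DAILY_PAGEVIEWS, 30)
hn_high = monthly_pageviews(HN_DAILY_PAGEVIEWS, 31)

# Rank the three sites using the conservative (30-day) Hacker News estimate.
monthly = {
    "Reddit": REDDIT_MONTHLY,
    "Hacker News": hn_low,
    "Digg": DIGG_PEAK_MONTHLY,
}
ranking = sorted(monthly, key=monthly.get, reverse=True)

print(f"Hacker News: {hn_low / 1e6:.1f}M to {hn_high / 1e6:.1f}M pageviews a month")
print(" > ".join(ranking))
```

Even the thirty-day estimate leaves Hacker News comfortably ahead of Digg’s peak and behind Reddit, which is all the ordering in this post relies on.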

These vast, mostly self-selecting readerships influence the types of content each site serves through the very methods that bring one post to fame while leaving another to wallow in obscurity. If Reddit has the greatest gross input based purely on raw numbers though, with Hacker News coming in second and Digg at the bottom, why does the quality of content each platform puts forth appear inversely proportional to the service’s rankings? It would stand to reason that a site as large as Reddit would benefit from the hive mind culling uninteresting and otherwise worthless work from its ranks, to put forth nothing but the best on its front page. Yet Reddit has become a place for memes, and Digg — with its significantly smaller readership — the place for interesting work. In order to answer that question we must dip briefly into the semantics of the English language, because the answer is, on one level, “audience”.

Reddit caters (and this is a very important turn of phrase) to the greatest common denominator of the internet with sensational headlines, semi-humorous memes, fail videos, and short articles generally of little substance. These are the media forms most internet users prefer, and so, with a huge number of normal internet users driving the content Reddit showcases, that is exactly what appears on Reddit.

Hacker News, as Leena Rao explains in the aforementioned TechCrunch piece, is a community created for and populated by “ex-Redditors” yearning for the good old days when most of Reddit’s users counted themselves amongst the ranks of hackers and innovators. Thus, Hacker News is a result of — again, an important distinction to compare with the way I defined Reddit above and will define Digg below — the greatest common denominator of the tech industry. To grossly oversimplify this comparison, you could think of Hacker News as r/technology in which only those familiar with or those who consider themselves familiar with technology can participate; a self-selecting subreddit devoted solely to technology and its surrounding industries. As such, Hacker News possesses both the good qualities of Reddit’s approach to promoting “interesting” articles, for a fair number of the links that make it to Hacker News are in some way interesting, and the bad as well, as evidenced by Microsoft’s prominence in the #1 spot despite contributing nothing to the surrounding narrative on the future of technology.

Digg, on the other hand, showcases the greatest common denominator in popular content, just as Reddit does, except Digg tempers these recommendations with a human element: they cull rankings from Twitter and Facebook as well as user interactions on the site, and use “moderators” (an ugly word they should have replaced with “editors”) to ensure that only quality work reaches the Digg front page. That human element is key, the single most important factor in curating good content.

Take Stefan Constantinescu’s new series in which he collects his favorite articles from the past day in a lengthy list of succinct summaries, for example: this epitomizes exactly what I wish Hacker News would provide. Namely, great content aligned with my stated interests that I will not find any bloggers I follow talking about. Or consider Scott Hanselman’s Newsletter of Wonderful Things, Brett Terpstra’s Web Excursions, or my own newsletter: each features hand-picked websites and links our subscribers can rest assured will fall within a certain realm of interest to them by virtue of the fact that we felt they were in some way noteworthy, and they agreed with our views enough to subscribe in the first place. This is the deficiency big-name curation sites suffer from: a lack of humanity to say that Satya Nadella’s corporate profile has no place on the front page of Hacker News, even if an algorithm says otherwise.

To recap, Reddit caters to the greatest common denominator, Hacker News is a result of the greatest common denominator, and Digg showcases the greatest common denominator.
