By Nilkanth Patel
Over the last decade, The New York Times has served as a trailblazer for innovation in journalism. Many of its projects are conceived for the sole purpose of pushing the envelope, to see how reporting can be enhanced when it is paired with cutting-edge technology and stunning visuals. So how exactly has that experimentation leaked into its editorial decision-making? How has one age-old publishing process, the writing of headlines, been reimagined with the intervention of advanced technology?
The New York Times’ Article Search API makes it possible to begin answering that question. There is plenty to be learned from The Grey Lady’s choice of words, especially when it comes to headlines. We analyzed differences in the past ten years’ worth of print, web, and even SEO-driven headlines from The New York Times. The results, perhaps unsurprisingly, revealed little effort to lure in millennials with click-bait. In fact, the words used most often in print and web headlines were remarkably similar. To find them, we took every headline made available through the API, removed common “stop words,” and counted how often each remaining word appeared. Only a few proper nouns, most notably “U.S.” and “Obama,” make frequent appearances in both lists. SEO headlines, on the other hand, were driven by state names and the names of sections that appear in the newspaper, presumably to bolster their relevance in searches that mention these topics.
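For readers who want to try a similar count, here is a minimal sketch of the stop-word-and-count step described above. It is not the code behind this analysis: the function name `top_words`, the tokenizing rule, and the abbreviated stop-word list are all assumptions made for illustration, and the headlines are assumed to be already downloaded as plain strings.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; the list used for the
# article's analysis is not specified.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "with"}

def top_words(headlines, n=10):
    """Return the n most frequent non-stop-words across a list of headlines."""
    counts = Counter()
    for headline in headlines:
        # Lowercase, then keep runs of letters joined by internal periods or
        # apostrophes, so "U.S." is counted as the single token "u.s".
        tokens = re.findall(r"[a-z]+(?:[.'][a-z]+)*", headline.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up headlines, for illustration only.
sample = [
    "Obama Meets With U.S. Allies",
    "U.S. and Allies Weigh the Deal",
    "Obama Signs the Deal",
]
print(top_words(sample, 4))
```

Running the same function separately on the print, web, and SEO headlines for a given year yields three ranked lists that can be compared side by side.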
An unexpected favorite for SEO headlines is the term “weddings.” It ranks among the ten most frequent SEO-headline words in four of the ten years in our data, but never makes the top thirty for print or web headlines in any year we analyzed.
Print and web headlines have, surprisingly, been relatively similar in length, with web headlines often running slightly longer than print ones. What’s interesting, however, is that SEO headlines are often significantly longer than their web or print counterparts. And since 2006, the gap between print and SEO headlines has been widening.
The Number of Words in Headlines, by Year
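A comparison like this one can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the code behind the chart: it assumes each record is a hypothetical `(year, kind, headline)` tuple, where `kind` is "print", "web", or "seo", and it counts words by splitting on whitespace.

```python
from collections import defaultdict
from statistics import mean

def mean_words_by_year(records):
    """Average headline length in words, keyed by (year, kind)."""
    lengths = defaultdict(list)
    for year, kind, headline in records:
        if headline:  # skip records with a missing or empty headline
            lengths[(year, kind)].append(len(headline.split()))
    return {key: mean(values) for key, values in lengths.items()}

# Made-up records, for illustration only.
records = [
    (2006, "print", "Deal Reached"),
    (2006, "print", "A Quiet Sunday in Town"),
    (2006, "seo", "Deal Reached in New York Budget Talks"),
]
averages = mean_words_by_year(records)
print(averages[(2006, "seo")] - averages[(2006, "print")])
```

Subtracting the print average from the SEO average for each year gives the gap discussed above.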
The goal for SEO headlines is, of course, to maximize visibility, and promoting the principals in every story seems to be the best way to do that. While the differences in word choice among the three kinds of headlines may not be outwardly interesting, they tell the story of an organization more and more concerned with its digital presence and, more importantly, its reachability. The longer the SEO headline, the more likely one of its terms will match a reader’s query and lift the story in Google’s search rankings. That’s the thinking, at least. Whether that actually works as expected remains to be seen. Perhaps more interesting is whether an improved position has any impact on the paper’s overall success. Until we have an answer, all we can do is keep watching. And trust us, we will.
Nilkanth Patel is a data journalist from Rochelle Park, NJ. He enjoys programming and Sunday mornings (but not both at once). Follow him at @nilkanthjp.