I’m an avid reader of Techmeme. Love the idea, UI, freshness, coverage, and most of all the quality of the articles.
When the Techmeme Leaderboard debuted earlier this month, lots of buzz circulated around the blogosphere. Being a huge fan of partying on data, I loved the concept and wanted to take the analysis even further (Yuvi style, but with a search twist).
So yesterday I wrote up some code to crawl and analyze Techmeme articles over the whole year (the Leaderboard shows the Top 50 sources for this month). I took a snapshot of Techmeme at 1:00 PM every day from the beginning of January through the end of September 2007.
I computed basic statistics, like the number of stories by author and source, as well as more involved measurements like the top word mentions of the year, both in total and by category (I used simple NLP to clean up the text and remove stopwords).
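For the curious, here is a minimal sketch of what this kind of counting could look like in Python. It is an illustration under assumptions, not the actual code behind the results: the sample records in `STORIES`, the tiny `STOPWORDS` list, and the crude suffix-stripping `stem` helper are all hypothetical stand-ins (a real run would likely use a proper stemmer such as Porter's and a much larger stopword list).

```python
import re
from collections import Counter

# Hypothetical sample records; the real crawl data is not shown in this post.
STORIES = [
    {"author": "Alice", "source": "ExampleBlog", "category": "search",
     "headline": "Google launches search tools"},
    {"author": "Bob", "source": "ExampleBlog", "category": "search",
     "headline": "Searching the web with Google"},
    {"author": "Alice", "source": "OtherSite", "category": "gadgets",
     "headline": "Apple launches the iPhone"},
]

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is", "its", "with"}


def stem(word):
    """Crude suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


def count_by(stories, field):
    """Number of stories per value of `field` (e.g. 'author' or 'source')."""
    return Counter(story[field] for story in stories)


def top_words(stories, n=10):
    """The n most frequent stemmed words across all headlines."""
    counts = Counter()
    for story in stories:
        counts.update(tokenize(story["headline"]))
    return counts.most_common(n)


def top_words_by_category(stories, n=10):
    """The n most frequent stemmed words within each category."""
    per_category = {}
    for story in stories:
        counter = per_category.setdefault(story["category"], Counter())
        counter.update(tokenize(story["headline"]))
    return {cat: c.most_common(n) for cat, c in per_category.items()}


if __name__ == "__main__":
    print(count_by(STORIES, "author").most_common())
    print(count_by(STORIES, "source").most_common())
    print(top_words(STORIES, 5))
    print(top_words_by_category(STORIES, 3))
```

Ranking is just `most_common()` on each counter, so the same three helpers cover the author, source, and word lists; only the field or grouping changes.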
So, without further ado, here are the results:
Number of Stories by Author in 2007, Ranked
Number of Stories by Source in 2007, Ranked
Most Mentioned Words in 2007, Ranked
* words are stemmed
Most Mentioned Words, by Category, Trends in 2007, Ranked
Hope you guys find these results super interesting and useful.
One thought on “Techmeme Leaderboard 2007 – More!”
Awesome analysis 🙂 What NLP framework didja use? And language?
(I too have an analysis of Techmeme coming up (It was supposed to be up a month ago (but is not since, well, I’ve become lazy (and also because I had a vacation in between)))).