Category Archives: NLP

pplmatch – Find Like Minded People on LinkedIn

http://www.pplmatch.com

Just provide a link to a public LinkedIn profile and an email address and that’s it. The system will go find other folks on LinkedIn who best match that given profile and email back a summary of the results.

It uses information retrieval (IR) techniques along with a basic machine-learned model to optimize matching quality.
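The post doesn't say which IR techniques pplmatch actually uses, so the following is only an illustrative sketch of one common approach: represent each profile's text as a TF-IDF vector and rank candidates by cosine similarity to the given profile. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_matches`) and the smoothed IDF formula are assumptions made for this example, and the machine-learned re-ranking step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep simple alphanumeric tokens (hypothetical tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict) per document.
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency; an assumed, commonly used variant.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in term_counts]


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_matches(query_profile, candidates):
    # Return candidate indices ordered from most to least similar.
    vectors = tfidf_vectors([query_profile] + list(candidates))
    query_vec = vectors[0]
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[1:])]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

This also hints at why profiles with more public text give better results: a longer profile yields more terms to overlap with candidates.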

Some use cases:

  • If I provide a link to a star engineer, I can find a bunch of folks like that person to go try to recruit. One could also use LinkedIn / Google search to find people, but sometimes it can be difficult to formulate the right query and may be easier to just pivot off an ideal candidate.
  • I recently shared it with a colleague of mine who just graduated from college. He really wants to join a startup but doesn’t know of any (he just knows about the big companies like Microsoft, Google, Yahoo!, etc.). With this tool he found people who shared similar backgrounds and saw which small companies they work at.
  • Generally browsing the people graph based on credentials as opposed to relationships. It seems to be a fun way to find like-minded people around the world and see where they ended up. I’ve recently been using it to find advisors and customers based on folks I admire.

Anyway, it’s just a fun application I developed on the side. It’s not perfect by any means, but I figured it’s worth sharing.

It’s pretty compute-intensive, so if you want to try it, send mail to [contact at pplmatch dot com] to get your email address added to the list. Also, make sure that the profiles you supply expose lots of text publicly: the more text, the better the results.

Filed under AI, Blog Stuff, Computer Science, CS, Data Mining, Information Retrieval, Machine Learning, NLP, Research, Science, Search, Social, Uncategorized, Web2.0

Yahoo Boss – Google App Engine Integrated

Updated: I see blogs doing evaluations of the Q&A engine. I have to admit, that wasn’t my focus here. The service is merely 50 lines of code … just to demonstrate the integration of BMF and GAE.

Updated: Direct link to the example Question-Answering Service

Today I finally plugged the Yahoo Boss Mashup Framework into the Google App Engine environment. Google App Engine (GAE) provides a pretty sweet yet simple platform for executing Python applications on Google’s infrastructure. The Boss Mashup Framework (BMF) provides Python APIs for accessing Yahoo’s Search APIs as well as remixing data a la SQL constructs. Running BMF on top of GAE is a natural progression, and arguably the easiest way to deploy Boss, so I spent today porting BMF to the GAE platform.

Here’s the full BMF-GAE integrated project source download.

There’s a README file included. Just unzip, put your appids in the config files, and you’re done. No setup or dependencies (easier than installing BMF standalone!). It’s a complete GAE project directory that includes a directory called yos, which holds all the ported BMF code. I also made a number of improvements to the BMF code (SQL ‘where’ support, stopwords, yql.db refactoring, util & templates in yos namespace, yos.crawl.rest refactored & optimized, etc.).

The next natural thing to do is to develop a test application on top of this united framework. In the original BMF package, there’s an examples directory. In particular, ex6.py was able to answer some ‘when’ style questions. I simply wrapped that code as a function and referenced it as a GAE handler in main.py.

Here’s the ‘when’ q&a source code as a webpage (fewer than 25 lines).

The algorithm is quite easy – use the question as the search query and fetch 50 results via the Boss API. Count the dates that occur in the results’ abstracts, and simply return the most popular one.
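The counting step described above can be sketched roughly as follows. This is not the original ex6.py code: the date pattern (`DATE_RE`) and the function name `most_popular_date` are assumptions made for illustration, and the Boss API call is left out, so the abstracts are passed in as plain strings.

```python
import re
from collections import Counter

# Assumed date pattern: full dates like "July 20, 1969", or a bare
# four-digit year such as "1969". The original code may differ.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(
    r"(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}|\b(?:1\d|20)\d{2}\b"
)


def most_popular_date(abstracts):
    # Count every date-like string across all result abstracts
    # and return the most frequent one (None if no date is found).
    counts = Counter()
    for abstract in abstracts:
        counts.update(DATE_RE.findall(abstract))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

In the real service, the list of abstracts would come from the 50 search results fetched with the question as the query.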

For fun, following a similar pattern to the ‘when’ code, I developed another handler to answer ‘who’ or ‘what’ or ‘where’ style questions (finding the most popular capitalized phrase).
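A rough sketch of the capitalized-phrase idea, under the same caveats: the regular expression and the function name `most_popular_phrase` are hypothetical, and skipping phrases that only repeat words from the question is an extra assumption of this sketch, not something the post describes.

```python
import re
from collections import Counter

# Assumed pattern: one or more consecutive capitalized words,
# e.g. "William Shakespeare" or "Denmark".
CAP_PHRASE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def most_popular_phrase(question, abstracts):
    # Words from the question are ignored so the answer is not
    # simply an echo of the query (an assumption of this sketch).
    question_words = {w.lower() for w in re.findall(r"\w+", question)}
    counts = Counter()
    for abstract in abstracts:
        for phrase in CAP_PHRASE_RE.findall(abstract):
            words = phrase.lower().split()
            if all(w in question_words for w in words):
                continue
            counts[phrase] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Sentence-initial words such as "The" are also capitalized, so a more careful version would filter stopwords before counting.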

Here’s the complete example (just ~50 lines of code – bundled in project download):

Q&A Running Service Example

Keep in mind that this is just a quick proof of concept to hopefully showcase the power of BMF and the idea of Open Web Search.

If you’re interested in learning more about this Q&A system (or how to improve it), check out AskMSR – the original inspiration behind this example.

Also, a shoutout to Sam for his very popular Yuil example, which is powered by BMF + GAE. The project download linked above aims to make it easier for people to build these types of web services.

Filed under Boss, Code, Computer Science, CS, Data Mining, Databases, Google, Information Retrieval, NLP, Research, Search, Yahoo

Techmeme Leaderboard 2007 – More!

I’m an avid reader of Techmeme. Love the idea, UI, freshness, coverage, and most of all the quality of the articles.

When the Techmeme Leaderboard debuted earlier this month, lots of buzz circulated through the blogosphere. Being a huge fan of partying on data, I loved the concept and wanted to take the analysis even further (Yuvi style, but with a search twist).

So yesterday I wrote some code to crawl and analyze Techmeme articles over the whole year (the Leaderboard shows the Top 50 sources for this month). I took a snapshot of Techmeme at 1:00PM every day from the beginning of January through the end of September 2007.

I computed basic statistics, like the number of stories by author and source, as well as more involved measurements like the top word mentions of the year, both in total and by category (I used simple NLP to clean up the text and remove stopwords).
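For a rough idea of what these measurements involve, here is a minimal sketch. It is not the code used for the analysis: the story fields (`author`, `title`), the tiny stopword list, and the suffix-stripping `crude_stem` function are all assumptions for illustration; a real run would more likely use a proper stemmer such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumed; a real list is much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "with"}


def crude_stem(word):
    # Very rough stand-in for a real stemmer: strip "ing" or a trailing "s"
    # as long as at least three characters remain.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_by(stories, field):
    # Number of stories per value of a field such as "author" or "source".
    return Counter(story[field] for story in stories).most_common()


def top_words(titles, n=10):
    # Lowercase, tokenize, drop stopwords, stem, then count mentions.
    counts = Counter()
    for title in titles:
        for token in re.findall(r"[a-z]+", title.lower()):
            if token not in STOPWORDS:
                counts[crude_stem(token)] += 1
    return counts.most_common(n)
```

Running `top_words` separately on the titles of each category would give the per-category rankings.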

So, without further ado, here are the results:

Number of Stories by Author in 2007, Ranked
Number of Stories by Source in 2007, Ranked
Most Mentioned Words in 2007, Ranked
* words are stemmed
Most Mentioned Words, by Category, Trends in 2007, Ranked

Hope you guys find these results super interesting and useful.

Filed under Blog Stuff, Data Mining, Information Retrieval, NLP, Statistics, Techmeme, Trends