Monthly Archives: July 2008

Yahoo! Boss – An Insider View

Disclaimer: This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Boss stands for Build your Own Search Service. The goal of Boss is to open up search to enable third parties to build incredibly useful and powerful search-based applications. Several months ago I pitched this idea to the executives on how Yahoo! can specifically open up its search assets to fragment the market. It’s remarkable to finally see some of the vision (with the help of many talented people) reach the public today.

Web search is a tough business to get into. $300+ Million capex, amazing talent, infrastructure, a prayer, etc. just to get close to basic parity. Only 3 companies have really pulled it off. However, I strongly believe we need to find innovative, incremental ways to spread the search love in order to encourage fragmentation and help promising companies get to basic parity instantly so that they can leverage their unique assets (new algorithm, user data, talent) to push their search solution beyond the current baseline.

Search is all about understanding the user’s intent. If we can nail the intent, then search is pretty much a solved problem. However, the current model of a single search box for everything loses an intent focus as it aims to cater to all people and queries. Albeit, a single search box definitely makes our lives easier, but I have a hard time believing this is the *right* approach.

In my online experience, I typically visit a variety of sites: Techmeme, Digg, Techcrunch, eBay, Amazon,, etc. While on these pages, something almost always catches my eye, and so I proceed to the search box in my browser to find out more on the web. Why do we have this disconnected experience? I think it’s because these sites do not provide web-level comprehensiveness. It’s unfortunate, because the page that I’m on may have additional information about my intent (maybe I’m logged in so it has my user info, or it’s a techy shopping site).

The biggest goal of Boss is to help bootstrap sites like these to get comprehensiveness and basic ranking for free, as well as offer tools to re-rank, blend, and overlay the results in a way that revolutionizes the search experience.

When I’m on, why can’t I search in their box, get relevant results at the top, and also have web results backfill below? I think users should be confident that if they searched in a search box on any page in the whole wide web that they’ll get results that are just as good as Yahoo/Google and only better.

The first milestone of Boss is a simple one: Make available a clean search API that turns off the traditional restrictions so that developers can totally control presentation, re-rank results, run an unlimited number of queries, and blend in external content all without having to include any Yahoo! attribution in the resulting product(s). Want to build the example above or put news search results on a map – go for it!

Here’s a link to the API:

Also, check out the Boss Mashup Framework:

The Boss Mashup Framework in my opinion makes the Boss Search API really useful. It lets developers use SQL like syntax for operating on heterogeneous web data sources. The idea came up as I was working on examples to showcase Boss, and realized the operations I was developing imperatively followed closely to declarative SQL like constructs. Since it’s a recent idea and implementation, there may be some bugs or weird designs lurking in there, but I strongly recommend playing around with it and viewing the examples included in the package. I’m biased of course but do think it’s a fun framework for remixing online data. One can rank web results by digg and youtube favorite counts, remove duplicates, and publish the results using a provided search results page template in less than 30 lines of code and without having to specify any parsing logic of the data sources/API’s as the framework can infer the structure and unify the data formats automatically in most cases.

The next couple of milestones for Boss I think are even more interesting and disruptive – server side services, monetization, blending ranking models, more features exposure, query classifiers, open source … so stay tuned.


Filed under Blog Stuff, Data Mining, Information Retrieval, Non-Technical-Read, Open, Search, Techmeme