Yahoo! Boss – An Insider View

Disclaimer: This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Boss stands for Build your Own Search Service. The goal of Boss is to open up search so that third parties can build incredibly useful and powerful search-based applications. Several months ago I pitched to Yahoo!'s executives a plan for how Yahoo! could open up its search assets to fragment the market. It’s remarkable to finally see part of that vision reach the public today, with the help of many talented people.

Web search is a tough business to get into: $300+ million in capex, amazing talent, infrastructure, a prayer, etc., just to get close to basic parity. Only three companies have really pulled it off. Still, I strongly believe we need to find innovative, incremental ways to spread the search love. That would encourage fragmentation and help promising companies reach basic parity instantly, so they can leverage their unique assets (a new algorithm, user data, talent) to push their search solutions beyond the current baseline.

Search is all about understanding the user’s intent. If we can nail the intent, search is pretty much a solved problem. However, the current model of a single search box for everything loses focus on intent, because it tries to cater to all people and all queries. Admittedly, a single search box makes our lives easier, but I have a hard time believing it is the *right* approach.

Online, I typically visit a variety of sites: Techmeme, Digg, Techcrunch, eBay, Amazon, del.icio.us, etc. While I’m on these pages, something almost always catches my eye, so I head to my browser’s search box to find out more on the web. Why do we have this disconnected experience? I think it’s because these sites do not provide web-level comprehensiveness. That’s unfortunate, because the page I’m on may know more about my intent (maybe I’m logged in, so it has my user info, or it’s a techy shopping site).

The biggest goal of Boss is to help bootstrap sites like these with comprehensiveness and basic ranking for free, and to offer tools to re-rank, blend, and overlay the results in ways that revolutionize the search experience.

When I’m on del.icio.us, why can’t I search in their box, get relevant del.icio.us results at the top, and have web results backfill below? I think users should be confident that whatever search box they use, on any page across the web, they’ll get results at least as good as Yahoo’s or Google’s, and often better.
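The blended page described above boils down to a simple merge: the site’s own matches first, web results as backfill, duplicates dropped. Here is a minimal sketch, assuming both result lists were already fetched (say, from the site’s own index and from a Boss web query) and that each result is a dict with a "url" key. The function name `blend` and the data shape are hypothetical, not part of the Boss API.

```python
from typing import Dict, List

Result = Dict[str, str]


def blend(site_results: List[Result], web_results: List[Result], limit: int = 10) -> List[Result]:
    """Put the host site's own results first, then backfill with web results.

    A web result whose URL already appeared among the site results is skipped,
    so the site's own (richer) version of that page wins.
    """
    blended: List[Result] = []
    seen = set()
    for result in site_results + web_results:
        if len(blended) >= limit:
            break
        url = result["url"]
        if url in seen:
            continue
        seen.add(url)
        blended.append(result)
    return blended


# Hypothetical data: one bookmarked page that also shows up in the web results.
site = [{"url": "http://example.com/a", "title": "Bookmarked A"}]
web = [
    {"url": "http://example.com/b", "title": "Web B"},
    {"url": "http://example.com/a", "title": "Web A"},
    {"url": "http://example.com/c", "title": "Web C"},
]
print([r["title"] for r in blend(site, web, limit=3)])  # ['Bookmarked A', 'Web B', 'Web C']
```

Re-ranking would slot in before the merge, for example by sorting `web_results` with a site-specific score.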

The first milestone of Boss is a simple one: make available a clean search API without the traditional restrictions, so developers can fully control presentation, re-rank results, run an unlimited number of queries, and blend in external content, all without including any Yahoo! attribution in the resulting product(s). Want to build the example above, or put news search results on a map? Go for it!

Here’s a link to the API:

http://developer.yahoo.com/search/boss/

Also, check out the Boss Mashup Framework:

http://developer.yahoo.com/search/boss/mashup.html

In my opinion, the Boss Mashup Framework is what makes the Boss Search API really useful. It lets developers use SQL-like syntax to operate on heterogeneous web data sources. The idea came up while I was working on examples to showcase Boss and realized that the operations I was writing imperatively closely mirrored declarative SQL-like constructs. Since it’s a recent idea and implementation, there may be some bugs or odd design choices lurking in there, but I strongly recommend playing around with it and looking at the examples included in the package. I’m biased, of course, but I do think it’s a fun framework for remixing online data. You can rank web results by Digg and YouTube favorite counts, remove duplicates, and publish the results with a provided search results page template in fewer than 30 lines of code. You don’t have to write any parsing logic for the data sources or APIs, because in most cases the framework infers their structure and unifies the data formats automatically.
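To show the declarative style this paragraph describes, here is the same pipeline in plain Python: dedupe web results by URL, join popularity counts from two other sources, and order by the combined count. This is not the Mashup Framework’s own API, which ships with its own examples. It is a minimal sketch that assumes the sources have already been normalized to dicts keyed by URL. The rows, the column names (`diggs`, `favorites`), and the helper names are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical rows standing in for Boss web results, Digg counts and YouTube favorites.
web: List[Row] = [
    {"url": "http://a.example", "title": "A"},
    {"url": "http://b.example", "title": "B"},
    {"url": "http://a.example", "title": "A (duplicate)"},
    {"url": "http://c.example", "title": "C"},
]
digg: List[Row] = [{"url": "http://b.example", "diggs": 120}, {"url": "http://c.example", "diggs": 5}]
youtube: List[Row] = [{"url": "http://a.example", "favorites": 40}]


def unique(rows: List[Row], key: str) -> List[Row]:
    """Like SELECT DISTINCT on one column: keep the first row for each key value."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """LEFT OUTER JOIN on `key`: copy the matching right-hand columns into each left row."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


def rank_by_popularity(rows: List[Row]) -> List[Row]:
    """ORDER BY diggs + favorites DESC, treating a missing count as 0."""
    return sorted(rows, key=lambda r: r.get("diggs", 0) + r.get("favorites", 0), reverse=True)


ranked = rank_by_popularity(left_join(left_join(unique(web, "url"), digg, "url"), youtube, "url"))
print([r["title"] for r in ranked])  # ['B', 'A', 'C']
```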

The next couple of milestones for Boss are, I think, even more interesting and disruptive: server-side services, monetization, blending ranking models, exposure of more features, query classifiers, open source … so stay tuned.

46 thoughts on “Yahoo! Boss – An Insider View”

  1. Wow, this is really cool. Thanks for the additional insight. It is great to see that there are nearly no restrictions for BOSS. I have to try it out now.

  2. This is a great announcement. Thanks!

    It seems that it is not possible to make a vertical search engine based on selected sites (something like Google CSE). Are you considering this as a next feature? If not, please do. 🙂

  3. You’re a rockstar, this is tremendous vision and really deserves to be congratulated! You may have changed the entire web by getting this implemented.

  4. thanks adrian – too kind. lots of people helped to make this happen. there’s plenty left to do – it’s the next couple of milestones that i think will really define open search.

  5. I agree, this is tremendous vision. I couldn’t agree with you more about user intent. I concur that this will change Search as we know it.

  6. @Sergio. I haven’t looked at the framework or the API (yet!), so I’m really out on a limb here, but couldn’t you just include the site names as parameters in the search string?
    BOSS – awesome concept, can’t wait to play with it. Thx all.

  7. @Sergio: in the query you can include site: terms to do restrictions. also stay tuned for an upcoming operator that should help you here.

  8. Great stuff, zooie. Congratulations and thanks. My only fear is that if MS were to acquire Yahoo, it could kill off this wise and bold concept. How likely do you think that is in the event of a takeover? I guess this uncertainty prevents the fastest and widest possible adoption.

  9. it’s a good question. i don’t want to represent yahoo on this one, but i’ll answer part of it personally if that’s ok. i think once you go open (say boss provided not only web services but also the source code) it’s really hard to go back, whether the company later becomes a $100 billion company or part of microsoft. personally, i think going open can give a company a lot of leverage in these types of negotiations …

  10. Hi Vik,

    Why do we have the restriction of using a search box? I wanted to build a website to help domainers find valuable domain names among the thousands that expire daily, based on certain factors, the most prominent being the number of search hits. I have hit a wall, since none of the APIs from any search provider will let me do this in a script. This would probably come to a maximum of 10,000 searches a day. All I care about is the total number of search hits for a given keyword. Screen scraping is illegal and would result in the IP being banned. There should be a way to allow use of the search API in (non-web-based) scripts, provided the API user advertises the BOSS API on the website, or under any other condition that benefits both the API users and Yahoo. Any input?

    Thanks.

  11. Hello.

    Are you going to allow searching multiple domains at the same time via BOSS? Something like (site:site1.com OR site:site2.com OR site:site3.com).

  12. Vik,

    Is it possible to search within subdirectories in multiple domains?

    for example: ?sites=disney.com/toys, cnn.com/news, garden.com/green/trees …

    I tried doing this with no success.

  13. one way to maybe approximate this is to use “inurl:” inside the query

    for ex.

    /web/v1/query+inurl:disney.com/toys+OR+inurl:cnn.com/news …

    you could also play around with using the sites parameter in conjunction with the inurl query syntax

    hope this helps
    and if you can, please post your questions to the ysearchboss yahoo group so others could also benefit from this discussion

    best

  14. I echo William, and I think raising that limit is very important for letting people create truly useful vertical search engines with BOSS.

  15. Vic,
    I tried to build search results like those provided by kosmix.com via the BOSS API and couldn’t, because of the search structure Kosmix uses. Any suggestions?
    amit

  16. @amit

    To approximate Kosmix results, you’ll probably need to issue multiple BOSS API calls, each with a site restrict on a different domain (blogs, web, flickr, etc.). These can be loaded quickly via multithreading/multiprocessing and/or AJAX-loaded modules for each domain of results.

    Check out the “sites” parameter in BOSS for more details.

    You might also find the “keyterms” and “searchmonkey_rss” parameters very useful. More to come for BOSS so stay tuned.

  17. @William and @dinkler

    We hear you. Working on something for you guys. Check out search.techcrunch.com (which is powered by a future BOSS platform) for a pseudo sneak peek.

  18. From what I see, Yahoo Boss does not include a way to search for videos. Is this correct? Wouldn’t any search startup have to be able to offer video search in order to be competitive?

  19. @Shraga

    We’re looking into it. There are ways of doing video search via web search (&sites=youtube.com,metacafe.com) but agree a video search solution would be great.

    Appreciate the feedback.

  20. Vic,

    The big difference would be that the results would be text only, without thumbnails. Is this correct?

    Aside from that, I must say it’s a really great initiative!

  21. @Shraga

    Probably not but you could try using the &view=searchmonkey_rss feature to obtain structured abstracts if they are available (which might include thumbnail links).

    Best

  22. I want to search for certain keywords within about 5,000 domains using Yahoo BOSS. I do not want multiple results from the same domain; instead I want only one result from each website, so that I can apply some custom reordering to the results. Can anybody suggest how I can achieve this? Essentially, I want to get the ranks of my listed domains for certain keywords.

    1. This cannot be done in a single query with BOSS. Might be possible with many queries via individual or grouped site restricts (can do ~30 domains in a single query) with post-search filtering.

  23. I have a question. I am using a BOSS simple query to find results for a certain text, but I am able to get only 50 records. How can I get more than 50 records?

    Thanks
    Harshit

    1. Increment the start parameter

      So ?start=0&count=50 gets you the first 50, and ?start=50&count=50 gets you the next 50

      50 is the max number of results you can fetch in a single call, but you can paginate with multiple calls via start until you hit 1000 results (if available for that particular query)

      Hope this helps

      — Vik

  24. I am in the process of building an app with Yahoo BOSS for a company, but I am finding that the round-trip time is around 250 to 300 milliseconds just for the API call. For this app, I need 50 milliseconds or less. Are there any tricks to make the search faster? I am only working with the first 10 results.
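Several replies in the thread above describe query-side techniques:

- OR-joined site:/inurl: restrictions (comments 7 and 13).
- One call per site restrict, loaded concurrently (comment 16).
- Site restricts of roughly 30 domains per query, followed by post-search filtering (the reply to comment 22).
- Paging with start/count, at most 50 results per call and up to 1,000 in total (the reply to comment 23).

Below is a minimal Python sketch of those pieces. It only builds query strings and filters URLs you have already fetched, and it makes no HTTP calls: `fan_out` takes whatever fetch function you supply. All helper names are hypothetical rather than part of the Boss API. The 30-domain, 50-per-call, and 1,000-result figures come from the replies above and are not independently verified.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence, Tuple
from urllib.parse import quote_plus, urlsplit


def or_restrict(query: str, operator: str, targets: Sequence[str]) -> str:
    """Append OR-joined restrictions, e.g. 'toys inurl:a.com/x OR inurl:b.com/y'."""
    clause = " OR ".join(f"{operator}:{target}" for target in targets)
    return f"{query} {clause}" if clause else query


def encode_for_path(query: str) -> str:
    """Encode spaces as '+' but keep ':' and '/' readable, as in comment 13's path."""
    return quote_plus(query, safe=":/")


def batches(domains: Sequence[str], size: int = 30) -> Iterator[List[str]]:
    """Split a long domain list into groups of `size`, one site restrict per group."""
    for i in range(0, len(domains), size):
        yield list(domains[i:i + size])


def first_rank_per_domain(urls: Sequence[str], wanted: Sequence[str]) -> Dict[str, int]:
    """Post-search filter: 1-based rank of the first result from each wanted domain.

    Only a leading 'www.' is stripped; other subdomains are not matched.
    """
    wanted_set = set(wanted)
    ranks: Dict[str, int] = {}
    for rank, url in enumerate(urls, start=1):
        host = urlsplit(url).hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        if host in wanted_set and host not in ranks:
            ranks[host] = rank
    return ranks


def pages(total_wanted: int, count: int = 50, cap: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs: (0, 50), (50, 50), ... stopping at the cap."""
    end = min(total_wanted, cap)
    for start in range(0, end, count):
        yield start, min(count, end - start)


def fan_out(fetch: Callable[[str], List[str]], queries: Sequence[str]) -> List[List[str]]:
    """Run independent queries on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=min(8, max(1, len(queries)))) as pool:
        return list(pool.map(fetch, queries))


q = or_restrict("toys", "inurl", ["disney.com/toys", "cnn.com/news"])
print("/web/v1/" + encode_for_path(q))  # /web/v1/toys+inurl:disney.com/toys+OR+inurl:cnn.com/news
print(list(pages(120)))  # [(0, 50), (50, 50), (100, 20)]
```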
