Twitter + BOSS = Real Time Search

Try it

Update: (6/25) This application has been updated. Go here to learn more. The description below though still applies.

Update: (6/11) In case you’re bored, here’s a discussion we had with Google and Twitter about Open & Real-time Search.

Update: (1/19) If you have issues, try again in 5-10 minutes. You can also check out the screenshots below.

Update: (1/15) App Engine limits were reached (and fast). I appreciate the love, and my apologies for not fully anticipating that. Google was nice enough, though, to temporarily raise the quota for this application. Anyway, this was more to show a cool BOSS developer example using code libraries I released earlier, but there might be more here. Stay tuned.

Here’s a screenshot as well (which should hopefully be stale by the time you read this).

Basically, this service boosts Yahoo’s freshest news search results (which typically don’t have much relevance, since they are ordered by timestamp and nothing else) based on how similar they are to the emerging topics found on Twitter for the same query. In other words, it uses Twitter to determine authority for content that doesn’t yet have links because it is so fresh. It also overlays related tweets, when they exist, under each result via an AJAX expando button (big thanks to Greg Walloch at Yahoo! for the design). A nice added feature of the overlay is near-duplicate removal, which ensures the message thread on any given result provides as much comment diversity as possible.
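The post doesn’t say how near-duplicates are detected. As a minimal sketch, assuming word-set Jaccard similarity and a hypothetical threshold of 0.6 (neither is taken from the TweetNews source), the overlay filtering could look something like this:

```python
import re


def tokens(text):
    """Return the set of lowercase words in a tweet, ignoring URLs."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(tweets, threshold=0.6):
    """Keep a tweet only if it is not too similar to one already kept.

    `threshold` is an assumed cutoff, not a value from the original service.
    """
    kept = []
    kept_tokens = []
    for tweet in tweets:
        current = tokens(tweet)
        if all(jaccard(current, seen) < threshold for seen in kept_tokens):
            kept.append(tweet)
            kept_tokens.append(current)
    return kept


if __name__ == "__main__":
    thread = [
        "Kings beat Warriors in triple OT!",
        "RT Kings beat Warriors in triple OT! http://example.com/x",
        "Anyone watching the Lakers tonight?",
    ]
    for tweet in drop_near_duplicates(thread):
        print(tweet)
```

With this approach a retweet that only adds "RT" and a link is dropped, while an unrelated comment survives, which is the kind of comment diversity described above.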

Freshness (especially in the context of search) is a challenging problem. Traditional PageRank-style algorithms don’t really work here, as it takes time for a fresh URL to garner enough links to beat an older, high-ranking URL. One approach is to use cluster size as a feature for measuring the popularity of a story (e.g. Google News). Although quite effective IMO, this may not always be fast enough: for a cluster to grow, other sources have to write about the same story, and traditional media can be slow, especially on local topics.

I remember seeing breaking Twitter messages describing the California wildfires. When I searched Google/Yahoo/Microsoft right at that moment I barely got anything (< 5 results spanning 3 search results pages). I had a similar episode when I searched on the Mumbai attacks. In both cases the Twitter messages were providing incredible focus on the important subtopics that had yet to become popular in the traditional media and news search worlds. What I found most interesting was that news articles did exist on these topics, but they weren’t valued highly enough yet or weren’t focusing on the right stories (as the majority of tweets were).

So why not just do that? Order these fresh news articles (which mostly provide authority and in-depth coverage) by the number of related fresh tweets, and show the tweets under each. That’s this service.
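The ordering idea can be sketched in a few lines of Python. This is only an illustration, not the TweetNews code: the `title` field, the stopword list, and the `min_overlap` cutoff are all assumptions, and plain keyword overlap stands in for whatever similarity measure the service actually uses.

```python
import re

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


def keywords(text):
    """Lowercase words longer than two characters, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def related_tweets(article, tweets, min_overlap=2):
    """Tweets sharing at least `min_overlap` keywords with the article title."""
    title_words = keywords(article["title"])
    return [t for t in tweets if len(title_words & keywords(t)) >= min_overlap]


def rerank(articles, tweets, min_overlap=2):
    """Order articles by number of related tweets, most discussed first.

    `articles` is assumed to arrive freshest-first; Python's sort is stable,
    so articles with the same tweet count keep that freshness order.
    """
    scored = []
    for article in articles:
        related = related_tweets(article, tweets, min_overlap)
        scored.append((len(related), article, related))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(article, related) for _, article, related in scored]


if __name__ == "__main__":
    articles = [
        {"title": "College basketball roundup"},
        {"title": "Kings outlast Warriors in triple overtime"},
    ]
    tweets = [
        "Kings Warriors triple OT, unreal game",
        "warriors and kings going to triple overtime!!",
        "lunch time",
    ]
    for article, related in rerank(articles, tweets):
        print(len(related), article["title"])
```

Each article comes back paired with its related tweets, so the same result can drive both the ranking and the tweets shown under each story.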

To illustrate the need, here’s a quick before-and-after shot. I searched for ‘nba’ using Yahoo’s news search ordered by latest results (first image). The results are very fresh (within a minute) but of subpar quality: the first result talks about teams that are in a different league of basketball than the NBA. However, search for ‘nba’ on TweetNews (second image) and you get the Kings/Warriors triple OT game highlight, which was buzzing more on Twitter at that minute.

[Image: 'NBA' on Y! News latest]
[Image: 'NBA' on TweetNews (Y! News latest enhanced by Twitter)]

There’s something very interesting here … Twitter as a ranking signal for search freshness may prove to be very useful if constructed properly. Definitely deserves more exploration – hence this service, which took < 100 lines of code to represent all the search logic thanks to Yahoo! BOSS, Twitter’s API, and the BOSS Mashup Framework.

To sum up, the contributions of this service are: (1) Real-time search + freshness (2) Stitching social commentary to authoritative sources of information (3) Another (hopefully cool) BOSS example.

The code is packaged for general open consumption and has been ported to run on App Engine (which powers this service actually). You can download all the source here.

99 thoughts on “Twitter + BOSS = Real Time Search”

  1. Vik,

    I had a similar idea over the last couple of days (I know, it’s a weird coincidence!). Here’s the mashup I wrote up.

    I started off developing a more personalized search experience to search through one’s personal networks. But due to authentication issues for now I have restricted the app to search through Yahoo! Mail ( need a premium account ) and a few other services like twitter/Yahoo! News and Flickr.

    BOSS was the next one I was thinking of integrating.

    For others who want to play with my UI here it is ( if it looks crappy bear with me:p I am no UI designer! )

    It will require your permission to access Yahoo! Mail. For now it asks for read/write permissions (though it doesn’t write at all). I will soon replace the app key with one that requires only read permissions.

    http://bhasker.net/betterthanCandygram/

  2. Vik, awesome app. I just pinged you on email, but we built a very similar demo on top of BOSS about a month ago: http://boss.postrank.com/?q=yahoo&type=news

    It’s using our postrank api, which takes into account twitter + many other sources. I’d love to chat with you about integrating some additional metrics into your app + any other ideas you have in the space.

    Love the demos you’ve been building, keep them coming! 😉

  3. WoW really cool idea – utilize things being shared on twitter at current time to display the most relevant/current results on a given search topic.

    Type in “Hudson plane crash” and the first search result is the most popular link being shared through Twitter.

  4. I’ve been looking for a way to do this at NewsChomper.com – right now we are using RSS feeds to display breaking news from multiple outlets – but it is only breaking news in so far as the RSS originator site breaks it. As we all know, a lot of news outlets are slow off the mark when it comes to real breaking news. You are pushing in the right direction Vik!

  5. I couldn’t get the code to work with the App Engine SDK. Oddly enough, it worked on the production App Engine.

    It looks like the Yahoo API doesn’t like the HTTP header ‘Accept-encoding: identity’. Even though the Yahoo Python library changes that to ‘Accept-encoding: gzip’, what is getting sent out on the wire is ‘Accept-encoding: identity’.

    I dug into the App Engine SDK source, and the problem is that ‘accept-encoding’ is listed in _UNTRUSTED_REQUEST_HEADERS in /usr/local/google_appengine/google/appengine/api/urlfetch_stub.py — commenting that out made it possible to run the code on the SDK.

    I’ll create an issue over at Google for this.

    1. BOSS = “Build your Own Search Service”
      local vertical search isn’t supported yet but some of it can be pulled in from news and web using the right query rewrite

  6. Definitely better than Twitter search by itself, because there’s often no explanation. What on earth is Pedamundo? (actually you don’t have anything on it yet. It’s a holiday invented by John Mayer.)

    But your talk with Google & Twitter video isn’t up yet, or at least not at that URL. I’ll check back, because I’m very curious!

  7. Great idea. I think it’s so funny how in just a little over a year Twitter has been able to replace most news sources. I mean, it seems like a huge shift in the internet that we were all lucky enough to witness.

  8. A focus on search tools and analytics is sorely missing from Twitter. I totally agree with you that Twitter’s content is the freshest available.

  9. Twitter news is much like Yahoo’s, but it brings news faster than Yahoo and Google. Still, it is better to search for the news yourself on Google or a news channel, since they provide complete information and images, while Twitter can’t show images or provide the complete news.
