Google Co-op just got del.icio.us!

Update: Sorry, the link is going up and down. It’s still worth trying, but I will try to find a more stable option when time cycles free up.

This past week I decided to cook up a service (link in bold near the middle of this post) that I feel will greatly assist users in developing advanced Google Custom Search Engines (CSEs). I read through the Co-op discussion posts, digg/blog comments, reviews, emails, etc. and learned that many of our users are fascinated by the refinements feature – in particular, building search engines that produce results like this:

“linear regression” on my Machine Learning Search Engine

… but unfortunately, many do not know how to do this, nor do they understand (or want) to hack up the XML. Additionally, I think it’s fair to say many users interested in building advanced CSEs have already done similar site tagging/bookmarking through services like del.icio.us. del.icio.us really is great. Here are a couple of reasons why people should (and do) use del.icio.us:

  • It’s simple and clean
  • You can multi-tag a site quickly (comma separated field; don’t have to keep reopening the bookmarklet like with Google’s)
  • You can create new tags on the fly (don’t choose the labels from a fixed drop-down like with Google’s)
  • The bookmarklet provides auto-complete tag suggestions; shows you the popular tags others have used for that current site
  • Can have bundles (two-level tag hierarchies)
  • Can see who else has bookmarked the site (can also view their comments); builds a user community
  • Generates a public page serving all your bookmarks

Understandably, we received several requests to support del.icio.us bookmark importing. My part-time role with Google just ended last Friday, so, as a non-Googler, I decided to build this project. Initially, I was planning to write a simple service to convert del.icio.us bookmarks into CSE annotations – and that’s it – but realized, as I learned more about del.icio.us, that there were several additional features I could develop that would make our users’ lives even easier. Instead of just generating the annotations, I decided to generate the CSE contexts as well.
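For readers curious what converting bookmarks into annotations involves, here is a minimal Python sketch, not the service’s actual code. It assumes the bookmarks have already been fetched (for example through the del.icio.us posts/all call mentioned in the comments) into (url, tags) pairs; the sample bookmarks, the `_cse_abc123` label and the `to_label` normalization are hypothetical placeholders. Each bookmark becomes one `<Annotation>` carrying your engine’s own label plus one label per tag.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for fetched del.icio.us bookmarks:
# each entry is (url, list_of_tags).
BOOKMARKS = [
    ("http://www.cs.cmu.edu/~tom/mlbook.html", ["machine-learning", "books"]),
    ("http://videolectures.net/", ["machine-learning", "lectures"]),
]

# Placeholder: replace with the label of your own custom search engine.
CSE_LABEL = "_cse_abc123"


def to_label(tag):
    """Assumed normalization: lowercase the tag and turn anything that is
    not a word character (letter, digit or underscore) into an underscore."""
    return re.sub(r"\W", "_", tag.lower())


def bookmarks_to_annotations(bookmarks, cse_label, required_tag=None):
    """Return an <Annotations> XML string with one <Annotation> per bookmark.

    If required_tag is given, only bookmarks carrying that tag are kept,
    which also keeps each generated file smaller.
    """
    root = ET.Element("Annotations")
    for url, tags in bookmarks:
        if required_tag is not None and required_tag not in tags:
            continue
        annotation = ET.SubElement(root, "Annotation", about=url)
        # The engine's own label ties the site to this particular CSE.
        ET.SubElement(annotation, "Label", name=cse_label)
        # One extra label per del.icio.us tag, usable later as a refinement.
        for tag in tags:
            ET.SubElement(annotation, "Label", name=to_label(tag))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(bookmarks_to_annotations(BOOKMARKS, CSE_LABEL))
```

Passing `required_tag` mirrors the per-tag option described in the feature list.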

Ok, enough talk, here’s the final product:
http://basundi.com:8000/login.html

If you don’t have a del.icio.us account, and just want to see how it works, then shoot me an email (check the bottom of the Bio page) and I’ll send you a dummy account to play with (can’t publicize it or else people might spam it or change the password).

Here’s a quick feature list:

  • Can build a full search engine (like the machine learning one above) in two steps, without having to edit any XML, and in less than two minutes
  • Auto-generates the CSE annotations XML from your del.icio.us bookmarks and tags
  • Provides an option to auto-generate CSE annotations just for del.icio.us bookmarks that have a particular tag
  • Provides an option to auto-calculate each annotation’s boost score (log-normalized against the maximum # of Others, i.e. other del.icio.us users who saved a bookmark)
  • Provides an option to auto-expand links (appends a wildcard * to any links that point to a directory)
  • Auto-generates the CSE context XML
  • Auto-generates facet titles
  • Since there’s a four-facet by five-label restriction (that’s the max that can fit in the refinements display on the search results page), I provide two options for automatic facet/refinement generation:
    • The first uses a machine learning algorithm to find the four most frequent disjoint 5-itemsets (based on the # of del.icio.us tag co-occurrences); it then does query expansion over the tag sets to determine good facet titles
    • The other option returns the user’s most popular del.icio.us bundles and corresponding tags
    • Any refinements that do not make it into the top four facets are dumped into a fifth facet in order of popularity. If you don’t understand this, don’t worry, you don’t need to! The point is all of this is automated for you (just use the default Cluster option). If you want control over which refinements/facets get displayed, then just choose Bundle.
  • Provides help documentation links at key steps
  • And best of all … You don’t need to understand the advanced options of Google CSE/Co-op to build an advanced CSE! This seriously does all the hard, tedious work for you!
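The boost-score and auto-expand options can be approximated as follows. This is a sketch under assumptions, not the service’s code: the post only says scores are log normalized over the max # of Others per bookmark, so log(1 + n) / log(1 + max n) is one plausible reading, and treating a URL as a directory when its path is empty or ends in “/” is a guessed heuristic. The resulting value could then be written to each annotation’s `score` attribute, assuming the annotations format accepts scores in that range.

```python
import math
from urllib.parse import urlparse


def boost_scores(others_counts):
    """Map {url: Others count} to {url: score in [0, 1]}.

    Assumed formula: log(1 + n) / log(1 + max_n), so the most-saved
    bookmark gets 1.0 and a bookmark nobody else saved gets 0.0.
    """
    max_others = max(others_counts.values(), default=0)
    if max_others == 0:
        return {url: 0.0 for url in others_counts}
    denom = math.log1p(max_others)
    return {url: math.log1p(n) / denom for url, n in others_counts.items()}


def expand_link(url):
    """Append a '*' wildcard when the URL looks like a directory."""
    parts = urlparse(url)
    if parts.query or parts.fragment:
        return url  # don't guess about dynamic or anchored links
    if parts.path == "":
        return url + "/*"  # bare host, e.g. http://example.com
    if parts.path.endswith("/"):
        return url + "*"  # directory-style path
    return url  # a specific page: leave it alone


if __name__ == "__main__":
    print(boost_scores({"a": 0, "b": 9, "c": 99}))
    print(expand_link("http://www.awriterz.org/Fantasy/"))
    print(expand_link("http://www.eusing.com/CDRipper/CDRipper.htm"))
```

For example, Others counts of 0, 9 and 99 map to 0, roughly 0.5 and 1.0, so heavily bookmarked sites get the largest boost without drowning out the rest.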
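To give a feel for the Cluster option, here is a simple greedy co-occurrence heuristic in Python. It is a swapped-in stand-in, not the machine learning algorithm the service uses, and it does not attempt the query-expansion step for facet titles. It seeds each group with the most popular unused tag, keeps adding the unused tag that co-occurs most with the group until the group holds five tags (or nothing related is left), repeats for four groups, and returns the remaining tags ordered by popularity as candidates for the overflow facet.

```python
from collections import Counter
from itertools import combinations


def cluster_tags(bookmark_tags, facets=4, size=5):
    """Greedily group tags into disjoint sets of frequently co-occurring tags.

    bookmark_tags: one list of tags per bookmark.
    Returns (groups, leftovers): at most `facets` groups of at most `size`
    tags each, plus every unused tag ordered by popularity.
    """
    # How many bookmarks carry each tag.
    freq = Counter(tag for tags in bookmark_tags for tag in set(tags))
    # How many bookmarks carry each unordered pair of tags.
    pairs = Counter()
    for tags in bookmark_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1

    def affinity(tag, group):
        """Total co-occurrence count between `tag` and the tags in `group`."""
        return sum(pairs[tuple(sorted((tag, g)))] for g in group)

    # Most popular first; ties broken alphabetically for stable output.
    unused = sorted(freq, key=lambda t: (-freq[t], t))
    groups = []
    while unused and len(groups) < facets:
        group = [unused.pop(0)]  # seed with the most popular remaining tag
        while unused and len(group) < size:
            best = max(unused, key=lambda t: (affinity(t, group), freq[t]))
            if affinity(best, group) == 0:
                break  # nothing left co-occurs with this group
            group.append(best)
            unused.remove(best)
        groups.append(group)
    return groups, unused


if __name__ == "__main__":
    sample = [
        ["ml", "regression", "statistics"],
        ["ml", "regression"],
        ["ml", "lectures", "video"],
        ["lectures", "video"],
        ["python", "code"],
    ]
    print(cluster_tags(sample, facets=2, size=3))
```

With two facets of three labels, the sample yields the groups `["ml", "regression", "statistics"]` and `["lectures", "video"]`, leaving `["code", "python"]` for the overflow facet.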

In my opinion, there’s no question that this is the easiest way to make a fancy search engine. If I make any future examples, I’m using this – I can simply use del.icio.us, sign in to this service, and voilà, I have a search engine with facets and multi-label support.


Please note that this tool is not officially endorsed by nor affiliated with Google or Yahoo! It was just something I wanted to work on for fun that I think will benefit many users (including myself). Also, send your feedback/issues/bugs to me or post them on this blog.

74 thoughts on “Google Co-op just got del.icio.us!”

  1. Yea I saw that when researching this. The main difference is the tool that I provide actually works 🙂 This one looks like a UI prototype or something – try a bad delicious login and it works, it doesn’t even ask for a CSE ID (how does it know where to upload to?), nor does it show you the annotations output. My tool gives you the output for the annotations, plus uses some AI tricks to also autogenerate the context.

  2. Thanks for the bug Riza. Try the link now it should work.
    * I wasn’t UTF-8 encoding the tag strings in my title generation step. Haven’t seen tag names like that before but hey you have every right to 🙂 Thanks again.

  3. Hi Babs – Technically it does get captured (GET request logs) since those values are passed as CGI parameters to the URL. I will not use/sell them. I plan to delete the logs periodically.

    It’s tough to provide full encryption since I don’t have an SSL certificate. In the meantime, if you’re concerned about security/privacy, why not go to del.icio.us, change your password to some temporary value, run this wizard, then change your password back? The wizard takes like a minute to do. Later today I will (finally) encrypt passwords in all URL requests to provide a decent level of security.

    I’m also wondering if I should release this as a desktop application. Would people prefer this?

  4. Please do me one favour, please delete my second comment and this one too.

    I think u knew why 😉

    Note:
    I’ve to take at least one minute to check it before posting a comment cos no editing here . One new resolution for this year to follow. 🙂

  5. Hi Riza – I deleted your comment with the link that exposed your password. I’m keeping (and replying) to your latter comments so users know that they can directly email me such links in the future (check bottom of bio for contact information).

  6. Thanks Singh, here’s one more bug (I hope so ;-)).

    Well, as u knew my password got exposed in last post, so i changed my password to a strong one (which includes $, * and alphanumeric characters)

    Here is the bug: my new password shows the error

    GetXmlResponse Error: HTTP 401 Code: Bad user/password webarmi ?????????

    Don’t worry i changed the password shown above with my real one while checking.

    One more clue for u, my old password working fine now, i hope the prob is only with my new pass contains special characters

    [Note: I double checked this comment b4 posting 😉 ]

  7. Hi!

    Eager to test something new to boost my del.icio.us account, but…

    When I try to generate the context XML in step 2, I only get an error message “Failed to connect to Yahoo! Search”.

    Generating the annotation xml only generates an empty file…

    Perhaps I didn’t figure everything out;)

  8. Riyaz – Done. I wasn’t escaping the password before.

    IP – In the process of doing this fix I may have messed up some connection settings. If it’s still not working for you then shoot me an email (you can find my contact info in the bio).

    Really appreciate the feedback.

  9. Just updated the service so all post login requests encrypt the password parameter. The server rekeys every 30 minutes which should provide ample time for a user to generate his/her XML. If the login does not work, it most likely happened due to the key expiration, so just try re-logging in. If all else fails, just post the issue here or email me. Thanks.

  10. Thanks for the awesome product Zooie. I have one question. Every time I add a new delicious link, do I need to go through this process again of creating the annotations XML and uploading it?
    If that is so, is there a way to simplify that?

  11. My 3500+ bookmark account’s annotation file generated a “413- Your client issued a request that was too large.” error while loading it into Co-op. Can I break the file into separate bits and load them one by one?

    Tom

  12. I tried this, and it only seems to have moved across a few urls, as shown by this output on the sites page:

    http://www.longfocus.com/firefox/gmanager/* Firefox Extensions Google
    http://www.awriterz.org/Fantasy/* Awriterz Fantasy
    beautifulbeta.blogspot.com/2006/10/pullquotes-for-your-blog.html Article Blog Publishing
    http://www.eusing.com/CDRipper/CDRipper.htm Computers Entertainment Music Software
    wiki.rubyonrails.com/rails/pages/HowtoSetupApacheWithFastCGIAndRubyBindings Article Linux Ruby Research

    All of the tags seem to be imported, but not most of the actual bookmarks…

  13. Sanjay – For now yes. The delicious API does support an update (which will push only new links since the last call) so it’s definitely feasible. When time cycles free up I’ll add that in. Thanks for the feature suggestion.

    Tom – Yeah there’s a limit on the XML file size being pushed back through the browser. Two solutions: (1) I can save the file on the server (but I’m reluctant to use server storage at the moment) (2) My wizard allows the user to generate annotations per delicious tag. Try that – so produce annotation files for your favorite delicious tags and just upload each one sequentially in the CSE.

    Stephen – Did you check the Rank option? Or filter your bookmarks by a tag? The rank option most likely won’t do every bookmark due to the expensiveness of retrieving the Other counts (the delicious API really needs to expose these numbers in the posts/all call). If you didn’t do either, then shoot me an email (I have my contact info in my bio page).

  14. yours is the second blog (or actually third maybe) I have ever bookmarked (yea I don’t use aggregators)
    I already was like deleting my co-op account then I read through this post once and then TA-DA
    http://taxa.search.googlepages.com/home
    I even licensed it with same exact license as you had just to be sure.
    but I think I have committed at least a dozen copyright infringements as well

    I always get all giggly seeing the google labs logo but this is just close to too exciting
    so yea thanks for pointing out how it’s done and I haven’t even started with implementing that facets x labels thing which sounds great (probably first I have to make a del.icio.us account)
    so yea this blog has been valuable content for me.

  15. Vik —

    Great tool. How would I take one of my subscriptions and turn it into a CSE?

    Let’s say that I’ve subscribed to the tag San Francisco, can I use your tool to take that subscription and generate web URLs that I can feed back into CSE?

  16. Hi Farhan – I would recommend looking into the OPML upload feature available in the Advanced tab of the CSE’s control panel. This will take OPML (and various RSS feed formats), extract its URLs, and import them directly into the CSE. My tool currently just supports a user’s bookmarks available via del.icio.us’s API.

    The other option (in case the OPML feature does not work) would be to regex out the URLs and pump them into a flat file (each link new-line separated), then paste the links in the sites box (Sites tab).

  17. Throw me an account, this looks awesome. Was thinking of purpose-building an app to do the same, but running with Google is even better. Any chance of getting a copy of this to run and alter on my own server?

  18. Pingback: Sundaize Blog
  19. Hey Danny – Sorry for the delay. Haven’t had a chance to get to it. You might want to look at the Google Custom Search site. They have a new feature called ‘Linked CSE’ – I think this might do what you want.

  20. I can’t get your page to load! Dying to see what you’ve done here, but it keeps telling me the connection timed out. Any hints?

  21. Zooie, I am interested in helping you out. However, do you have any idea how much CPU and bandwidth your app requires?

  22. I stumbled across this page while searching for a way to search my Ma.gnolia bookmarks. I wasn’t able to find anything else, so I wrote my own little tool (http://nemti.awardspace.com/goo.gnolia/). It’s just a rather simple implementation of Google linked CSE. Yours sounds much more featureful – I hope you can find a host, and it would be great if you could add Ma.gnolia support.

  23. I tried this, but when I uploaded my XML annotations and skeleton, I got an “error parsing XML at line 3” message in both cases. Can you tell me what I did wrong? Thanks.

  24. Hi Kristen – Good chance Google changed their XML formats since I developed this tool. Could you send me your XML (the one which produces the bug)? vik.singh [at gmail]. Thanks.
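Comments 6 and 8 describe a login failure caused by a password containing `$` and `*` that was not being escaped before being sent as a URL parameter. Below is a minimal Python sketch of that kind of fix using the standard library’s `urllib.parse.urlencode`; the parameter names `user`, `password` and `tag` are hypothetical, not the service’s real ones. Escaping only fixes correctness: the values still end up in GET request logs, as noted in comment 3.

```python
from urllib.parse import parse_qs, urlencode


def build_query(user, password, tag):
    """Build a query string with every value safely percent-encoded.

    urlencode escapes reserved characters such as $, *, & and = and
    encodes non-ASCII text (for example unusual tag names) as UTF-8 first.
    """
    # Hypothetical parameter names, for illustration only.
    return urlencode({"user": user, "password": password, "tag": tag})


if __name__ == "__main__":
    query = build_query("someuser", "pa$$*word", "café")
    print(query)  # user=someuser&password=pa%24%24%2Aword&tag=caf%C3%A9
    # Decoding on the server side recovers the original values.
    print(parse_qs(query)["password"])  # ['pa$$*word']
```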
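Comment 16 suggests, as a fallback to OPML upload, regexing the URLs out of a feed or page and pasting them one per line into the Sites box. A rough Python sketch of that step follows; the pattern is a simple heuristic (http/https up to whitespace, a quote or an angle bracket, with trailing punctuation trimmed) and will mangle some URLs, for example ones that legitimately end in a closing parenthesis.

```python
import re

# Heuristic: http(s) URLs up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(text):
    """Return the unique URLs in `text`, in first-seen order."""
    # Trim punctuation that usually belongs to the surrounding sentence.
    cleaned = (url.rstrip(".,;)") for url in URL_RE.findall(text))
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(cleaned))


def to_sites_list(text):
    """Newline-separated list, ready to paste into the Sites box."""
    return "\n".join(extract_urls(text))


if __name__ == "__main__":
    opml = (
        '<outline xmlUrl="http://example.com/feed.xml"/>\n'
        "See http://example.org/page. Also http://example.com/feed.xml"
    )
    print(to_sites_list(opml))
```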
