Category Archives: Non-Technical-Read

TweetNews (Real-Time Search) Is Back

Update: Twitter’s Search API seems to time out quite a bit, so many search results don’t get any tweets linked. Try again later or refer to the screenshots below. Also, delicious.com is now testing an early version of this model for its homepage ranking.

Here it is: tweetnews.appspot.com

And an example query: yahoo

About six months ago I released a simple 100-line search application called TweetNews, which basically links tweets to the freshest Yahoo! News articles. The more related tweets an article has, the higher its rank. The tweet count and messages are presented underneath each result so that a user can read the social commentary inline with the article listing. It was developed more to demonstrate the openness and power of Yahoo! BOSS (you can read more about it in my previous posts here and here). Remarkably, many users found the service useful despite its slow performance, barebones UI, and lack of a homepage, domain, you name it.

Interestingly, the TweetNews concept has been popping up in my recent discussions around real-time search, so I felt it was about time to polish up TweetNews to serve as a better proof of concept.

Here are some of the new features:

  • Sweet UI (kudos to Kara McCain & Aaron Wheeler for the awesome design and template)
  • Continually Updated, Fresh Homepage (aggregates & ranks feeds like Techmeme, Delicious, Digg)
  • Faster Performance
  • Improved Algorithm
  • Local Views (re-rank & link tweets from a select region)

Here’s a screenshot of the homepage:

TweetNews Homepage

And here’s an example of Local Views:

London’s View of ‘iphone’

TweetNews IPhone (London Ranking)

Los Angeles’ View of ‘iphone’

TweetNews IPhone (Los Angeles Ranking)

Striking difference between Americans (actually just SoCal) and the British right there 🙂

I think the Local Views concept is pretty promising, although there’s plenty of room for improvement (use BOSS region filters, access Twitter’s Firehose Feed for more granularity, etc.).

Which is why, as I did with the last version, I plan to open source all the code powering this application (just need a little more time to get it reviewed).

Interestingly, the homepage system in this package is very general. Just pass it any list of RSS feeds and it’ll do the clustering, tweet linking, ranking, and page generation automatically every X minutes for you. Anyone want a fresh, personalized Techmeme? Let me know if that sounds interesting.

Please keep in mind that this is still a simple, early prototype to show how one can use BOSS to experiment with very interesting data sources like Twitter to tackle big problems like real-time search.

Filed under Blog Stuff, Boss, Code, Information Retrieval, Non-Technical-Read, Open, Research, Search, Social, Techmeme, Twitter, UI, Yahoo

Twitter + BOSS = Real Time Search

Try it: yahoo

Update: (6/25) This application has been updated. Go here to learn more. The description below though still applies.

Update: (6/11) In case you’re bored, here’s a discussion we had with Google and Twitter about Open & Real-time Search.

Update: (1/19) If you have issues try again in 5-10 minutes. You can also check out the screenshots below. (1/15) App Engine limits were reached (and fast). Appreciate the love and my apologies for not fully anticipating that. Google was nice enough though to temporarily raise the quota for this application. Anyways, this was more to show a cool BOSS developer example using code libraries I released earlier, but there might be more here. Stay tuned.

Here’s a screenshot as well (which should hopefully be stale by the time you read this).

Basically, this service boosts Yahoo’s freshest news search results (which typically don’t have much relevance, since they are ordered by timestamp and that’s it) based on how similar they are to the emerging topics found on Twitter for the same query. In other words, it uses Twitter to determine authority for content that doesn’t yet have links because it is so fresh. It also overlays related tweets, if any exist, under each result via an AJAX expando button (big thanks to Greg Walloch at Yahoo! for the design). A nice added feature of the overlay is near-duplicate removal, which ensures the message thread on any given result provides as much comment diversity as possible.
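For readers who want to see the shape of the idea, here is a minimal sketch in plain Python. It is not the actual TweetNews code: the token-overlap matching, the Jaccard near-duplicate check, both thresholds, and all function names are assumptions chosen only for illustration. Articles are assumed to arrive freshest-first as dicts with a "title" key, and tweets as plain strings.

```python
import re

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "at"}


def tokens(text):
    """Return the set of lowercase words in text, minus a few stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def title_overlap(title_tokens, tweet_tokens):
    """Fraction of the article's title tokens that also appear in the tweet."""
    if not title_tokens:
        return 0.0
    return len(title_tokens & tweet_tokens) / len(title_tokens)


def dedupe(tweets, threshold=0.8):
    """Drop tweets that are near-duplicates of a tweet already kept."""
    kept = []
    for tweet in tweets:
        tweet_tokens = tokens(tweet)
        if all(jaccard(tweet_tokens, tokens(k)) < threshold for k in kept):
            kept.append(tweet)
    return kept


def rank_articles(articles, tweets, match_threshold=0.5):
    """Return (article, related_tweets) pairs, most-tweeted article first."""
    ranked = []
    for article in articles:
        title_tokens = tokens(article["title"])
        related = [t for t in tweets
                   if title_overlap(title_tokens, tokens(t)) >= match_threshold]
        ranked.append((article, dedupe(related)))
    # Python's sort is stable, so ties keep the incoming freshest-first order.
    ranked.sort(key=lambda pair: len(pair[1]), reverse=True)
    return ranked
```

Calling rank_articles(news_results, tweet_texts) returns each article paired with the de-duplicated tweets that would be shown beneath it. A real system would want a better similarity measure than raw word overlap.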

Freshness (especially in the context of search) is a challenging problem. Traditional PageRank-style algorithms don’t really work here, as it takes time for a fresh URL to garner enough links to beat an older, high-ranking URL. One approach is to use cluster sizes as a feature for measuring the popularity of a story (e.g. Google News). Although quite effective IMO, this may not always be fast enough: for a cluster to grow, other sources have to write about the same story. Traditional media can be slow, however, especially on local topics. I remember seeing breaking Twitter messages describing the California wildfires. When I searched Google/Yahoo/Microsoft right at that moment I barely got anything (< 5 results spanning 3 search results pages). I had a similar episode when I searched on the Mumbai attacks. Specifically, the Twitter messages were providing incredible focus on the important subtopics that had yet to become popular in the traditional media and news search worlds. What I found most interesting in both of these cases was that news articles did exist on these topics, but they just weren’t valued highly enough yet or weren’t focusing on the right stories (as the majority of tweets were). So why not just do that? Order these fresh news articles (which mostly provide authority and in-depth coverage) based on the number of related fresh tweets, and show the tweets under each. That’s this service.

To illustrate the need, here’s a quick before and after shot. I searched for ‘nba’ using Yahoo’s news search ordered by latest results (first image). Very fresh (within a minute) but subpar quality. The first result talks about teams that are in a different league of basketball than the NBA. However, search for ‘nba’ on TweetNews (second image) and you get the Kings/Warriors triple OT game highlight which was buzzing more in Twitter at that minute.

'NBA' on Y! News latest

'NBA' on Y! News latest enhanced by Twitter

'NBA' on TweetNews

There’s something very interesting here … Twitter as a ranking signal for search freshness may prove to be very useful if constructed properly. Definitely deserves more exploration – hence this service, which took < 100 lines of code to represent all the search logic thanks to Yahoo! BOSS, Twitter’s API, and the BOSS Mashup Framework.

To sum up, the contributions of this service are: (1) Real-time search + freshness (2) Stitching social commentary to authoritative sources of information (3) Another (hopefully cool) BOSS example.

The code is packaged for general open consumption and has been ported to run on App Engine (which powers this service actually). You can download all the source here.

Filed under Blog Stuff, Boss, Code, CS, Data Mining, Google, Information Retrieval, Non-Technical-Read, Open, Research, Search, Social, Twitter, Yahoo

Yahoo! Boss – An Insider View

Disclaimer: This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Boss stands for Build your Own Search Service. The goal of Boss is to open up search to enable third parties to build incredibly useful and powerful search-based applications. Several months ago I pitched the executives on how Yahoo! could open up its search assets to fragment the market. It’s remarkable to finally see some of the vision (with the help of many talented people) reach the public today.

Web search is a tough business to get into. $300+ Million capex, amazing talent, infrastructure, a prayer, etc. just to get close to basic parity. Only 3 companies have really pulled it off. However, I strongly believe we need to find innovative, incremental ways to spread the search love in order to encourage fragmentation and help promising companies get to basic parity instantly so that they can leverage their unique assets (new algorithm, user data, talent) to push their search solution beyond the current baseline.

Search is all about understanding the user’s intent. If we can nail the intent, then search is pretty much a solved problem. However, the current model of a single search box for everything loses its focus on intent as it aims to cater to all people and queries. Granted, a single search box definitely makes our lives easier, but I have a hard time believing this is the *right* approach.

In my online experience, I typically visit a variety of sites: Techmeme, Digg, Techcrunch, eBay, Amazon, del.icio.us, etc. While on these pages, something almost always catches my eye, and so I proceed to the search box in my browser to find out more on the web. Why do we have this disconnected experience? I think it’s because these sites do not provide web-level comprehensiveness. It’s unfortunate, because the page that I’m on may have additional information about my intent (maybe I’m logged in so it has my user info, or it’s a techy shopping site).

The biggest goal of Boss is to help bootstrap sites like these to get comprehensiveness and basic ranking for free, as well as offer tools to re-rank, blend, and overlay the results in a way that revolutionizes the search experience.

When I’m on del.icio.us, why can’t I search in their box, get relevant del.icio.us results at the top, and also have web results backfill below? I think users should be confident that if they search in a search box on any page in the whole wide web, they’ll get results that are at least as good as Yahoo/Google, if not better.
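The "site results on top, web results backfilling below" idea can be sketched in a few lines of Python. This is only an illustration, not the Boss API or anything del.icio.us runs: the blend function, the "url" key on each result dict, and the trailing-slash normalization are all assumptions made for the example.

```python
def blend(site_results, web_results, limit=10):
    """Put site results first, then backfill with web results not already shown."""
    seen = set()
    blended = []
    for result in list(site_results) + list(web_results):
        # Crude URL normalization (an assumption): ignore a trailing slash.
        key = result["url"].rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        blended.append(result)
        if len(blended) == limit:
            break
    return blended
```

Because the site's own results are consumed first, a web result pointing at the same URL is dropped instead of appearing twice, and the web results only fill whatever room is left on the page.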

The first milestone of Boss is a simple one: Make available a clean search API that turns off the traditional restrictions so that developers can totally control presentation, re-rank results, run an unlimited number of queries, and blend in external content all without having to include any Yahoo! attribution in the resulting product(s). Want to build the example above or put news search results on a map – go for it!

Here’s a link to the API:

http://developer.yahoo.com/search/boss/

Also, check out the Boss Mashup Framework:

http://developer.yahoo.com/search/boss/mashup.html

The Boss Mashup Framework, in my opinion, makes the Boss Search API really useful. It lets developers use SQL-like syntax for operating on heterogeneous web data sources. The idea came up as I was working on examples to showcase Boss and realized that the operations I was developing imperatively closely followed declarative SQL-like constructs. Since it’s a recent idea and implementation, there may be some bugs or weird designs lurking in there, but I strongly recommend playing around with it and viewing the examples included in the package. I’m biased of course, but I do think it’s a fun framework for remixing online data. One can rank web results by digg and youtube favorite counts, remove duplicates, and publish the results using a provided search results page template in less than 30 lines of code, without having to specify any parsing logic for the data sources/APIs, as the framework can infer the structure and unify the data formats automatically in most cases.

The next couple of milestones for Boss I think are even more interesting and disruptive – server side services, monetization, blending ranking models, more features exposure, query classifiers, open source … so stay tuned.

Filed under Blog Stuff, Data Mining, Information Retrieval, Non-Technical-Read, Open, Search, Techmeme

Is the Facebook Application Platform Fair?

Take a look at this stats deck from O’Reilly’s Graphing Social Patterns conference:

http://en.oreilly.com/gspeast2008/public/asset/attachment/2950

Fairly in-depth and recent [6/01/2008] analysis of the application usage in Facebook and MySpace.

As expected, lots of power law behavior.

I found the slides describing churn to be pretty interesting. Since October 2007, nine of the top fifteen most popular applications are new. However, only three of those new applications debuted after March 2008. I expect the amount of churn in the top spots to continue to drop based on the recent declining active usage trends and Facebook’s efforts to curb application spam (new UI that puts applications in a separate profile tab, app module minimizing, viral friend messaging limits, security compliance, etc.).

What I would find even more interesting is a study of the number of applications users install, and how those moving averages have changed over time. Like say the number of applications a typical user installs is 4. Once the user reaches that threshold, what’s the churn like then? Specifically, what are the chances that a user will add a new app? Or maybe an even better metric: how long does it take, and how does this length of time compare to when the user had 1 app and increased to 2, or 2 apps and increased to 3, etc.? Basically, what’s the adoption rate/times based on current application counts?
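To make the metric concrete, here is a rough Python sketch of the "how long does it take to go from k apps to k+1" calculation. Everything about the input is an assumption: I'm pretending we have, for each user, the day numbers on which they installed each app. The function name is made up, and the sketch ignores uninstalls and users who never add another app.

```python
from collections import defaultdict
from statistics import mean


def mean_wait_by_app_count(install_days_by_user):
    """Map k -> average days users took to go from k apps to k + 1 apps."""
    waits = defaultdict(list)
    for days in install_days_by_user.values():
        ordered = sorted(days)
        # ordered[k - 1] is the day of the k-th install, ordered[k] the (k+1)-th.
        for k in range(1, len(ordered)):
            waits[k].append(ordered[k] - ordered[k - 1])
    return {k: mean(v) for k, v in sorted(waits.items())}
```

If the averages climb steeply as k grows, that would support the threshold idea: each additional app takes longer to earn a spot than the one before it.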

I believe it becomes harder to influence a user to add a new app, or swap one in, if the number of apps the user currently has is high. I think most users, without even knowing it, have a threshold of how many total apps they are willing to display on their profile – and that this threshold is based on an ongoing evaluation of the utility and efficiency of the page. Each app takes up real estate on the profile page, and a “rational” user will only show so many until page load times degrade and/or core modules (wall, general information, albums, networks) get drowned in clutter and thus become difficult for users to locate. Of course, social networks like MySpace, which have very minimal profile page design constraints, prove that most users are irrational 😉 – but it’s this design control that greatly helped Facebook dominate the market IMO.

If this is true, then it means that first movers really, really win in the Facebook apps world. Companies like Slide and Rockyou manage many of the top applications, and given the power law market share phenomenon, they control a majority stake of application usage and installs. Many of these companies had the early bird advantage, and once winners, always winners – acquiring emerging applications, leveraging branding and existing audiences (a.k.a. monopoly) to cross-promote potentially copy-cat applications faster and wider than the competition, etc. Monopolies inside Facebook have unsettling ramifications, as they block newcomers from capturing profile space. If they fail to innovate (as most monopolies do), then next-gen application development may never get through.

Now, if users do have an application count threshold, and it becomes successively more difficult to replace/add a new app as this count increases, then any apps developed now have a substantially lower chance of gaining market share. If winners win, first movers reap, and churn becomes improbable over time, then the early top apps have most likely already filled up users’ allocated app slots.

I find thinking of the profile page as a resource allocation problem rather fascinating. Essentially, there are finite resources on a page and we expect rational users to perform some optimization to allocate resources to maximize utility for themselves and for others (potential game theory link). Once users fill up these resources, human laziness kicks in. Another warrant for improbable churn is that users who want to add new applications after filling up their resource limit will need to remove an existing app to make space. The standards for change are higher now, as the user must compare the new app to an existing preferred app (which probably is a popular early-bird app that friends use), and so the decision will incur a trade-off.

One could also argue that with more apps available now (the second slide shows that despite sluggish usage the # of apps being developed is still growing insanely), users are burdened with more choices. Or one could argue that because most users have reached their app limit, and churn has thus become improbable, the discoverability of new apps among friends (a critical channel for adoption) also becomes improbable.

Under this theory, especially in context of Facebook’s current efforts and app stats, the growth of new app adoption in social networks will continue to slow down.

So what can be done here?

The platform needs to encourage more churn by building a fairer market that matches users to high quality apps that satisfy their expressed intents. At the end of the day, these applications are really just web pages, but unlike the web, they do not leverage important primitives like linking and meta tags. Search engines like Google and Yahoo use these features extensively to calculate authority and relevance. In the long run, as the number of sources increases, advanced ranking algorithms and marketplaces are necessary to scale and ensure fairness to worthy tail publishers. Maybe social networks should inherit these system properties to bolster their tail applications.

Also, Facebook needs to encourage users to vary or add more applications on their profile page. Facebook’s move to put applications in their own profile tab may very well achieve this goal, but at the cost of lowering their visibility.

Anyways, just some random thoughts about the current state of Facebook apps. It’ll be very interesting to see how their platform progresses and how it will be perceived by end users and developers in the future.

Filed under Economics, Facebook, Non-Technical-Read, Social, Statistics, Trends

Surviving a Lunch Interview

I always found lunch interviews to be the most frustrating experiences ever. There you are, given an opportunity to pig out in a grand cafeteria on corporate expense – so naturally, you stock up the tray to get your chow down. You sit down at the table across from your interviewer, and right as you’re about to take that first scrumptious bite, your interviewer asks you a question. You of course answer it completely, but before returning to the meal you’re asked a follow-up question, and then another one, and before you know it rapid fire Q/A begins. You do your best to answer each one … as your food gets cold … as your stomach growls … and as you watch the interviewer nodding to your comments with his/her mouth filled with that savory steak and potatoes you’re dying to devour. Why can’t the interviewer just go to the bathroom or receive a cell call already?!

This isn’t the interviewer’s fault by any means. After all, it is an interview, and their role compels constant question asking (silence is awkward). Additionally, this whole food tease leading to short-term starvation isn’t the worst consequence. You can get food stuck in your teeth, pass gas, get bad breath, spill your food and drink all over your interviewer, etc. It’s probably the most dangerous, error-prone part of the interview process (actually probably not … since you typically don’t get asked technically involved questions over food).

So here’s some advice to those who find themselves in similar situations. Sadly, it took me nearly three years of lunch interviews to discover these pointers:

  1. Eat a big breakfast. Lunch should be a snack.
  2. When you do eat lunch, order the soup with bread. Warms your body and soothes your throat. Simple to eat. Nothing gets stuck in your teeth. Your hands don’t get dirty, so no need to wash them before that final handshake. It’s not greasy (like pizza), so it doesn’t reflect bad diet habits to your interviewer. Also, the bread soaks in the soup to make the meal filling and give you additional energy for the rest of the day.
  3. Eat slowly, since your interviewer probably got more food than you. You don’t want to finish earlier than him/her. It tends to rush the other person. Your goal is to make the lunch round long and fun. Keep the conversation going but don’t overdo it to the point where the interviewer starts to doze off. Ask questions when the interviewer runs out of questions (also gives you more time to eat!). Make the most of lunch to learn as much as you can about the group. Their insight will be super useful in the upcoming rounds. Just think of lunch as a break before the more technical rounds.
  4. Drink water. It really is the best drink ever. No chance of an upset stomach during or after the round. If you’re starving and know the soup + bread won’t fill you up (eating slowly helps fill you up though), get an Odwalla. It’s seriously a second meal.
  5. Don’t take notes. That’s too much IMHO. Keep it informal, unless the interviewer specifies otherwise.

That’s all I got. Nothing crazy.

Anyways, hope these pointers come in handy.

Filed under Job Stuff, Non-Technical-Read

Are professors too paper-happy?

[Update (4/26/2006): Motivation]

I've received several emails/comments (mostly from researchers and professors) regarding this post and realized some may have misconstrued my intentions for writing this article. I don't blame them. The title sounds pretty controversial and this post is quite long – compelling readers to skim and consequently miss some of the points I'm trying to make. As I mention in the second paragraph, 'I'm a huge fan of publications. The more the better.' All I'm trying to do here is make the case for why I think researchers should ALSO use more informal avenues of communication, such as blogs, for getting their ideas out for public review/commenting. These sources would not only increase reach/access, but serve as great supplementary material to their corresponding conference papers – NOT as replacements or alternatives to publications. I received a very insightful idea from Hari Balakrishnan – What if authors, in conjunction with their papers, release informal 5-6 page summaries of their projects/ideas for online public review/commenting? I would love to have resources like these available to me when dissecting the insight/knowledge encapsulated in formalized papers. I think simple ideas like these would significantly improve reachability/access of our research and even encourage more creativity/questioning.

[End Update]

On Digg:

http://digg.com/science/Are_professors_too_paper-happy_

I recently read through a professor's CV, which under the 'Publications' section stated "… published over 100 conference papers." I then proceeded to the next page of the resume to scan through some of his paper titles, only to see a brand new section titled 'References'. Wait, where'd his publications go? I went back to see if I missed a page. Nope. Wow … why wouldn't he include his citations or mention notable pieces of work? I mean, saying you have 100+ publications gives me no value unless you're going to list them. Granted, that's an amazing accomplishment – writing a paper that advances (cs) research is not only time consuming and hard work, but it requires a whole new level of creativity. Additionally, conferences are getting super competitive (unless he's publishing in these conferences) – I hear of many cases of famous, tenured professors/researchers getting their papers rejected. Now multiplying this paper process by 100 represents a significant amount of research, so I'm willing to bet this guy is pretty bright. However, this one-line publication description gives me the impression that this professor aims for quantity in order to amplify his prestige. I also get the feeling that after publishing the 100th paper he reached one of those hard-set goals that gives one the sense of 'Mission Accomplished'. I guess I'm different – I'm all about quality. Personally, I'd be content with 2 publications in my life if one of 'em got me the Turing Award and the other a date with Heidi Klum – but that's just me 🙂

Interestingly (or oddly), this CV publication description also got me thinking about some of the things I hate about (cs) papers. But first, let me make it very clear – I am a huge fan of conference publications. The more the better. The peer review process of research papers is absolutely critical to advancing science. In this post, however, I would like to make the case for why I think all researchers should ALSO use more informal avenues of communication (such as blogs, wikipedia, etc.) for publicly broadcasting their ideas and results of their latest research.

So let's start this off with a laundry list of gripes I have about 'most' papers (with possible alternatives/solutions mixed in):

PDF/PS/Sealed/doc/dvi/tex formats

  • That load time for the viewer is killer – probably deters many from even clicking the link
  • Certain viewers render diagrams/text differently
  • The document looks formal, intimidating, elitist, old, plain, hardcore, not happy to see you
  • Also makes documents feel permanent, finalized, not amenable to changes
  • Provides no interface for commenting by readers – and in many cases I find reader critiques more interesting/useful than the actual paper
  • And why can't we see the feedback/comments from the conference?
  • Not pretty – It's a lot of material to read, so why not make the presentation look happier? I seriously think a nice stylesheet/web layout for paper content would significantly improve readability and portability

It's a lot of work

  • Not only does one need to come up with a fairly original idea, but research, discuss, and analyze its empirical/theoretical implications
  • Needs support/citations
  • Papers are quite formulaic – I can easily guess the page that has the nice/pretty graphs
  • This structure imposes pretty big research burdens on the authors
  • This is a GOOD thing for paper quality
  • But a terrible method for prototyping (compare how UI devs quickly prototype designs to cycle in user feedback)
  • Doesn't provide professors/researchers a forum to quickly get ideas out nor to filter in comments
  • Also prevents researchers from spewing their thoughts out since they wish to formalize everything in papers
  • Now there are other journals/magazines which are informal to let researchers publish their brainstorming/wacky idea articles
  • But there's still a time delay in getting those articles published
  • They still have writing standards and an approval process
  • And I don't read them (Where do I find good informal CS articles online? Is anyone linking to these? Who's authoring them?)
  • In blogs, authors can be comedic, speak freely ("he's full of it", "that technique is a load of crap" – opinions like these embedded in technical articles would make reading so much more enjoyable), and quickly get to the point without having to exhaust the boring implementation details

Access

  • Although papers normally include author emails, this means feedback is kept privately in the authors' inboxes, not viewable to the general public
  • Many papers/journals require subscriptions/fees

Prestige

  • The main issue here is professors/researchers want to publish in prestigious papers/journals
  • Rather than waste their time with things like weblogs, which are perceived to be inherently biased and non-factual, and where the audience may seem 'dumber'
  • It's in their best interest to focus on publications – get tenure, fame, approval from experts in the field

I want informality

  • But it sucks for us ('us' being the general audience who may wish to learn about these breakthroughs)
  • I love hearing professors ramble off their ideas freely
  • I want to see commentary from readers and myself ON these articles
  • And I want to see these ideas get posted and updated quickly
  • I want to see these experts explaining their ideas in simple terms (whenever possible, if it's too hardcore then it's probably not very good bloggable material) and describe the real world impacts/examples
  • But unfortunately NONE of my favorite professors/researchers publish their ideas on blogs

Popularity and Reach

  • There isn't much linkability happening for papers (besides from class/professor web sites)
  • You don't see conference papers getting linked on Slashdot/Digg
  • If millions of geeks don't even want to link/read this stuff, how and why would others? (refer below to the concluding section for why I think this is important)
  • Papers are formal, old-school, and designed to be read by expert audiences
  • They are written to impress, be exhaustive/detailed, when in reality most of us are lazy, want to read something entertaining, and get to the dang point
  • Wouldn't it be nice if professors/researchers expressed their ideas/research in a bloggish manner whenever possible?
  • At best a blog post would be a great supplemental/introductory reading to the actual paper
  • Even the conference panel can check the blog comments to see if they missed anything
  • Some professors express their ideas informally with lecture notes and powerpoint presentations
  • But again, these formats don't let others annotate it with their comments
  • They are mostly pdf/ps/ppt/doc files (ah that load time)
  • And lecture notes usually exist for core concepts, not experimental/recent ideas

Wikipedia

  • Which brings me to another interesting idea
  • Where's THE place to learn about core, fundamental cs topics?
  • Or even slightly maturing cs algorithms/techniques?
  • I tend to follow this learning process:
    • Check Wikipedia
    • At best gives me a great introduction to a topic
    • If I need more, I use what I learned from wikipedia/other sources to build queries for searching lecture notes/presentations
    • Typically the interesting details are hidden in a set of lecture notes/papers/books
    • And which of these sources should I use? – I have to read many versions coming from different professors to piece together a clear story
  • Wouldn't it be nice to have a one stop shop
  • Wikipedia!
  • What if Universities MOTIVATED professors/researchers to collaborate on wikipedia to publish in-depth articles regarding their areas of interest?
  • This would be huge
  • Wikipedia is incredibly popular/useful/successful, so let's use it as the place where professors around the world can unite their knowledge
  • Would be the best supplement (even replacement) for lecture notes
  • And for more experimental/controversial topics, researchers can use individual blogs

A slight digression and concluding remarks: The Future of Computer Science Research

Two things:

  1. I strongly believe the future of (computer) science relies on making our stuff more accessible to others – requiring us to tap formal AND informal avenues of communication
  2. We need to be more interdisciplinary!

Many people (especially prospective computer science students) ask me:

What's left to invent?

  • Most research to them (and me) sounds like 'optimizations' based work
  • Is there anything revolutionary left to do? Has it all been accomplished?
  • This is a hard question to answer convincingly, especially since if there was something revolutionary left to do, someone (hopefully me) would be on it
  • It's also difficult to foresee the impacts of ongoing research
  • Big ideas/inventions are evolutionary, piecing together decades of work.

They even say … "if only I could go back in time and invent the TV, PC, Internet, Web Search …"

  • As if back in the day we were living in the stone age and there were so many things left to do
  • There are even more things left to do today!

Our goal should not be to come up with an idea that surpasses the importance of say the invention of the PC

  • (I'm not even sure if this is possible, and at best is an ill comparison to make)
  • But rather to learn about the problems people face NOW and use our knowledge of science to solve them
  • Research should be entrepreneurial: Find a customer pain and provide a method to solve it
  • Not the other way around: Playing with super geeky stuff for fun and hoping later it might have applications (there are exceptions to this, since it encourages depth/exploration of a field which may lead to an accidental discovery of something huge, but I still think for the most part this is the wrong way to go about research)
  • Your audience is the most important element of research
  • And the beauty of our field – IT is a common theme in every research area!
  • We're the common denominator!
  • We can go about improving research in any subject

Now getting back to focus …

  • We consider the PC, Internet, etc. to be revolutionary because they are intuitive, fundamental platforms
  • Current research adds many layers of complexity to these ideas in order to solve more difficult problems
  • Making things more and more complex
  • What we need to be doing better is making our complex systems easier to use
  • We need better integration
  • And expand our research applications into other fields, rather than just reinforcing them back into computer science areas
  • We need to be interdisciplinary!
  • We have yet to even touch the surface when it comes to exploring the overlaps of fields

Some of the things we've been brewing since the 1970s could seriously REVOLUTIONIZE other industries

  • Imagine machine learning a database filled with all past medical histories/records/diagnostic reports
    • So when presented with symptoms/readings of a new patient, the system will tell you what that patient is suffering from with high probability based off analyzing millions of records
    • This would dramatically decrease the number of cases of death due to bad diagnosis and lower medical/insurance costs since it wouldn't be necessary to run a bunch of expensive scans/tests and surgeries to figure out what's wrong with a patient
  • A similar system could do the same thing over economic, disease, environmental, and hazard datasets so that scientists and policy makers can ask questions like 'In the next five years, what region of the world has the highest probability of a natural disaster … or a disease epidemic?'
    • This would be huge in shaping policy priorities, saving human lives, and preparing for disasters like Katrina so they never escalate
  • Or what about mobile ad hoc networks to let villages in the poorest regions of the world wirelessly share an internet connection
  • Or sensor networks to do environmental sensing, health monitoring in hospitals, traffic control, pinpointing the location of a crime, etc.
  • And plenty, plenty more
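
To make the medical records idea above a bit more concrete, here is a minimal sketch, assuming a Bernoulli naive Bayes model (my pick for illustration, not a claim about how a real diagnostic system would or should work). The records, symptom names, and function names are all hypothetical toy values, not real medical data.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (set of symptoms, diagnosis). Not real medical data.
RECORDS = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"sneezing", "cough"}, "cold"),
    ({"sneezing", "runny nose"}, "cold"),
    ({"fever", "rash"}, "measles"),
]


def train(records):
    """Count how often each diagnosis and each (diagnosis, symptom) pair occurs."""
    diagnosis_counts = Counter()
    symptom_counts = defaultdict(Counter)
    vocabulary = set()
    for symptoms, diagnosis in records:
        diagnosis_counts[diagnosis] += 1
        for symptom in symptoms:
            symptom_counts[diagnosis][symptom] += 1
            vocabulary.add(symptom)
    return diagnosis_counts, symptom_counts, vocabulary


def rank_diagnoses(symptoms, model):
    """Return (diagnosis, probability) pairs, most likely first."""
    diagnosis_counts, symptom_counts, vocabulary = model
    total = sum(diagnosis_counts.values())
    log_scores = {}
    for diagnosis, n in diagnosis_counts.items():
        score = math.log(n / total)  # prior: how common is this diagnosis?
        for symptom in vocabulary:
            # Laplace-smoothed estimate of P(symptom present | diagnosis)
            p = (symptom_counts[diagnosis][symptom] + 1) / (n + 2)
            score += math.log(p if symptom in symptoms else 1 - p)
        log_scores[diagnosis] = score
    # Normalize the log scores into probabilities that sum to 1
    top = max(log_scores.values())
    weights = {d: math.exp(s - top) for d, s in log_scores.items()}
    norm = sum(weights.values())
    return sorted(((d, w / norm) for d, w in weights.items()),
                  key=lambda pair: -pair[1])


model = train(RECORDS)
ranking = rank_diagnoses({"fever", "cough"}, model)
```

With millions of real records the counting step stays the same in spirit; the hard parts (data quality, privacy, correlated symptoms, validation) are exactly where the interdisciplinary work would come in.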

The attractive element of computer science research is not necessarily the algorithms, nor the programming implementation

  • But rather the intelligence/knowledge it can provide to humans after partying on data
  • Information is what makes a killer application!
  • And every field has tons of interesting information
  • We need to INTERACT and learn more about the needs in other fields and shape our research accordingly
  • (Unfortunately, computer scientists have a somewhat bad reputation when it comes to any form of 'interaction')

So after hearing all this, the prospective science students complain:

  • "Now we have to learn about multiple fields – our research burden is so much bigger than the scientists' in the past"
  • This is a very silly argument for defending one's laziness to pursue (computer) science
  • I mean, I'd much rather hear one of those 'I don't want to get outsourced to India' excuses 🙂
  • Here's why:
    • We got it GOOD
    • I mean, geez, we got Google!
    • I can't even begin to imagine what Einstein, Gauss, Turing, Newton, etc. could have accomplished if they had the information research tools we got
    • We could have flying cars, cleaner environment, robots doing our homework
    • Heck, they could probably have found a way to achieve world peace
    • We take what we have for granted
  • Back in the day, when there was no email, no search, and no unified, indexed repository of publications
  • Scientists had to do amazing amounts of research just to find the right sources to learn from
  • Travel around the world to meet with leading researchers
  • Read entire books to find relevant information
  • You see, scientists back then had no choice but to be one-dimensional, and those who went deep into different fields either had no lives or were incredibly brilliant (most likely both)
  • But we can do all this work they had to do just to prepare for learning in a matter of seconds, from anywhere!
  • It's utter crap to think we can't learn about multiple fields of interest, in-depth
  • And our research tools will only keep getting better, at an ever faster pace
  • But I don't blame these students for their complaint
  • We're still trained to be one-dimensional
  • Universities have specific majors and graduate programs (although some of these programs are SLOWLY mixing together)
  • Thereby requiring many high school students to select a single major on the admissions application
  • Even though their high school (which focuses on breadth) probably did a terrible job introducing them to liberal arts and science
  • [Side Note: Actually, this is one of the things I never quite understood. How do students choose a major like EE/NuclearEng/MechEng on the admissions application, when their high school most likely doesn't offer any of those courses? Sounds like their decision came down to three factors: (1) The student is a geek (2) Parents' influence (3) Money. 2/3 of these are terrible reasons to enter these fields. Even worse, many universities separate engineering from liberal arts, making it almost impossible for a student who originally declared 'Economics' to switch to BioE after taking/liking a biotech class – despite the fact that their motivation to enter the major is so much better. You should enter a field you love, and making the decision on which students can enter engineering majors based off high school, which gives you almost zero exposure to these fields, makes no sense. And we wonder why we have a shortage of good engineers … well one reason is probably because we don't let them into the university programs – 'them' being the ones who actually realized they liked this stuff when they got to college.]
  • Many job descriptions want students/applicants with skills pertaining to one area

But unfortunately the problems we now face require us to understand how things mix together

  • This is why it's very important we increase the level of communication between fields to promote interdisciplinary research
  • Like a simple example: Databases and IR
    • Both have the same goal: To answer queries accurately regarding a set of data
    • Despite this, they are seen as two traditionally separate areas
    • Database/IR people normally work on different floors/buildings at companies/universities
    • One system is based traditionally more on logic while the other is more statistical/probabilistic
    • But exploiting their overlap could greatly improve information access (a la SQL Server 2005)
    • Notice this example refers to two subtopics under just computer science, not even two mainstream fields!
  • Just imagine the impacts of collaborating more with other fields of study

This is why I feel blogging, Wikipedia, and informal avenues of communicating our thoughts/research/ideas are very important

  • So that people in other fields can read them
  • And for the laundry list of reasons above
  • Even if it's just a proof of concept technique with no empirical studies
  • An interested geek reader who came across the blog post might just end up coding it up for you
  • Could even inspire start-up companies

So the POINT: Do whatever you can to voice your ideas/research to the largest possible audience in the hopes of cross-fertilizing research in many fields.

* So what's up with the parentheses around 'cs' and 'computer' in this post? Well, many of these points generalize to all (science) research publications 🙂

This work is licensed under a Creative Commons License

 
