April 2006 – Vik's Blog

[Update (4/26/2006): Motivation]

I've received several emails/comments (mostly from researchers and professors) regarding this post and realized some may have misconstrued my intentions for writing this article. I don't blame them. The title sounds pretty controversial and this post is quite long – compelling readers to skim and consequently miss some of the points I'm trying to make. As I mention in the second paragraph, 'I'm a huge fan of publications. The more the better.' All I'm trying to do here is make the case for why I think researchers should ALSO use more informal avenues of communication, such as blogs, for getting their ideas out for public review/commenting. These sources would not only increase reach/access, but serve as great supplementary material to their corresponding conference papers – NOT as replacements or alternatives to publications. I received a very insightful idea from Hari Balakrishnan – What if authors, in conjunction with their papers, release informal 5-6 page summaries of their projects/ideas for online public review/commenting? I would love to have resources like these available to me when dissecting the insight/knowledge encapsulated in formalized papers. I think simple ideas like these would significantly improve reachability/access of our research and even encourage more creativity/questioning.

[End Update]

On Digg:

http://digg.com/science/Are_professors_too_paper-happy_

I recently read through a professor's CV, which under the 'Publications' section stated "… published over 100 conference papers." I then proceeded to the next page of the resume to scan through some of his paper titles, only to see a brand new section titled 'References'. Wait, where'd his publications go? I went back to see if I missed a page. Nope. Wow … why wouldn't he include his citations or mention notable pieces of work? I mean, saying you have 100+ publications gives me no value unless you're going to list them. Granted, that's an amazing accomplishment – writing a paper that advances (cs) research is not only time consuming and hard work, but it requires a whole new level of creativity. Additionally, conferences are getting super competitive (unless he's publishing in these conferences) – I hear of many cases of famous, tenured professors/researchers getting their papers rejected. Now multiplying this paper process by 100 represents a significant amount of research, so I'm willing to bet this guy is pretty bright. However, this one-line publication description gives me the impression that this professor aims for quantity in order to amplify his prestige. I also get the feeling that after publishing his 100th paper he reached one of those hard-set goals that give one the sense of 'Mission Accomplished'. I guess I'm different – I'm all about quality. Personally, I'd be content with 2 publications in my life if one of 'em got me the Turing Award and the other a date with Heidi Klum – but that's just me 🙂

Interestingly (or oddly), this CV publication description also got me thinking about some of the things I hate about (cs) papers. But first, let me make it very clear – I am a huge fan of conference publications. The more the better. The peer review process of research papers is absolutely critical to advancing science. In this post, however, I would like to make the case for why I think all researchers should ALSO use more informal avenues of communication (such as blogs, wikipedia, etc.) for publicly broadcasting their ideas and results of their latest research.

So let's start this off with a laundry list of gripes I have about 'most' papers (with possible alternatives/solutions mixed in):

PDF/PS/Sealed/doc/dvi/tex formats

That load time for the viewer is killer – probably deters many from even clicking the link
Certain viewers render diagrams/text differently
The document looks formal, intimidating, elitist, old, plain, hardcore, not happy to see you
Also makes documents feel permanent, finalized, not amenable to changes
Provides no interface for commenting by readers – and in many cases I find reader critiques more interesting/useful than the actual paper
And why can't we see the feedback/comments from the conference?
Not pretty – It's a lot of material to read, so why not make the presentation look happier? I seriously think a nice stylesheet/web layout for paper content would significantly improve readability and portability

It's a lot of work

Not only does one need to come up with a fairly original idea, but one must also research, discuss, and analyze its empirical/theoretical implications
Needs support/citations
Papers are quite formulaic – I can easily guess the page that has the nice/pretty graphs
This structure imposes pretty big research burdens on the authors
This is a GOOD thing for paper quality
But a terrible method for prototyping (similar to how UI dev's quickly prototype designs to cycle in user feedback)
Doesn't provide professors/researchers a forum to quickly get ideas out nor to filter in comments
Also prevents researchers from spewing their thoughts out since they wish to formalize everything in papers
Now there are other, more informal journals/magazines that let researchers publish their brainstorming/wacky idea articles
But there's still a time delay in getting those articles published
They still have writing standards and an approval process
And I don't read them (Where do I find good informal CS articles online? Is anyone linking to these? Who's authoring them?)
In blogs, authors can be comedic, speak freely ("he's full of it", "that technique is a load of crap" – opinions like these embedded in technical articles would make reading so much more enjoyable), and quickly get to the point without having to exhaust the boring implementation details

Access

Although papers normally include author emails, this means feedback is kept privately in the authors' inboxes, not viewable to the general public
Many papers/journals require subscriptions/fees

Prestige

The main issue here is professors/researchers want to publish in prestigious conferences/journals
Rather than waste their time with things like weblogs, which are perceived to be inherently biased and non-factual, and where the audience may seem 'dumber'
It's in their best interest to focus on publications – get tenure, fame, approval from experts in the field

I want informality

But it sucks for us ('us' being the general audience who may wish to learn about these breakthroughs)
I love hearing professors ramble off their ideas freely
I want to see commentary from readers and myself ON these articles
And I want to see these ideas get posted and updated quickly
I want to see these experts explaining their ideas in simple terms (whenever possible, if it's too hardcore then it's probably not very good bloggable material) and describe the real world impacts/examples
But unfortunately NONE of my favorite professors/researchers publish their ideas on blogs

Popularity and Reach

There isn't much linkability happening for papers (aside from class/professor web sites)
You don't see conference papers getting linked on Slashdot/Digg
If millions of geeks don't even want to link/read this stuff, how and why would others? (refer below to the concluding section for why I think this is important)
Papers are formal, old-school, and designed to be read by expert audiences
They are written to impress, be exhaustive/detailed, when in reality most of us are lazy, want to read something entertaining, and get to the dang point
Wouldn't it be nice if professors/researchers expressed their ideas/research in a bloggish manner whenever possible?
At best a blog post would be a great supplemental/introductory reading to the actual paper
Even the conference panel can check the blog comments to see if they missed anything
Some professors express their ideas informally with lecture notes and powerpoint presentations
But again, these formats don't let others annotate it with their comments
They are mostly pdf/ps/ppt/doc's (ah that load time)
And lecture notes usually exist for core concepts, not experimental/recent ideas

Wikipedia

Which brings me to another interesting idea
Where's THE place to learn about core, fundamental cs topics?
Or even slightly maturing cs algorithms/techniques?
I tend to follow this learning process:
- Check Wikipedia
- At best gives me a great introduction to a topic
- If I need more, I use what I learned from wikipedia/other sources to build queries for searching lecture notes/presentations
- Typically the interesting details are hidden in a set of lecture notes/papers/books
- And which of these sources should I use? – I have to read many versions coming from different professors to piece together a clear story
Wouldn't it be nice to have a one stop shop
Wikipedia!
What if Universities MOTIVATED professors/researchers to collaborate on wikipedia to publish in-depth articles regarding their areas of interest?
This would be huge
Wikipedia is incredibly popular/useful/successful, so let's use it as the place where professors around the world can unite their knowledge
Would be the best supplement (even replacement) for lecture notes
And for more experimental/controversial topics, researchers can use individual blogs

A slight digression and concluding remarks: The Future of Computer Science Research

Two things:

I strongly believe the future of (computer) science relies on making our stuff more accessible to others – requiring us to tap formal AND informal avenues of communication
We need to be more interdisciplinary!

Many people (especially prospective computer science students) ask me:

What's left to invent?

Most research to them (and me) sounds like 'optimizations' based work
Is there anything revolutionary left to do? Has it all been accomplished?
This is a hard question to answer convincingly, especially since if there was something revolutionary left to do, someone (hopefully me) would be on it
It's also difficult to foresee the impacts of ongoing research
Big ideas/inventions are evolutionary, piecing together decades of work.

They even say … "if only I could go back in time and invent the TV, PC, Internet, Web Search …"

As if back in the day we were living in the stone age and there were so many things left to do
There are even more things left to do today!

Our goal should not be to come up with an idea that surpasses the importance of say the invention of the PC

(I'm not even sure if this is possible, and at best is an ill comparison to make)
But rather to learn about the problems people face NOW and use our knowledge of science to solve them
Research should be entrepreneurial: Find a customer pain and provide a method to solve it
Not the other way around: Playing with super geeky stuff for fun and hoping later it might have applications (there are exceptions to this, since it encourages depth/exploration of a field which may lead to an accidental discovery of something huge, but I still think for the most part this is the wrong way to go about research)
Your audience is the most important element of research
And the beauty of our field – IT is a common theme in every research area!
We're the common denominator!
We can go about improving research in any subject

Now getting back to focus …

We consider the PC, Internet, etc. to be revolutionary because they are intuitive, fundamental platforms
Current research adds many layers of complexity to these ideas in order to solve more difficult problems
Making things more and more complex
What we need to be doing better is making our complex systems easier to use
We need better integration
And expand our research applications into other fields, rather than just reinforcing them back into computer science areas
We need to be interdisciplinary!
We have yet to even touch the surface when it comes to exploring the overlaps of fields

Some of the things we've been brewing since the 1970's could seriously REVOLUTIONIZE other industries

Imagine applying machine learning to a database filled with all past medical histories/records/diagnostic reports
- So when presented with symptoms/readings of a new patient, the system will tell you what that patient is suffering from with high probability based off analyzing millions of records
- This would dramatically decrease the number of cases of death due to bad diagnosis and lower medical/insurance costs since it wouldn't be necessary to run a bunch of expensive scans/tests and surgeries to figure out what's wrong with a patient
A similar system could do the same thing over economic, disease, environmental, and hazard datasets so that scientists and policy makers can ask questions like 'In the next five years, what region of the world has the highest probability of a natural disaster … or a disease epidemic?'
- This would be huge in shaping policy priorities, saving human lives, and preparing/preventing disasters like Katrina from ever escalating
Or what about mobile ad hoc networks to let villages in the poorest regions of the world wirelessly share an internet connection
Or sensor networks to do environmental sensing, health monitoring in hospitals, traffic control, pinpointing the location of a crime, etc.
And plenty, plenty more
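
To make the medical records idea above a bit more concrete, here's a minimal sketch in Python of the kind of thing I have in mind. Everything in it is an assumption for illustration: RECORDS is a handful of made-up toy records (not real medical data), train and rank_diagnoses are hypothetical helper names, and the model is a plain naive Bayes with add-one smoothing, which is just one of many possible approaches and nowhere near a real diagnostic system.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (set of symptoms, diagnosis).
# Invented purely for illustration, not real medical data.
RECORDS = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"fever", "fatigue"}, "flu"),
    ({"sneezing", "cough"}, "cold"),
    ({"sneezing", "runny nose"}, "cold"),
    ({"rash", "fever"}, "measles"),
]


def train(records):
    """Count how often each diagnosis and each (diagnosis, symptom) pair occurs."""
    diagnosis_counts = Counter()
    symptom_counts = defaultdict(Counter)
    for symptoms, diagnosis in records:
        diagnosis_counts[diagnosis] += 1
        for symptom in symptoms:
            symptom_counts[diagnosis][symptom] += 1
    return diagnosis_counts, symptom_counts


def rank_diagnoses(symptoms, model):
    """Return (diagnosis, probability) pairs, most likely first.

    Naive Bayes over the reported symptoms only, with add-one smoothing
    so that an unseen symptom never forces a probability to zero.
    """
    diagnosis_counts, symptom_counts = model
    total = sum(diagnosis_counts.values())
    scores = {}
    for diagnosis, count in diagnosis_counts.items():
        score = count / total  # prior P(diagnosis)
        for symptom in symptoms:
            # smoothed estimate of P(symptom present | diagnosis)
            score *= (symptom_counts[diagnosis][symptom] + 1) / (count + 2)
        scores[diagnosis] = score
    norm = sum(scores.values())
    ranked = [(diagnosis, score / norm) for diagnosis, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


model = train(RECORDS)
ranking = rank_diagnoses({"fever", "cough"}, model)
for diagnosis, probability in ranking:
    print(f"{diagnosis}: {probability:.2f}")
```

On these toy records, a new patient with fever and a cough gets 'flu' ranked first. The interesting part is that the ranking comes entirely from counting past records, so in principle more records should mean better guesses.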

The attractive element of computer science research is not necessarily the algorithms, nor the programming implementation

But rather the intelligence/knowledge it can provide to humans after partying on data
Information is what makes a killer application!
And every field has tons of interesting information
We need to INTERACT and learn more about the needs in other fields and shape our research accordingly
(Unfortunately, computer scientists have a somewhat bad reputation when it comes to any form of 'interaction')

So after hearing all this, the prospective science students complain:

"Now we have to learn about multiple fields – our research burden is so much bigger than the scientists' in the past"
This is a very silly argument for defending one's laziness to pursue (computer) science
I mean, I'd much rather hear one of them 'I don't want to get outsourced to India' excuses 🙂
Here's why:
- We got it GOOD
- I mean, geez, we got Google!
- I can't even begin to imagine what Einstein, Gauss, Turing, Newton, etc. could have accomplished if they had the information research tools we got
- We could have flying cars, cleaner environment, robots doing our homework
- Heck, they could probably have found a way to achieve world peace
- We take what we have for granted
Back in the day, when there was no email, no search, no unified, indexed repository of publications
Scientists had to do amazing amounts of research just to find the right sources to learn from
Travel around the world to meet with leading researchers
Read entire books to find relevant information
You see, scientists back then had no choice but to be one-dimensional, and those who went deep into different fields either had no lives or were incredibly brilliant (most likely both)
But we can do all this work they had to do just to prepare for learning in a matter of seconds, from anywhere!
It's utter crap to think we can't learn about multiple fields of interest, in-depth
And our research tools will only keep getting better, and at an ever faster pace
But I don't blame these students for their complaint
We're still trained to be one-dimensional
Universities have specific majors and graduate programs (although some of these programs are SLOWLY mixing together)
Thereby requiring many high school students to select a single major on the admissions application
Even though their high school (which focuses on breadth) probably did a terrible job introducing them to liberal arts and science
[Side Note: Actually, this is one of the things I never quite understood. How do students choose a major like EE/NuclearEng/MechEng on the admissions application, when their high school most likely doesn't offer any of those courses? Sounds like their decision came down to three factors (1) The student is a geek (2) Parents' influence (3) Money. Two of these three are terrible reasons to enter these fields. Even worse, many universities separate engineering from liberal arts, making it almost impossible for a student who originally declared 'Economics' to switch to BioE after taking/liking a biotech class – despite the fact that their motivation to enter the major is so much better. You should enter a field you love, and making the decision on which students can enter engineering majors based off high school, which gives you almost zero exposure to these fields, makes no sense. And we wonder why we have a shortage of good engineers … well one reason is probably because we don't let them into the university programs – 'them' being the ones who actually realized they liked this stuff when they got to college.]
Many job descriptions want students/applicants with skills pertaining to one area

But unfortunately the problems we now face require us to understand how things mix together

This is why it's very important we increase the level of communication between fields to promote interdisciplinary research
Like a simple example: Databases and IR
- Both have the same goal: To answer queries accurately regarding a set of data
- Despite this, they are seen as two traditionally separate areas
- Database/IR people normally work on different floors/buildings at companies/universities
- One system is based traditionally more on logic while the other is more statistical/probabilistic
- But exploiting their overlap could greatly improve information access (a la SQL Server 2005)
- Notice this example refers to two subtopics under just computer science, not even two mainstream fields!
Just imagine the impacts of collaborating more with other fields of study
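
Here's a rough Python sketch of what I mean by the Databases/IR overlap. It's only an illustration under my own assumptions: the PAPERS collection is invented, select and rank are hypothetical helper names, and the scoring is a simple smoothed TF-IDF that I picked for brevity. It doesn't describe how any real product, SQL Server 2005 included, actually does this.

```python
import math
from collections import Counter

# Hypothetical toy collection; ids, years, and text are invented for illustration.
PAPERS = [
    {"id": 1, "year": 2004, "text": "query optimization in relational databases"},
    {"id": 2, "year": 2005, "text": "ranking documents for keyword search"},
    {"id": 3, "year": 2005, "text": "keyword search over relational databases"},
    {"id": 4, "year": 2003, "text": "sensor networks for traffic control"},
]


def select(rows, predicate):
    """Database style: a row either satisfies the predicate or it does not."""
    return [row for row in rows if predicate(row)]


def rank(rows, query):
    """IR style: score each row by summed TF-IDF of the query terms.

    Uses a smoothed inverse document frequency, log(1 + n / df), so a term
    that appears in every row still contributes a little to the score.
    """
    n = len(rows)
    doc_freq = Counter(term for row in rows for term in set(row["text"].split()))
    scored = []
    for row in rows:
        term_freq = Counter(row["text"].split())
        score = sum(
            term_freq[term] * math.log(1 + n / doc_freq[term])
            for term in query.split()
            if doc_freq[term]  # skip query terms that appear nowhere
        )
        if score > 0:
            scored.append((score, row["id"]))
    # Highest score first; ties broken by id so the order is deterministic.
    return [row_id for score, row_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Exploiting the overlap: exact filter first, then statistical ranking of what is left.
recent = select(PAPERS, lambda row: row["year"] >= 2005)
results = rank(recent, "keyword search databases")
print(results)
```

The logical filter gives a hard yes/no answer (published in 2005 or later), while the ranking step gives a soft, ordered one (how well does it match the keywords). Chaining the two is the simplest possible form of the overlap.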

This is why I feel blogging, wikipedia, and other informal avenues of communicating our thoughts/research/ideas are very important

So that people in other fields can read them
And for the laundry list of reasons above
Even if it's just a proof of concept technique with no empirical studies
An interested geek reader who came across the blog post might just end up coding it up for you
Could even inspire start-up companies

So the POINT: Do whatever you can to voice your ideas/research to the largest possible audience in the hopes of cross fertilizing research in many fields.

* So what's up with the parentheses around 'cs' and 'computer' in this post? Well, many of these points generalize to all (science) research publications 🙂

Are professors too paper-happy?