COVID-19 data – US deaths for each day we don’t lock down

tl;dr (3/24/2020):

(1) Total deaths in one month in the US will be 416K at the current rate.

(2) If the US locks down like Italy, total deaths in one month drop to 10K – and to 6K if our efforts are halfway between Italy’s and China’s (and 3K if like China).

(3) Not fully locking down will result in 1K – 15K additional deaths for each day we wait over the next month (the daily toll grows each day).

I did some quick analysis over the latest death data (see image below):


Here’s the spreadsheet with the data and math.

The death count and cost of each day we don’t drop the hammer and implement a complete national lockdown need to be reported and acted on before it’s too late …

I do hope I’m wrong (for example, Italy’s and the US’s daily deaths stabilize soon) and/or we can course correct ASAP to save lives.

Betting on UFC Fights – A Statistical Data Analysis

Mixed Martial Arts (MMA) is an incredibly entertaining and technical sport to watch. It’s become one of the fastest growing sports in the world. I’ve been following MMA organizations like the Ultimate Fighting Championship (UFC) for almost eight years now, and in that time have developed a great appreciation for MMA techniques. After watching dozens of fights, you begin to pick up on what moves win and when, and spot strengths and weaknesses in certain fighters. However, I’ve always wanted to test my knowledge against the actual stats – like do accomplished wrestlers really beat fighters with little wrestling experience?

To do this, we need fight data, so I crawled and parsed all the MMA fights from This data includes fighter profiles (birth date, weight, height, disciplines, training camp, location) and fight records (challenger, opponent, time, round, outcome, event). After some basic data cleaning, I had a dataset of 11,886 fight records, 1,390 of which correspond to the UFC.

I then trained a random forest classifier on this data to see if a state-of-the-art machine learning model can identify any winning and losing characteristics. Over cross-validation with 10 folds, the resulting model scored a surprisingly decent AUC score of 0.69; an AUC score closer to 0.5 would indicate that the model can’t predict winning fights any better than random or fair coin flips.
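To make the 0.5 baseline concrete: AUC can be read as the probability that the model scores a randomly chosen win higher than a randomly chosen loss (ties count as half). The sketch below computes it directly from that definition. The labels and scores are made-up illustrations, not values from the fight dataset or the trained model.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positives) * len(negatives))

# Hypothetical labels (1 = win, 0 = loss) and model scores.
labels = [1, 1, 0, 0]
perfect = auc(labels, [0.9, 0.8, 0.2, 0.1])        # every win outranks every loss
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])  # the scores carry no signal
mixed = auc(labels, [0.9, 0.3, 0.6, 0.1])          # one win is ranked below a loss
print(perfect, uninformative, mixed)
```

A score of 0.69 sits between the uninformative and perfect cases: the model ranks the eventual winner higher noticeably more often than chance, but far from always.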

So there may be interesting patterns in this data … Feeling motivated, I ran exhaustive searches over the data to find feature combinations that indicate winning or losing behaviors. Many hours later, several dozen such insights were found.
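The search itself isn't shown here; a minimal sketch of the single-feature case might look like the following. The field names (`age`, `tko_wins`, `won`), the cut-offs, and the four records are all hypothetical, and a real search would also cover combinations of features.

```python
def search_rules(fights, thresholds, min_support=2):
    """Try every 'feature > cutoff' rule and count how often matching fighters won.

    fights: list of dicts, one per fighter per fight, with numeric features
            and a boolean 'won' field (hypothetical schema).
    thresholds: dict mapping feature name -> list of cut-offs to try.
    """
    results = []
    for feature, cutoffs in thresholds.items():
        for cutoff in cutoffs:
            matched = [f for f in fights if f[feature] > cutoff]
            if len(matched) < min_support:
                continue  # too few fights to say anything
            wins = sum(1 for f in matched if f["won"])
            results.append((feature, cutoff, wins, len(matched)))
    # Rules furthest from a 50/50 split come first.
    results.sort(key=lambda r: abs(r[2] / r[3] - 0.5), reverse=True)
    return results

# Made-up records purely for illustration.
fights = [
    {"age": 35, "tko_wins": 2, "won": False},
    {"age": 34, "tko_wins": 7, "won": False},
    {"age": 27, "tko_wins": 8, "won": True},
    {"age": 29, "tko_wins": 1, "won": True},
]
rules = search_rules(fights, {"age": [32], "tko_wins": [6]})
for feature, cutoff, wins, total in rules:
    print(f"{feature} > {cutoff}: won {wins} of {total}")
```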

Here are the most interesting insights (stars indicate statistical significance at the 5% level):

Top UFC Insights

Fighters older than 32 years of age will more likely lose

This was validated in 173 out of 277 (62%) fights*

Fighters with more than 6 TKO victories fighting opponents older than 32 years of age will more likely win

This was validated in 47 out of 60 (78%) fights*

Fighters from Japan will more likely lose

This was validated in 36 out of 51 (71%) fights*

Fighters who have lost 2 or more KOs will more likely lose

This was validated in 54 out of 84 (64%) fights*

Fighters with 3x or more decision wins who are more than 3% taller than their opponents will more likely win

This was validated in 32 out of 38 (84%) fights*

Fighters who have won 3x or more decisions than their opponent will more likely win

This was validated in 142 out of 235 (60%) fights*

Fighters with no wrestling background vs fighters who do have one will more likely lose

This was validated in 136 out of 212 (64%) fights*

Fighters whose opponents have 3x fewer decision wins (or less) and are on a 6 fight (or better) winning streak will more likely win

This was validated in 30 out of 39 (77%) fights*

Fighters younger than their opponents by 3 or more years in age will more likely win

This was validated in 324 out of 556 (58%) fights*

Fighters who haven’t fought in more than 210 days will more likely lose

This was validated in 162 out of 276 (59%) fights*

Fighters taller than their opponents by 3% will more likely win

This was validated in 159 out of 274 (58%) fights*

Fighters who have lost fewer fights by submission than their opponents will more likely win

This was validated in 295 out of 522 (57%) fights*

Fighters who have lost 6 or more fights will more likely lose

This was validated in 172 out of 291 (60%) fights*

Fighters who have 18 or more wins and never had a 2 fight losing streak will more likely win

This was validated in 79 out of 126 (63%) fights*

Fighters who have lost back to back fights will more likely lose

This was validated in 514 out of 906 (57%) fights*

Fighters with 0 TKO victories will more likely lose

This was validated in 90 out of 164 (55%) fights

Fighters fighting opponents out of Greg Jackson’s camp will more likely lose

This was validated in 38 out of 63 (60%) fights


Top Insights over All Fights

Fighters with 15 or more wins that have 50% fewer losses than their opponents will more likely win

This was validated in 239 out of 307 (78%) fights*

Fighters fighting American opponents will more likely win

This was validated in 803 out of 1303 (62%) fights*

Fighters with 2x more (or better) wins than their opponents and those opponents lost their last fights will more likely win

This was validated in 709 out of 1049 (68%) fights*

Fighters who’ve lost their last 4 fights in a row will more likely lose

This was validated in 345 out of 501 (68%) fights*

Fighters currently on a 5 fight (or better) winning streak will more likely win

This was validated in 1797 out of 2960 (61%) fights*

Fighters with 3x or more wins than their opponents will more likely win

This was validated in 2831 out of 4764 (59%) fights*

Fighters who have lost 7 or more times will more likely lose

This was validated in 2551 out of 4547 (56%) fights*

Fighters with no jiu jitsu in their background versus fighters who do have it will more likely lose

This was validated in 334 out of 568 (59%) fights*

Fighters who have lost by submission 5 or more times will more likely lose

This was validated in 1166 out of 1982 (59%) fights*

Fighters in the Middleweight division who fought their last fight more recently will more likely win

This was validated in 272 out of 446 (61%) fights*

Fighters in the Lightweight division fighting 6 foot tall fighters (or taller) will more likely win

This was validated in 50 out of 83 (60%) fights


Note – I separated UFC fights from all fights because regulations and rules can vary across MMA organizations.

Most of these insights are intuitive, except maybe for the last one and an earlier one which states that 77% of the time fighters beat opponents who are on 6 fight (or better) winning streaks but have 3x fewer decision wins.
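It isn't stated above which test produced the stars. One plausible check, assumed here, is an exact two-sided binomial test against a fair coin (a 50% win rate). The sketch below applies it to a few of the counts from the UFC list.

```python
from math import comb

def binomial_p_value(successes, trials):
    """Exact two-sided p-value for H0: success probability = 0.5."""
    # Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the tail at least as extreme as the observed count.
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)

for wins, total in [(173, 277), (90, 164), (38, 63)]:
    p = binomial_p_value(wins, total)
    flag = "*" if p < 0.05 else ""
    print(f"{wins} out of {total}: p = {p:.4f} {flag}")
```

Under this assumed test, 173 out of 277 clears the 5% level while 90 out of 164 and 38 out of 63 do not, which agrees with which of those three lines carry a star.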

Many of these insights demonstrate statistically significant winning biases. I couldn’t help but wonder – could we use these insights to effectively bet on UFC fights? For the sake of simplicity, what happens if we make bets based on just the very first insight, which states that fighters older than 32 will more likely lose (with a 62% chance)?

To evaluate this betting rule, I pulled the most recent UFC fights where in each fight there’s a fighter that’s at least 33 years old. I found 52 such fights, spanning 2/5/2011 – 8/14/2011. I placed a $10K bet on the younger fighter in each of these fights.

Surprisingly, this rule calls 33 of these 52 fights correctly (63% – very close to the rule’s observed 62% overall win rate). Each fight called incorrectly results in a loss of $10,000, and for each of the fights called correctly I obtained the corresponding Bodog money line (betting odds) to compute the actual winning amount.

I’ve compiled the betting data for these fights in this Google spreadsheet.

Note, for 6 of the fights that our rule called correctly, the money lines favored the losing fighters.

Let’s compute the overall return of our simple betting rule:

For each of these 52 fights, we risked $10,000, or in all $520,000
We lost 19 times, or a total of $190,000
Based on the betting odds of the 33 fights we called correctly (see spreadsheet), we won $255,565.44
Profit = $255,565.44 – $190,000 = $65,565.44
Return on investment (ROI) = 100 * 65,565.44 / 520,000 = 12.6%
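The arithmetic above can be reproduced in a few lines. The totals come straight from the list; the `moneyline_profit` helper is an illustrative addition that assumes the standard American money line convention (+150 pays $150 profit per $100 staked, -200 requires $200 staked per $100 profit) and is not taken from the spreadsheet.

```python
def moneyline_profit(stake, moneyline):
    """Profit on a winning bet under the American money line convention."""
    if moneyline > 0:
        return stake * moneyline / 100
    return stake * 100 / -moneyline

stake = 10_000
fights = 52
losses = 19
winnings = 255_565.44          # total won on the 33 correct calls (from the post)

risked = stake * fights        # 520,000
lost = stake * losses          # 190,000
profit = winnings - lost       # 65,565.44
roi = 100 * profit / risked    # about 12.6%

print(f"risked ${risked:,}, profit ${profit:,.2f}, ROI {roi:.1f}%")
# Example payouts for a $10K winning bet on a favorite (-200) and an underdog (+150).
print(moneyline_profit(stake, -200), moneyline_profit(stake, 150))
```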


That’s a very decent return.

For kicks, let’s compare this to investing in the stock market over the same period of time. If we buy the S&P 500 with a conventional dollar cost averaging strategy to spread out the $520,000 investment, then we get an ROI of -7.31%. Ouch.
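For reference, a dollar cost averaging comparison works roughly like this: split the total into equal installments, buy at each date's price, and value the accumulated shares at the final price. The prices below are placeholders chosen to keep the arithmetic easy to follow, not actual S&P 500 levels, and valuing at the last purchase date is an assumption about how such a comparison would be run.

```python
def dca_roi(prices, total_investment):
    """ROI (%) of investing equal installments at each price, valued at the last price."""
    installment = total_investment / len(prices)
    shares = sum(installment / price for price in prices)
    final_value = shares * prices[-1]
    return 100 * (final_value - total_investment) / total_investment

# Placeholder prices, NOT real index data.
print(dca_roi([100.0, 50.0, 100.0], 300.0))   # price dips then recovers: ends up ahead
print(dca_roi([100.0, 100.0, 80.0], 300.0))   # price ends lower: ends up behind
```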

Keep in mind that we’re using a simple betting rule that’s based on a single insight. The random forest model, which optimizes over many insights, should predict better and be applicable to more fights.

Please note that I’m just poking fun at stocks – I’m not saying betting on UFC fights with this rule is a more sound investment strategy (risk should be thoroughly examined – the variance of the performance of the rule should be evaluated over many periods of time).

The main goal here is to demonstrate the effectiveness of data driven approaches for better understanding the patterns in a sport like MMA. The UFC could leverage these data mining approaches for coming up with fairer matches (dismiss fights that match obvious winning and losing biases). I don’t favor this, but given many fans want to see knockouts, the UFC could even use these approaches to design fights that will likely avoid decisions or submissions.

Anyways, there’s so much more analysis I’ve done (and haven’t done) over this data. Will post more results when cycles permit. Stay tuned.

Ranking High Schools Based On Outcomes

High school is arguably the most important phase of your education. Some families will move just to be in the district of the best ranked high school in the area. However, the factors that these rankings are based on, such as test scores, tuition amount, average class size, teacher to student ratio, location, etc. do not measure key outcomes such as what colleges or jobs the students get into.

Unfortunately, measuring outcomes is tough – there’s no data source that I know of that describes how all past high school students ended up. However, I thought it would be a fun experiment to approximate them using LinkedIn data. I took eight top high schools in the Bay Area (see the table below) and ran a whole bunch of advanced LinkedIn search queries to find graduates from these high schools while also counting up their key outcomes like what colleges they graduated from, what companies they went on to work for, what industries they are in, what job titles they have earned, etc.

The results are quite interesting. Here are a few statistics:

College Statistics

  • The top 5 high schools that have the largest share of users going to top private schools (Ivy League + Stanford + Caltech + MIT) are (1) Harker (2) Gunn (3) Saratoga (4) Lynbrook (5) Bellarmine.
  • The top 5 high schools that have the largest share of users going to the top 3 UC’s (Berkeley, LA, San Diego) are (1) Mission (2) Gunn (3) Saratoga (4) Lynbrook (5) Leland.
  • Although Harker has the highest share of users going to top privates (30%), their share of users going to the top UC’s is below average. It’s worth noting that Harker’s tuition is the highest at $36K a year.
  • Bellarmine, an all men’s high school with tuition of $15K a year, is below average in its share of users going on to top private universities as well as to the UC system.
  • Gunn has the highest share of users (11%) going on to Stanford. That’s more than 2x the second place high school (Harker).
  • Mission has the highest share of users (31%) going to the top 3 UC’s and to UC Berkeley alone (14%).

Career Statistics

  • In rank order (1) Saratoga (2) Bellarmine (3) Leland have the biggest share of users who hold job titles that allude to leadership positions (CEO, VP, Manager, etc.).
  • The highest share of lawyers come from (1) Bellarmine (2) Lynbrook (3) Leland. Gunn has 0 lawyers and Harker is second lowest at 6%.
  • Saratoga has the best overall balance of users in each industry (median share of users).
  • Hardware is fading – 5 schools (Leland, Gunn, Harker, Mission, Lynbrook) have zero users in this industry.
  • Harker has the highest share of its users in the Internet, Financial, and Medical industries.
  • Harker has the lowest percentage of Engineers and below average share of users in the Software industry.
  • Gunn has the highest share of users in the Software and Media industries.
  • Harker high school is relatively new (formed in 1998), so its graduates are still early in the workforce. Leadership takes time to earn, so the leadership statistic is unfairly biased against Harker.

You can see all the stats I collected in the table below. Keep in mind that percentages correspond to the share of users from the high school that match that column’s criteria. Yellow highlights correspond to the best score; blue shaded boxes correspond to scores that are above average. There are quite a few caveats which I’ll note in more detail later, so take these results with a grain of salt. However, as someone who has lived in the Bay Area his whole life, I will say that many of these results make sense to me.

pplmatch – Find Like Minded People on LinkedIn

Just provide a link to a public LinkedIn profile and an email address and that’s it. The system will go find other folks on LinkedIn who best match that given profile and email back a summary of the results.

It leverages some very useful IR techniques along with a basic machine learned model to optimize the matching quality.
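Which IR techniques are involved isn't spelled out here. As one illustration of the general idea, assumed rather than taken from pplmatch, profile text can be turned into TF-IDF vectors and candidates ranked by cosine similarity to the given profile. The profile snippets are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a {term: tf-idf weight} dict (smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_matches(query_index, docs):
    """Return indices of the other documents, most similar first."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    others = [i for i in range(len(docs)) if i != query_index]
    return sorted(others, key=lambda i: cosine(query, vectors[i]), reverse=True)

# Invented profile text for illustration only.
profiles = [
    "software engineer search ranking",
    "search ranking engineer machine learning",
    "pastry chef bakery",
]
print(rank_matches(0, profiles))
```

This also hints at why profiles with more public text match better: with only a handful of words there are few terms for the vectors to overlap on.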

Some use cases:

  • If I provide a link to a star engineer, I can find a bunch of folks like that person to go try to recruit. One could also use LinkedIn / Google search to find people, but sometimes it can be difficult to formulate the right query and may be easier to just pivot off an ideal candidate.
  • I recently shared it with a colleague of mine who just graduated from college. He really wants to join a startup but doesn’t know of any (he just knows about the big companies like Microsoft, Google, Yahoo!, etc.). With this tool he found people who shared similar backgrounds and saw which small companies they work at.
  • Generally browsing the people graph based on credentials as opposed to relationships. It seems to be a fun way to find like minded people around the world and see where they ended up. I’ve recently been using it to find advisors and customers based on folks I admire.

Anyways, just a fun application I developed on the side. It’s not perfect by any means but I figured it’s worth sharing.

It’s pretty compute intensive, so if you want to try it send mail to [contact at pplmatch dot com] to get your email address added to the list. Also, do make sure that the profiles you supply expose lots of text publicly – the more text the better the results.

SDSS Skyserver Traffic

This past summer I worked at MSR alongside Dr. Jim Gray on analyzing the Skyserver’s (the online worldwide telescope portal) web and SQL logs. We just published our findings, which you can access here (MSR) or here (updated).

Still needs some clean-up (spelling, grammar, flow) and additional sections to tie up some loose ends, but it’s definitely presentable. Would love to hear what you guys think about the results (besides how pretty the graphs look :).

Are professors too paper-happy?

[Update (4/26/2006): Motivation]

I've received several emails/comments (mostly from researchers and professors) regarding this post and realized some may have misconstrued my intentions for writing this article. I don't blame them. The title sounds pretty controversial and this post is quite long – compelling readers to skim and consequently miss some of the points I'm trying to make. As I mention in the second paragraph, 'I'm a huge fan of publications. The more the better.' All I'm trying to do here is make the case for why I think researchers should ALSO use more informal avenues of communication, such as blogs, for getting their ideas out for public review/commenting. These sources would not only increase reach/access, but serve as great supplementary material to their corresponding conference papers – NOT as replacements or alternatives to publications. I received a very insightful idea from Hari Balakrishnan – What if authors, in conjunction with their papers, release informal 5-6 page summaries of their projects/ideas for online public review/commenting? I would love to have resources like these available to me when dissecting the insight/knowledge encapsulated in formalized papers. I think simple ideas like these would significantly improve reachability/access of our research and even encourage more creativity/questioning.

[End Update]


I recently read through a professor's CV, which under the 'Publications' section stated "… published over 100 conference papers." I then proceeded to the next page of the resume to scan through some of his paper titles, only to see a brand new section titled 'References'. Wait, where'd his publications go? I went back to see if I missed a page. Nope. Wow … why wouldn't he include his citations or mention notable pieces of work? I mean, saying you have 100+ publications gives me no value unless you're going to list them. Granted, that's an amazing accomplishment – writing a paper that advances (cs) research is not only time consuming and hard work, but it requires a whole new level of creativity. Additionally, conferences are getting super competitive (unless he's publishing in these conferences) – I hear many cases of famous, tenured professors/researchers getting their papers rejected. Now multiplying this paper process by 100 represents a significant amount of research, so I'm willing to bet this guy is pretty bright. However, this one line publication description gives me the impression that this professor aims for quantity in order to amplify his prestige. I also get the feeling that after publishing the 100th paper he reached one of those hard set goals that gives one the sense of 'Mission Accomplished'. I guess I'm different – I'm all about quality. Personally, I'd be content with 2 publications in my life if one of 'em got me the Turing Award and the other a date with Heidi Klum – but that's just me 🙂

Interestingly (or oddly), this CV publication description also got me thinking about some of the things I hate about (cs) papers. But first, let me make it very clear – I am a huge fan of conference publications. The more the better. The peer review process of research papers is absolutely critical to advancing science. In this post, however, I would like to make the case for why I think all researchers should ALSO use more informal avenues of communication (such as blogs, wikipedia, etc.) for publicly broadcasting their ideas and results of their latest research.

So let's start this off with a laundry list of gripes I have about 'most' papers (with possible alternatives/solutions mixed in):

PDF/PS/Sealed/doc/dvi/tex formats

  • That load time for the viewer is killer – probably deters many from even clicking the link
  • Certain viewers render diagrams/text differently
  • The document looks formal, intimidating, elitist, old, plain, hardcore, not happy to see you
  • Also makes documents feel permanent, finalized, not amenable to changes
  • Provides no interface for commenting by readers – and in many cases I find reader critiques more interesting/useful than the actual paper
  • And why can't we see the feedback/comments from the conference?
  • Not pretty – It's a lot of material to read, so why not make the presentation look happier? I seriously think a nice stylesheet/web layout for paper content would significantly improve readability and portability

It's a lot of work

  • Not only does one need to come up with a fairly original idea, but research, discuss, and analyze its empirical/theoretical implications
  • Needs support/citations
  • Papers are quite formulaic – I can easily guess the page that has the nice/pretty graphs
  • This structure imposes pretty big research burdens on the authors
  • This is a GOOD thing for paper quality
  • But a terrible method for prototyping (similar to how UI dev's quickly prototype designs to cycle in user feedback)
  • Doesn't provide professors/researchers a forum to quickly get ideas out nor to filter in comments
  • Also prevents researchers from spewing their thoughts out since they wish to formalize everything in papers
  • Now there are other journals/magazines which are informal to let researchers publish their brainstorming/wacky idea articles
  • But there's still a time delay in getting those articles published
  • They still have writing standards and an approval process
  • And I don't read them (Where do I find good informal CS articles online? Is anyone linking to these? Who's authoring them?)
  • In blogs, authors can be comedic, speak freely ("he's full of it", "that technique is a load of crap" – opinions like these embedded in technical articles would make reading so much more enjoyable), and quickly get to the point without having to exhaust the boring implementation details


  • Although papers normally include author emails, this means feedback is kept privately in the authors' inboxes, not viewable to the general public
  • Many papers/journals require subscriptions/fees


  • The main issue here is professors/researchers want to publish in prestigious papers/journals
  • Rather than waste their time with things like weblogs, which are perceived to be inherently biased and non-factual, and where the audience may seem 'dumber'
  • It's in their best interest to focus on publications – get tenure, fame, approval from experts in the field

I want informality

  • But it sucks for us ('us' being the general audience who may wish to learn about these breakthroughs)
  • I love hearing professors ramble off their ideas freely
  • I want to see commentary from readers and myself ON these articles
  • And I want to see these ideas get posted and updated quickly
  • I want to see these experts explaining their ideas in simple terms (whenever possible, if it's too hardcore then it's probably not very good bloggable material) and describe the real world impacts/examples
  • But unfortunately NONE of my favorite professors/researchers publish their ideas on blogs

Popularity and Reach

  • There isn't much linkability happening for papers (besides from class/professor web sites)
  • You don't see conference papers getting linked on Slashdot/Digg
  • If millions of geeks don't even want to link/read this stuff, how and why would others? (refer below to the concluding section for why I think this is important)
  • Papers are formal, old-school, and designed to be read by expert audiences
  • They are written to impress, be exhaustive/detailed, when in reality most of us are lazy, want to read something entertaining, and get to the dang point
  • Wouldn't it be nice if professors/researchers expressed their ideas/research in a bloggish manner whenever possible?
  • At best a blog post would be a great supplemental/introductory reading to the actual paper
  • Even the conference panel can check the blog comments to see if they missed anything
  • Some professors express their ideas informally with lecture notes and powerpoint presentations
  • But again, these formats don't let others annotate it with their comments
  • They are mostly pdf/ps/ppt/doc's (ah that load time)
  • And lecture notes usually exist for core concepts, not experimental/recent ideas


  • Which brings me to another interesting idea
  • Where's THE place to learn about core, fundamental cs topics?
  • Or even slightly maturing cs algorithms/techniques?
  • I tend to follow this learning process:
    • Check Wikipedia
    • At best gives me a great introduction to a topic
    • If I need more, I use what I learned from wikipedia/other sources to build queries for searching lecture notes/presentations
    • Typically the interesting details are hidden in a set of lecture notes/papers/books
    • And which of these sources should I use? – I have to read many versions coming from different professors to piece together a clear story
  • Wouldn't it be nice to have a one stop shop
  • Wikipedia!
  • What if Universities MOTIVATED professors/researchers to collaborate on wikipedia to publish in-depth articles regarding their areas of interest?
  • This would be huge
  • Wikipedia is incredibly popular/useful/successful, so let's use it as the place where professors around the world can unite their knowledge
  • Would be the best supplement (even replacement) for lecture notes
  • And for more experimental/controversial topics, researchers can use individual blogs

A slight digression and concluding remarks: The Future of Computer Science Research

Two things:

  1. I strongly believe the future of (computer) science relies on making our stuff more accessible to others – requiring us to tap formal AND informal avenues of communication
  2. We need to be more interdisciplinary!

Many people (especially prospective computer science students) ask me:

What's left to invent?

  • Most research to them (and me) sounds like 'optimizations' based work
  • Is there anything revolutionary left to do? Has it all been accomplished?
  • This is a hard question to answer convincingly, especially since if there was something revolutionary left to do, someone (hopefully me) would be on it
  • It's also difficult to foresee the impacts of ongoing research
  • Big ideas/inventions are evolutionary, piecing together decades of work.

They even say … "if only I could go back in time and invent the TV, PC, Internet, Web Search …"

  • As if back in the day we were living in the stone age and there were so many things left to do
  • There are even more things left to do today!

Our goal should not be to come up with an idea that surpasses the importance of say the invention of the PC

  • (I'm not even sure if this is possible, and at best is an ill comparison to make)
  • But rather to learn about the problems people face NOW and use our knowledge of science to solve them
  • Research should be entrepreneurial: Find a customer pain and provide a method to solve it
  • Not the other way around: Playing with super geeky stuff for fun and hoping later it might have applications (there are exceptions to this, since it encourages depth/exploration of a field which may lead to an accidental discovery of something huge, but I still think for the most part this is the wrong way to go about research)
  • Your audience is the most important element of research
  • And the beauty of our field – IT is a common theme in every research area!
  • We're the common denominator!
  • We can go about improving research in any subject

Now getting back to focus …

  • We consider the PC, Internet, etc. to be revolutionary because they are intuitive, fundamental platforms
  • Current research adds many layers of complexity to these ideas in order to solve more difficult problems
  • Making things more and more complex
  • What we need to be doing better is making our complex systems easier to use
  • We need better integration
  • And expand our research applications into other fields, rather than just reinforcing them back into computer science areas
  • We need to be interdisciplinary!
  • We have yet to even touch the surface when it comes to exploring the overlaps of fields

Some of the things we've been brewing since the 1970's could seriously REVOLUTIONIZE other industries

  • Imagine machine learning a database filled with all past medical histories/records/diagnostic reports
    • So when presented with symptoms/readings of a new patient, the system will tell you what that patient is suffering from with high probability based off analyzing millions of records
    • This would dramatically decrease the number of cases of death due to bad diagnosis and lower medical/insurance costs since it wouldn't be necessary to run a bunch of expensive scans/tests and surgeries to figure out what's wrong with a patient
  • A similar system could do the same thing over economics, disease, environmental, hazards datasets so that scientists and policy makers can ask questions like 'In the next five years what region of the world has the highest probability of a natural disaster … or a disease epidemic?, etc.'
    • This would be huge in shaping policy priorities, saving human lives, and preparing/preventing disasters like Katrina from ever escalating
  • Or what about mobile ad hoc networks to let villages in the poorest regions of the world wirelessly share an internet connection
  • Or sensor networks to do environmental sensing, health monitoring in hospitals, traffic control, pinpointing the location of a crime, etc.
  • And plenty, plenty more

The attractive element of computer science research is not necessarily the algorithms, nor the programming implementation

  • But rather the intelligence/knowledge it can provide to humans after partying on data
  • Information is what makes a killer application!
  • And every field has tons of interesting information
  • We need to INTERACT and learn more about the needs in other fields and shape our research accordingly
  • (Unfortunately, computer scientists have a somewhat bad reputation when it comes to any form of 'interaction')

So after hearing all this, the prospective science students complain:

  • "Now we have to learn about multiple fields – our research burden is so much bigger than the scientists' in the past"
  • This is a very silly argument for defending one's laziness to pursue (computer) science
  • I mean I'd much rather hear one of them 'I don't want to get outsourced to India' excuses 🙂
  • Here's why:
    • We got it GOOD
    • I mean, geez, we got Google!
    • I can't even begin to imagine what Einstein, Gauss, Turing, Newton, etc. could have accomplished if they had the information research tools we got
    • We could have flying cars, cleaner environment, robots doing our homework
    • Heck, they could have probably found a way to achieve world peace
    • We take what we have for granted
  • Back in the day, when there was no email, no search, no unified, indexed repository of publications
  • Scientists had to do amazing amounts of research just to find the right sources to learn from
  • Travel around the world to meet with leading researchers
  • Read entire books to find relevant information
  • You see, scientists back then had no choice but to be one-dimensional, and those who went deep into different fields either had no lives or were incredibly brilliant (most likely both)
  • But we can do all this work they had to do just to prepare for learning in a matter of seconds, from anywhere!
  • It's utter crap to think we can't learn about multiple fields of interest, in-depth
  • And our research tools will only get better and better exponentially faster
  • But I don't blame these students for their complaint
  • We're still trained to be one-dimensional
  • Universities have specific majors and graduate programs (although some of these programs are SLOWLY mixing together)
  • Thereby requiring many high school students to select a single major on the admissions application
  • Even though their high school (which focuses on breadth) probably did a terrible job introducing them to liberal arts and science
  • [Side Note: Actually, this is one of the things I never quite understood. How do students choose a major like EE/NuclearEng/MechEng on the admissions application, when their high school most likely doesn't offer any of those courses? Sounds like their decision came down to three factors (1) The student is a geek (2) Parents' influence (3) Money. 2/3 of these are terrible reasons to enter these fields. Even worse, many universities separate engineering from liberal arts, making it almost impossible for a student who originally declared 'Economics' to switch to BioE after taking/liking a biotech class – despite the fact that their motivation to enter the major is so much better. You should enter a field you love, and making the decision on which students can enter engineering majors based off high school, which gives you almost zero exposure to these fields, makes no sense. And we wonder why we have a shortage of good engineers … well one reason is probably because we don't let them into the university programs – 'them' being the ones who actually realized they liked this stuff when they got to college.]
  • Many job descriptions want students/applicants with skills pertaining to one area

But unfortunately the problems we now face require us to understand how things mix together

  • This is why it's very important we increase the level of communication between fields to promote interdisciplinary research
  • Like a simple example: Databases and IR
    • Both have the same goal: To answer queries accurately regarding a set of data
    • Despite this, they are seen as two traditionally separate areas
    • Database/IR people normally work on different floors/buildings at companies/universities
    • One system is based traditionally more on logic while the other is more statistical/probabilistic
    • But exploiting their overlap could greatly improve information access (a la SQL Server 2005)
    • Notice this example refers to two subtopics under just computer science, not even two mainstream fields!
  • Just imagine the impacts of collaborating more with other fields of study

This is why I feel blogging, wikipedia, and informal avenues of communicating our thoughts/research/ideas are very important

  • So that people in other fields can read them
  • And for the laundry list of reasons above
  • Even if it's just a proof of concept technique with no empirical studies
  • An interested geek reader who came across the blog post might just end up coding it up for you
  • Could even inspire start-up companies

So the POINT: Do whatever you can to voice your ideas/research to the largest possible audience in the hopes of cross fertilizing research in many fields.

* So what's up with the parentheses around 'cs' and 'computer' in this post? Well, many of these points generalize to all (science) research publications 🙂

This work is licensed under a Creative Commons License