Tag Archives: Data Mining

Ranking Companies on Sales Culture & Retention

A company’s sales retention rate is a very important indicator of business health. If you have a good gauge on this, you could better answer questions such as: should I join that company’s sales department, will I be able to progress up the ladder, are reps hitting their numbers, are they providing effective training, should I invest money in this business, etc. But how does one measure this rate especially from an outside vantage point? This is where LinkedIn comes to the rescue. I essentially cross applied the approach I took to measuring engineering retention to sales.

sales_ret_2

This chart reveals several key technology companies ranked in reverse order of sales churn – so higher on the chart (or longer the bar) the higher the churn (so from worst at the top to best at the bottom).

So how are we defining sales churn here? I calculated the measurement as follows: I took the number of people who have ever churned in a sales role from the company and divide that by the number of days since incorporation for that respective company (call this Churn Per Day), and then I compute the ratio of how many sales people will churn in one year (the run rate i.e. Churn Per Day * 365) over the number of current sales people employed.

For ex. if you look at the top row, which is Zenefits, the value is 0.40 – which means that 40% of the current sales team size will churn in a one year period. In order to maintain that sales team size and corresponding revenue, the company will need to hire 40% of their team – and sooner than in a year as that churn likely spreads throughout the year as well as given new sales hire ramping periods (if you’re churning a ramped rep and say it takes one quarter to ramp a new sales rep, then you need to hire a new head at least one quarter beforehand to avoid a revenue dip).

A few more notes:

The color saturation indicates Churn Per Day – the darker the color, the higher the Churn Per Day.

Caveats listed in the previous post on engineering retention apply to this analysis too.

Leave a comment

Filed under Data Mining, Economics, Enterprise, Entrepreneurship, Job Stuff, LinkedIn, Non-Technical-Read, Startups, Statistics, Trends, Venture Capital

Betting on UFC Fights – A Statistical Data Analysis

Mixed Martial Arts (MMA) is an incredibly entertaining and technical sport to watch. It’s become one of the fastest growing sports in the world. I’ve been following MMA organizations like the Ultimate Fighting Championship (UFC) for almost eight years now, and in that time have developed a great appreciation for MMA techniques. After watching dozens of fights, you begin to pick up on what moves win and when, and spot strengths and weaknesses in certain fighters. However, I’ve always wanted to test my knowledge against the actual stats – like do accomplished wrestlers really beat fighters with little wrestling experience?

To do this, we need fight data, so I crawled and parsed all the MMA fights from Sherdog.com. This data includes fighter profiles (birth date, weight, height, disciplines, training camp, location) and fight records (challenger, opponent, time, round, outcome, event). After some basic data cleaning, I had a dataset of 11,886 fight records, 1,390 of which correspond to the UFC.

I then trained a random forest classifier from this data to see if a state-of-the-art machine learning model can identify any winning and losing characteristics. Over cross-validation with 10 folds, the resulting model scored a surprisingly decent AUC score of 0.69; a AUC score closer to 0.5 would indicate that the model can’t predict winning fights any better than random or fair coin flips.

So there may be interesting patterns in this data … Feeling motivated, I ran exhaustive searches over the data to find feature combinations that indicate winning or losing behaviors. Many hours later, several dozens of such insights were found.

Here are the most interesting ones (stars indicate statistical significance at the 5% level):

Top UFC Insights

Fighters older than 32 years of age will more likely lose

This was validated in 173 out of 277 (62%) fights*

Fighters with more than 6 TKO victories fighting opponents older than 32 years of age will more likely win

This was validated in 47 out of 60 (78%) fights*

Fighters from Japan will more likely lose

This was validated in 36 out of 51 (71%) fights*

Fighters who have lost 2 or more KOs will more likely lose

This was validated in 54 out of 84 (64%) fights*

Fighters with 3x or more decision wins and are greater than 3% taller than their opponents will more likely win

This was validated in 32 out of 38 (84%) fights*

Fighters who have won 3x or more decisions than their opponent will more likely win

This was validated in 142 out of 235 (60%) fights*

Fighters with no wrestling background vs fighters who do have one more likely lose

This was validated in 136 out of 212 (64%) fights*

Fighters fighting opponents with 3x or less decision wins and are on a 6 fight (or better) winning streak more likely win

This was validated in 30 out of 39 (77%) fights*

Fighters younger than their opponents by 3 or more years in age will more likely win

This was validated in 324 out of 556 (58%) fights*

Fighters who haven’t fought in more than 210 days will more likely lose

This was validated in 162 out of 276 (59%) fights*

Fighters taller than their opponents by 3% will more likely win

This was validated in 159 out of 274 (58%) fights*

Fighters who have lost less by submission than their opponents will more likely win

This was validated in 295 out of 522 (57%) fights*

Fighters who have lost 6 or more fights will more likely lose

This was validated in 172 out of 291 (60%) fights*

Fighters who have 18 or more wins and never had a 2 fight losing streak more likely win

This was validated in 79 out of 126 (63%) fights*

Fighters who have lost back to back fights will more likely lose

This was validated in 514 out of 906 (57%) fights*

Fighters with 0 TKO victories will more likely lose

This was validated in 90 out of 164 (55%) fights

Fighters fighting opponents out of Greg Jackson’s camp will more likely lose

This was validated in 38 out of 63 (60%) fights

 

Top Insights over All Fights

Fighters with 15 or more wins that have 50% less losses than their opponents will more likely win

This was validated in 239 out of 307 (78%) fights*

Fighters fighting American opponents will more likely win

This was validated in 803 out of 1303 (62%) fights*

Fighters with 2x more (or better) wins than their opponents and those opponents lost their last fights will more likely win

This was validated in 709 out of 1049 (68%) fights*

Fighters who’ve lost their last 4 fights in a row will more likely lose

This was validated in 345 out of 501 (68%) fights*

Fighters currently on a 5 fight (or better) winning streak will more likely win

This was validated in 1797 out of 2960 (61%) fights*

Fighters with 3x or more wins than their opponents will more likely win

This was validated in 2831 out of 4764 (59%) fights*

Fighters who have lost 7 or more times will more likely lose

This was validated in 2551 out of 4547 (56%) fights*

Fighters with no jiu jitsu in their background versus fighters who do have it more likely lose

This was validated in 334 out of 568 (59%) fights*

Fighters who have lost by submission 5 or more times will more likely lose

This was validated in 1166 out of 1982 (59%) fights*

Fighters in the Middleweight division who fought their last fight more recently will more likely win

This was validated in 272 out of 446 (61%) fights*

Fighters in the Lightweight division fighting 6 foot tall fighters (or higher) will more likely win

This was validated in 50 out of 83 (60%) fights

 

Note – I separated UFC fights from all fights because regulations and rules can vary across MMA organizations.

Most of these insights are intuitive except for maybe the last one and an earlier one which states 77% of the time fighters beat opponents who are on 6 fight or better winning streaks but have 3x less decision wins.

Many of these insights demonstrate statistically significant winning biases. I couldn’t help but wonder – could we use these insights to effectively bet on UFC fights? For the sake of simplicity, what happens if we make bets based on just the very first insight which states that fighters older than 32 years old will more likely lose (with a 62% chance)?

To evaluate this betting rule, I pulled the most recent UFC fights where in each fight there’s a fighter that’s at least 33 years old. I found 52 such fights, spanning 2/5/2011 – 8/14/2011. I placed a $10K bet on the younger fighter in each of these fights.

Surprisingly, this rule calls 33 of these 52 fights correctly (63% – very close to the rule’s observed 62% overall win rate). Each fight called incorrectly results in a loss of $10,000, and for each of the fights called correctly I obtained the corresponding Bodog money line (betting odds) to compute the actual winning amount.

I’ve compiled the betting data for these fights in this Google spreadsheet.

Note, for 6 of the fights that our rule called correctly, the money lines favored the losing fighters.

Let’s compute the overall return of our simple betting rule:

For each of these 52 fights, we risked $10,000, or in all $520,000
We lost 19 times, or a total of $190,000
Based on the betting odds of the 33 fights we called correctly (see spreadsheet), we won $255,565.44
Profit = $255,565.44 – $190,000 = $65,565.44
Return on investment (ROI) = 100 * 65,565.44 / 520,000 = 12.6%

 

That’s a very decent return.

For kicks, let’s compare this to investing in the stock market over the same period of time. If we buy the S&P 500 with a conventional dollar cost averaging strategy to spread out the $520,000 investment, then we get a ROI of -7.31%. Ouch.

Keep in mind that we’re using a simple betting rule that’s based on a single insight. The random forest model, which optimizes over many insights, should predict better and be applicable to more fights.

Please note that I’m just poking fun at stocks – I’m not saying betting on UFC fights with this rule is a more sound investment strategy (risk should be thoroughly examined – the variance of the performance of the rule should be evaluated over many periods of time).

The main goal here is to demonstrate the effectiveness of data driven approaches for better understanding the patterns in a sport like MMA. The UFC could leverage these data mining approaches for coming up with fairer matches (dismiss fights that match obvious winning and losing biases). I don’t favor this, but given many fans want to see knockouts, the UFC could even use these approaches to design fights that will likely avoid decisions or submissions.

Anyways, there’s so much more analysis I’ve done (and haven’t done) over this data. Will post more results when cycles permit. Stay tuned.

26 Comments

Filed under AI, Blog Stuff, Computer Science, Data Mining, Economics, Machine Learning, Research, Science, Statistics, Trends