Category Archives: Economics

Amazon, Google and Apple vs the Big 5 Unicorns on Hiring and Churn

I’ve received multiple requests to analyze employee churn and new hiring rates for big companies and unicorns with the approach I took earlier for studying engineering and sales retention rates. I figured I’d give it a shot – and combine all of the key metrics in one chart …

[Chart: monthly new hires, churn, and net change for Amazon, Google, Apple, and the unicorns]

How to read this:

Each blue bar represents the number of new hires that company is expected to make in a 30-day (one-month) period. Each black bar (negative values) indicates how many employees will churn in that same one-month period. The orange line (labeled by the topmost numbers) represents the net change in headcount per month (new hires less churn). The companies are ranked by churn in descending order from left to right (so the highest churn is on the left).

As you can see in the chart, the big three companies included in this analysis are Amazon, Apple and Google. The unicorns are Uber, Lyft, Airbnb, Pinterest and Snapchat. “Big 5” combines these unicorns together as if they were one whole company. Also note, this is looking at employees worldwide with any job title.

Key Insights:

  1. Apple is not hiring enough new heads when compared with Amazon and Google. In fact, the Big 5 unicorns combined will hire more net heads than Apple with almost 50% less employee churn.
  2. Amazon’s churn is the highest – losing a little over 10 people a day. However, this is not bad relatively speaking – Google loses 8-9 people a day, and Apple is a tad over 9 a day (and Amazon has 36% more employees than Google). Given the recent press bashing Amazon’s culture and the periodic press envying Google’s great benefits, their retention rates tell a different story – that it’s closer to a wash. It seems big tech companies with great talent churn people at pretty similar high rates regardless (I have some more thoughts on this but will save those for another post).
  3. At these current rates, all of the companies here (collectively) will increase their employee size by roughly 20K (19,414 to be precise) heads by year end (this is new hires less churn). That’s a measly 5% increase in their current collective employee size – and this is across the Big 3 tech companies and Big 5 unicorns.
  4. Let’s compare Amazon to the Big 5 Unicorns. The Big 5 will hire 79% as many incremental heads as Amazon in a month, even though their collective employee size is 24% that of Amazon’s. Amazon has been in business for much longer (2-3x days since incorporation), and the Big 5’s churn is 43% of Amazon’s figure – both factors contributing to the closeness in the incremental head rate between the two.

Want more details?

How did I calculate these figures? Take a look at my previous post on engineering retention for more details. Same caveats listed there apply here, and then some (such as how this depends on the participation rates of LinkedIn which may differ considerably internationally compared to the US market which my previous posts exclusively focused on). Feel free to connect or email me if you have any questions or feedback.

 


Filed under Data Mining, Economics, Entrepreneurship, Google, LinkedIn, Management, Non-Technical-Read, Research, Startups, Statistics, Trends

Ranking Companies on Sales Culture & Retention

A company’s sales retention rate is a very important indicator of business health. If you have a good gauge on this, you could better answer questions such as: should I join that company’s sales department, will I be able to progress up the ladder, are reps hitting their numbers, are they providing effective training, should I invest money in this business, etc. But how does one measure this rate, especially from an outside vantage point? This is where LinkedIn comes to the rescue. I essentially applied the same approach I took to measuring engineering retention to sales.

[Chart: technology companies ranked by sales churn]

This chart ranks several key technology companies by sales churn, from worst at the top to best at the bottom: the higher a company sits on the chart (and the longer its bar), the higher its churn.

So how are we defining sales churn here? I calculated it as follows: I took the number of people who have ever churned from a sales role at the company and divided that by the number of days since the company’s incorporation (call this Churn Per Day). I then computed the ratio of how many sales people will churn in one year (the run rate, i.e. Churn Per Day * 365) to the number of sales people currently employed.

For example, the top row is Zenefits with a value of 0.40, which means that 40% of the current sales team will churn over a one-year period. To maintain that sales team size and the corresponding revenue, the company will need to hire the equivalent of 40% of its team. It will also need to do so well within the year, both because the churn is likely spread throughout the year and because new sales hires need time to ramp (if you’re churning a ramped rep and it takes, say, one quarter to ramp a new one, then you need to hire the replacement at least one quarter beforehand to avoid a revenue dip).
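As a rough sketch of the calculation described above, the ratio can be written as a small function. The function name and the input numbers below are hypothetical and chosen only so the result lands on 0.40; they are not the actual figures for any company in the chart.

```python
def annual_sales_churn_ratio(total_churned, days_since_incorporation, current_sales_headcount):
    """Share of the current sales team expected to churn in one year."""
    # Churn Per Day: everyone who ever left a sales role, spread over the company's lifetime.
    churn_per_day = total_churned / days_since_incorporation
    # Annual run rate: how many sales people churn in a year at that daily pace.
    annual_run_rate = churn_per_day * 365
    # Ratio of the annual run rate to the sales people currently employed.
    return annual_run_rate / current_sales_headcount


# Hypothetical inputs: 400 sales people churned over 1,000 days, 365 currently employed.
ratio = annual_sales_churn_ratio(400, 1000, 365)
print(f"{ratio:.2f}")
```

Note that because the numerator averages churn over the company's whole lifetime, a company whose churn picked up only recently would look better under this measure than it currently is.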

A few more notes:

The color saturation indicates Churn Per Day – the darker the color, the higher the Churn Per Day.

Caveats listed in the previous post on engineering retention apply to this analysis too.


Filed under Data Mining, Economics, Enterprise, Entrepreneurship, Job Stuff, LinkedIn, Non-Technical-Read, Startups, Statistics, Trends, Venture Capital

Betting on UFC Fights – A Statistical Data Analysis

Mixed Martial Arts (MMA) is an incredibly entertaining and technical sport to watch. It’s become one of the fastest growing sports in the world. I’ve been following MMA organizations like the Ultimate Fighting Championship (UFC) for almost eight years now, and in that time have developed a great appreciation for MMA techniques. After watching dozens of fights, you begin to pick up on what moves win and when, and spot strengths and weaknesses in certain fighters. However, I’ve always wanted to test my knowledge against the actual stats – like do accomplished wrestlers really beat fighters with little wrestling experience?

To do this, we need fight data, so I crawled and parsed all the MMA fights from Sherdog.com. This data includes fighter profiles (birth date, weight, height, disciplines, training camp, location) and fight records (challenger, opponent, time, round, outcome, event). After some basic data cleaning, I had a dataset of 11,886 fight records, 1,390 of which correspond to the UFC.

I then trained a random forest classifier on this data to see if a state-of-the-art machine learning model could identify any winning and losing characteristics. Over 10-fold cross-validation, the resulting model achieved a surprisingly decent AUC of 0.69; an AUC closer to 0.5 would indicate that the model can’t predict winners any better than a fair coin flip.
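To make the 0.5 baseline concrete, here is a minimal pure-Python sketch of AUC, computed as the chance that a randomly chosen winner gets a higher score than a randomly chosen loser. The data is synthetic and made up for illustration; it is not the Sherdog dataset, and this is not the code used for the model above.

```python
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


random.seed(0)
labels = [random.randint(0, 1) for _ in range(2000)]

# Scores unrelated to the outcome: AUC should hover around 0.5.
random_scores = [random.random() for _ in labels]
# Scores that carry some (noisy) signal about the outcome: AUC should be clearly above 0.5.
informative_scores = [0.3 * y + random.random() for y in labels]

auc_random = auc(labels, random_scores)
auc_informative = auc(labels, informative_scores)
print(round(auc_random, 2), round(auc_informative, 2))
```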

So there may be interesting patterns in this data … Feeling motivated, I ran exhaustive searches over the data to find feature combinations that indicate winning or losing behaviors. Many hours later, I had found several dozen such insights.

Here are the most interesting ones (stars indicate statistical significance at the 5% level):

Top UFC Insights

Fighters older than 32 years of age will more likely lose

This was validated in 173 out of 277 (62%) fights*

Fighters with more than 6 TKO victories fighting opponents older than 32 years of age will more likely win

This was validated in 47 out of 60 (78%) fights*

Fighters from Japan will more likely lose

This was validated in 36 out of 51 (71%) fights*

Fighters who have lost 2 or more KOs will more likely lose

This was validated in 54 out of 84 (64%) fights*

Fighters with 3x or more decision wins who are more than 3% taller than their opponents will more likely win

This was validated in 32 out of 38 (84%) fights*

Fighters who have won 3x or more decisions than their opponent will more likely win

This was validated in 142 out of 235 (60%) fights*

Fighters with no wrestling background facing fighters who have one will more likely lose

This was validated in 136 out of 212 (64%) fights*

Fighters fighting opponents who have 3x fewer (or less) decision wins and are on a 6-fight (or better) winning streak will more likely win

This was validated in 30 out of 39 (77%) fights*

Fighters younger than their opponents by 3 or more years in age will more likely win

This was validated in 324 out of 556 (58%) fights*

Fighters who haven’t fought in more than 210 days will more likely lose

This was validated in 162 out of 276 (59%) fights*

Fighters taller than their opponents by 3% will more likely win

This was validated in 159 out of 274 (58%) fights*

Fighters who have lost fewer fights by submission than their opponents will more likely win

This was validated in 295 out of 522 (57%) fights*

Fighters who have lost 6 or more fights will more likely lose

This was validated in 172 out of 291 (60%) fights*

Fighters who have 18 or more wins and have never had a 2-fight losing streak will more likely win

This was validated in 79 out of 126 (63%) fights*

Fighters who have lost back to back fights will more likely lose

This was validated in 514 out of 906 (57%) fights*

Fighters with 0 TKO victories will more likely lose

This was validated in 90 out of 164 (55%) fights

Fighters fighting opponents out of Greg Jackson’s camp will more likely lose

This was validated in 38 out of 63 (60%) fights

 

Top Insights over All Fights

Fighters with 15 or more wins who have 50% fewer losses than their opponents will more likely win

This was validated in 239 out of 307 (78%) fights*

Fighters fighting American opponents will more likely win

This was validated in 803 out of 1303 (62%) fights*

Fighters with 2x (or more) as many wins as their opponents, where those opponents lost their last fight, will more likely win

This was validated in 709 out of 1049 (68%) fights*

Fighters who’ve lost their last 4 fights in a row will more likely lose

This was validated in 345 out of 501 (68%) fights*

Fighters currently on a 5 fight (or better) winning streak will more likely win

This was validated in 1797 out of 2960 (61%) fights*

Fighters with 3x or more wins than their opponents will more likely win

This was validated in 2831 out of 4764 (59%) fights*

Fighters who have lost 7 or more times will more likely lose

This was validated in 2551 out of 4547 (56%) fights*

Fighters with no jiu jitsu in their background facing fighters who do have it will more likely lose

This was validated in 334 out of 568 (59%) fights*

Fighters who have lost by submission 5 or more times will more likely lose

This was validated in 1166 out of 1982 (59%) fights*

Fighters in the Middleweight division who fought their last fight more recently will more likely win

This was validated in 272 out of 446 (61%) fights*

Fighters in the Lightweight division fighting opponents who are 6 feet tall (or taller) will more likely win

This was validated in 50 out of 83 (60%) fights

 

Note – I separated UFC fights from all fights because regulations and rules can vary across MMA organizations.
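For readers who want to sanity check the stars, here is a minimal sketch of one way to test an insight for significance at the 5% level. It assumes a two-sided exact binomial test against a 50/50 coin flip; the post does not state which test was actually used, so treat both the test and the function name as assumptions.

```python
from math import comb


def coin_flip_p_value(validated, fights):
    """Two-sided exact binomial p-value against a fair coin (p = 0.5)."""
    # By symmetry of the fair coin, measure how extreme the larger side of the split is.
    extreme = max(validated, fights - validated)
    # Count the outcomes at least that extreme using exact integer arithmetic.
    tail = sum(comb(fights, k) for k in range(extreme, fights + 1))
    one_sided = tail / 2 ** fights
    return min(1.0, 2 * one_sided)


# Counts taken from the lists above.
p_older_than_32 = coin_flip_p_value(173, 277)   # starred insight
p_zero_tko_wins = coin_flip_p_value(90, 164)    # unstarred insight
print(p_older_than_32 < 0.05, p_zero_tko_wins < 0.05)
```

Keep in mind that these insights came out of an exhaustive search over many feature combinations, so a per-insight 5% threshold will let some chance patterns through.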

Most of these insights are intuitive, except maybe for the last one and an earlier one, which states that 77% of the time fighters beat opponents who are on a 6-fight (or better) winning streak but have 3x fewer decision wins.

Many of these insights demonstrate statistically significant winning biases. I couldn’t help but wonder – could we use these insights to effectively bet on UFC fights? For the sake of simplicity, what happens if we make bets based on just the very first insight which states that fighters older than 32 years old will more likely lose (with a 62% chance)?

To evaluate this betting rule, I pulled the most recent UFC fights where in each fight there’s a fighter that’s at least 33 years old. I found 52 such fights, spanning 2/5/2011 – 8/14/2011. I placed a $10K bet on the younger fighter in each of these fights.

Surprisingly, this rule calls 33 of these 52 fights correctly (63% – very close to the rule’s observed 62% overall win rate). Each fight called incorrectly results in a loss of $10,000, and for each of the fights called correctly I obtained the corresponding Bodog money line (betting odds) to compute the actual winning amount.

I’ve compiled the betting data for these fights in this Google spreadsheet.

Note, for 6 of the fights that our rule called correctly, the money lines favored the losing fighters.

Let’s compute the overall return of our simple betting rule:

For each of these 52 fights, we risked $10,000, or in all $520,000
We lost 19 times, or a total of $190,000
Based on the betting odds of the 33 fights we called correctly (see spreadsheet), we won $255,565.44
Profit = $255,565.44 – $190,000 = $65,565.44
Return on investment (ROI) = 100 * 65,565.44 / 520,000 = 12.6%
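The arithmetic above can be reproduced with a short sketch. The totals come straight from the figures in this post; the `moneyline_winnings` helper is an illustrative assumption about how a single American-style money line converts to winnings, and the example lines passed to it are hypothetical rather than taken from the spreadsheet.

```python
def moneyline_winnings(stake, moneyline):
    """Winnings (excluding the returned stake) for a winning bet at an American money line."""
    if moneyline > 0:
        # Underdog: a +150 line pays $150 for every $100 staked.
        return stake * moneyline / 100
    # Favorite: a -200 line requires staking $200 to win $100.
    return stake * 100 / abs(moneyline)


stake = 10_000
fights = 52
losses = 19
total_won = 255_565.44              # sum of winnings on the 33 correct calls (from the post)

total_risked = stake * fights       # $520,000
total_lost = stake * losses         # $190,000
profit = total_won - total_lost
roi_percent = 100 * profit / total_risked

print(f"Profit: ${profit:,.2f}  ROI: {roi_percent:.1f}%")
# Hypothetical lines, for illustration only:
print(moneyline_winnings(stake, -200), moneyline_winnings(stake, 150))
```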

 

That’s a very decent return.

For kicks, let’s compare this to investing in the stock market over the same period of time. If we buy the S&P 500 with a conventional dollar cost averaging strategy to spread out the $520,000 investment, then we get an ROI of -7.31%. Ouch.

Keep in mind that we’re using a simple betting rule that’s based on a single insight. The random forest model, which optimizes over many insights, should predict better and be applicable to more fights.

Please note that I’m just poking fun at stocks – I’m not saying betting on UFC fights with this rule is a more sound investment strategy (risk should be thoroughly examined – the variance of the performance of the rule should be evaluated over many periods of time).

The main goal here is to demonstrate the effectiveness of data driven approaches for better understanding the patterns in a sport like MMA. The UFC could leverage these data mining approaches for coming up with fairer matches (dismiss fights that match obvious winning and losing biases). I don’t favor this, but given many fans want to see knockouts, the UFC could even use these approaches to design fights that will likely avoid decisions or submissions.

Anyways, there’s so much more analysis I’ve done (and haven’t done) over this data. Will post more results when cycles permit. Stay tuned.


Filed under AI, Blog Stuff, Computer Science, Data Mining, Economics, Machine Learning, Research, Science, Statistics, Trends

Is the Facebook Application Platform Fair?

Take a look at this stats deck from O’Reilly’s Graphing Social Patterns conference:

http://en.oreilly.com/gspeast2008/public/asset/attachment/2950

Fairly in-depth and recent [6/01/2008] analysis of the application usage in Facebook and MySpace.

As expected, lots of power law behavior.

I found the slides describing churn to be pretty interesting. Since October 2007, nine of the top fifteen most popular applications are new. However, only three of those new applications debuted after March 2008. I expect the amount of churn in the top spots to continue to drop based on the recent declining active usage trends and Facebook’s efforts to curb application spam (new UI that puts applications in a separate profile tab, app module minimizing, viral friend messaging limits, security compliance, etc).

What I would find even more interesting is a study of the number of applications users install, and how those moving averages have changed over time. Like say the number of applications a typical user installs is 4. Once the user reaches that threshold, what’s the churn like then? Specifically, what are the chances that a user will add a new app? Or maybe an even better metric: how long does it take, and how does this length of time compare to when the user had 1 app and increased to 2, or 2 apps and increased to 3, etc.? Basically, what’s the adoption rate/times based on current application counts?

I believe it becomes harder to influence a user to add a new app, or to swap one in, if the number of apps the user currently has is high. I think most users, without even knowing it, have a threshold of how many total apps they are willing to display on their profile – and that this threshold is based on an ongoing evaluation of the utility and efficiency of the page. Each app takes up real estate on the profile page, and a “rational” user will only show so many until page load times degrade and/or core modules (wall, general information, albums, networks) get drowned in clutter and thus become difficult for users to locate. Of course, social networks like MySpace which have very minimal profile page design constraints prove that most users are irrational 😉 – but it’s this design control that greatly helped Facebook dominate the market IMO.

If this is true, then it means that first movers really, really win in the Facebook apps world. Companies like Slide and Rockyou manage many of the top applications, and given the power law distribution of market share, they control a majority stake of application usage and installs. Many of these companies had the early bird advantage, and once winners, always winners – they can acquire emerging applications and leverage their branding and existing audiences (a.k.a. monopoly) to cross-promote potentially copy-cat applications faster and wider than the competition. Monopolies inside Facebook have unsettling ramifications, as they block newcomers from capturing profile space. If they fail to innovate (as most monopolies do), then next-gen application development may never get through.

Now, if users do have an application count threshold, and it becomes successively more difficult to replace or add an app as this count increases, then any apps developed now have a substantially lower chance of gaining market share. If winners win, first movers reap, and churn becomes improbable over time, then the early top apps have most likely already filled up users’ allocated app slots.

I find thinking of the profile page as a resource allocation problem rather fascinating. Essentially, there are finite resources on a page and we expect rational users to perform some optimization to allocate resources to maximize utility for themselves and for others (potential game theory link). Once users fill up these resources, human laziness kicks in. Another reason churn is improbable is that users who want to add new applications after filling up their resource limit will need to remove an existing app to make space. The standards for change are higher now, as the user must compare the new app to an existing preferred app (which is probably a popular early-bird app that friends use), and so the decision will incur a trade-off.

One could also argue that with more apps available now (the second slide shows that, despite sluggish usage, the number of apps being developed is still growing insanely) users are burdened with more choices. Or one could argue that because most users have reached their app limit, and churn has thus become improbable, the discovery of new apps among friends (a critical channel for adoption) also becomes improbable.

Under this theory, especially in context of Facebook’s current efforts and app stats, the growth of new app adoption in social networks will continue to slow down.

So what can be done here?

The platform needs to encourage more churn by building a fairer market that matches users to high quality apps that satisfy their expressed intents. At the end of the day, these applications are really just web pages, but unlike the web, they do not leverage important primitives like linking and meta tags. Search engines like Google and Yahoo use these features extensively to calculate authority and relevance. In the long run, as the number of sources increases, advanced ranking algorithms and marketplaces are necessary to scale and ensure fairness to worthy tail publishers. Maybe social networks should inherit these system properties to bolster their tail applications.

Also, Facebook needs to encourage users to vary or add more applications on their profile page. Facebook’s move to put applications in their own profile tab may very well achieve this goal, but at the cost of lowering their visibility.

Anyways, just some random thoughts about the current state of Facebook apps. It’ll be very interesting to see how their platform progresses and how it will be perceived by end users and developers in the future.


Filed under Economics, Facebook, Non-Technical-Read, Social, Statistics, Trends