Mixed Martial Arts (MMA) is an incredibly entertaining and technical sport to watch. It’s become one of the fastest-growing sports in the world. I’ve been following MMA organizations like the Ultimate Fighting Championship (UFC) for almost eight years now, and in that time have developed a great appreciation for MMA techniques. After watching dozens of fights, you begin to pick up on what moves win and when, and spot strengths and weaknesses in certain fighters. However, I’ve always wanted to test my knowledge against the actual stats. For example, do accomplished wrestlers really beat fighters with little wrestling experience?
To do this, we need fight data, so I crawled and parsed all the MMA fights from Sherdog.com. This data includes fighter profiles (birth date, weight, height, disciplines, training camp, location) and fight records (challenger, opponent, time, round, outcome, event). After some basic data cleaning, I had a dataset of 11,886 fight records, 1,390 of which correspond to the UFC.
I then trained a random forest classifier on this data to see if a state-of-the-art machine learning model can identify any winning and losing characteristics. Over cross-validation with 10 folds, the resulting model scored a surprisingly decent AUC of 0.69; an AUC closer to 0.5 would indicate that the model can’t predict winning fights any better than random guessing or fair coin flips.
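To make the AUC scale concrete, here is a minimal pure-Python sketch of the metric itself: the probability that a randomly chosen winner receives a higher model score than a randomly chosen loser, with ties counted as half. The labels and scores below are made-up toy values, not outputs of the model described above, and this is not necessarily how the 0.69 figure was computed.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy labels: 1 = fighter won, 0 = fighter lost (hypothetical data).
labels = [1, 1, 1, 0, 0, 0]

# A model that ranks every winner above every loser.
perfect = auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])

# A model that gives every fighter the same score carries no information.
uninformative = auc(labels, [0.5] * 6)

print(perfect, uninformative)
```

A score of 0.69 sits between these two extremes: the model ranks a winner above a loser noticeably more often than chance, but far from always.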
So there may be interesting patterns in this data … Feeling motivated, I ran exhaustive searches over the data to find feature combinations that indicate winning or losing behaviors. Many hours later, I had found several dozen such insights.
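As a rough illustration of what such a search can look like, the sketch below scans every observed threshold of a single numeric feature and ranks the resulting rules by how far their win rate sits from 50%. Everything here is an assumption made for illustration: the field names (`age`, `tko_wins`, `won`), the toy records, and the `min_support` cutoff are hypothetical, and the real search covered combinations of features rather than one feature at a time.

```python
# Hypothetical toy records: one row per fighter per fight.
fights = [
    {"age": 35, "tko_wins": 1, "won": False},
    {"age": 34, "tko_wins": 0, "won": False},
    {"age": 33, "tko_wins": 2, "won": False},
    {"age": 36, "tko_wins": 7, "won": True},
    {"age": 30, "tko_wins": 4, "won": True},
    {"age": 28, "tko_wins": 3, "won": True},
    {"age": 27, "tko_wins": 5, "won": True},
    {"age": 25, "tko_wins": 0, "won": True},
]


def evaluate_rule(rows, feature, threshold):
    """Return (wins, matches) for the rule `feature > threshold`, or None if nothing matches."""
    matched = [r for r in rows if r[feature] > threshold]
    if not matched:
        return None
    wins = sum(1 for r in matched if r["won"])
    return wins, len(matched)


def search_rules(rows, features, min_support=3):
    """Try every observed value of every feature as a threshold."""
    results = []
    for feature in features:
        for threshold in sorted({r[feature] for r in rows}):
            outcome = evaluate_rule(rows, feature, threshold)
            if outcome is None or outcome[1] < min_support:
                continue  # too few matching fights to say anything
            wins, n = outcome
            results.append((feature, threshold, wins, n, wins / n))
    # Strongest winning or losing bias first.
    results.sort(key=lambda r: abs(r[4] - 0.5), reverse=True)
    return results


for feature, threshold, wins, n, rate in search_rules(fights, ["age", "tko_wins"])[:3]:
    print(f"{feature} > {threshold}: won {wins} of {n} ({rate:.0%})")
```

Each rule that survives a search like this still needs a significance check before it counts as one of the insights.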
Here are the most interesting ones (stars indicate statistical significance at the 5% level):
Top UFC Insights
Fighters older than 32 years of age will more likely lose
Fighters with more than 6 TKO victories fighting opponents older than 32 years of age will more likely win
Fighters from Japan will more likely lose
Fighters who have lost 2 or more KOs will more likely lose
Fighters with 3x or more decision wins who are more than 3% taller than their opponents will more likely win
Fighters who have won 3x or more decisions than their opponent will more likely win
Fighters with no wrestling background facing fighters who do have one will more likely lose
Fighters fighting opponents who have 3x or less decision wins and are on a 6 fight (or better) winning streak will more likely win
Fighters younger than their opponents by 3 or more years in age will more likely win
Fighters who haven’t fought in more than 210 days will more likely lose
Fighters taller than their opponents by 3% will more likely win
Fighters who have lost fewer fights by submission than their opponents will more likely win
Fighters who have lost 6 or more fights will more likely lose
Fighters who have 18 or more wins and never had a 2 fight losing streak will more likely win
Fighters who have lost back to back fights will more likely lose
Fighters with 0 TKO victories will more likely lose
Fighters fighting opponents out of Greg Jackson’s camp will more likely lose
Top Insights over All Fights
Fighters with 15 or more wins that have 50% fewer losses than their opponents will more likely win
This was validated in 239 out of 307 (78%) fights*
Fighters fighting American opponents will more likely win
Fighters with at least 2x as many wins as their opponents, when those opponents lost their last fights, will more likely win
Fighters who’ve lost their last 4 fights in a row will more likely lose
Fighters currently on a 5 fight (or better) winning streak will more likely win
Fighters with 3x or more wins than their opponents will more likely win
Fighters who have lost 7 or more times will more likely lose
Fighters with no jiu jitsu in their background facing fighters who do have it will more likely lose
Fighters who have lost by submission 5 or more times will more likely lose
Fighters in the Middleweight division who fought their last fight more recently will more likely win
Fighters in the Lightweight division fighting opponents who are 6 feet tall (or taller) will more likely win
Note – I separated UFC fights from all fights because regulations and rules can vary across MMA organizations.
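One way to check a rule such as the 239 out of 307 result against the 5% level is an exact one-sided binomial test, which asks how likely a record at least that lopsided would be if every fight were a fair coin flip. This is a sketch of that idea using only the standard library; the choice of test and the function name are assumptions, since the significance test actually used is not stated.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# The rule listed above held in 239 of 307 fights.
p = binomial_p_value(239, 307)
print(f"p-value = {p:.3g}, significant at 5%: {p < 0.05}")
```

A p-value below 0.05 means a record this one-sided would be unlikely under pure chance, which is what the star marks.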
Most of these insights are intuitive, except perhaps the last one and an earlier one, which states that 77% of the time fighters beat opponents who are on a 6 fight (or better) winning streak but have 3x fewer decision wins.
Many of these insights demonstrate statistically significant winning biases. I couldn’t help but wonder: could we use these insights to effectively bet on UFC fights? For the sake of simplicity, what happens if we make bets based on just the very first insight, which states that fighters older than 32 will more likely lose (with a 62% chance)?
To evaluate this betting rule, I pulled the most recent UFC fights in which at least one fighter was 33 or older. I found 52 such fights, spanning 2/5/2011 – 8/14/2011. I placed a $10K bet on the younger fighter in each of these fights.
Surprisingly, this rule calls 33 of these 52 fights correctly (63% – very close to the rule’s observed 62% overall win rate). Each fight called incorrectly results in a loss of $10,000, and for each of the fights called correctly I obtained the corresponding Bodog money line (betting odds) to compute the actual winning amount.
I’ve compiled the betting data for these fights in this Google spreadsheet.
Note, for 6 of the fights that our rule called correctly, the money lines favored the losing fighters.
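For readers unfamiliar with money lines, the sketch below shows how a flat stake turns into profit or loss under the standard American odds convention: a positive line is the profit on a 100 unit stake, and a negative line is the stake needed to win 100 units. The four bets in the example are hypothetical placeholders, not rows from the spreadsheet.

```python
def moneyline_profit(stake, line):
    """Profit on a winning bet at American odds `line` (e.g. -200 or +150)."""
    if line == 0:
        raise ValueError("a money line of 0 is not valid")
    if line > 0:
        return stake * line / 100   # underdog: +150 pays 150 per 100 staked
    return stake * 100 / -line      # favorite: -200 pays 100 per 200 staked


def rule_return(bets, stake=10_000):
    """bets: list of (called_correctly, money line of the fighter we backed)."""
    net = 0.0
    for called_correctly, line in bets:
        if called_correctly:
            net += moneyline_profit(stake, line)
        else:
            net -= stake            # a wrong call loses the full stake
    total_staked = stake * len(bets)
    return net, net / total_staked


# Hypothetical bets for illustration only.
example_bets = [(True, -200), (True, 150), (False, -120), (True, -250)]
net, roi = rule_return(example_bets)
print(f"net = ${net:,.0f}, return on stake = {roi:.1%}")
```

Correct calls on underdogs (positive lines) pay more than the stake, which is why fights where the money line favored the eventual loser matter so much to the total.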
Let’s compute the overall return of our simple betting rule:
That’s a very decent return.
For kicks, let’s compare this to investing in the stock market over the same period of time. If we buy the S&P 500 with a conventional dollar cost averaging strategy to spread out the $520,000 investment, then we get an ROI of -7.31%. Ouch.
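A dollar cost averaging comparison of this kind can be sketched as follows: split the total into equal installments, buy at each price in the series, and value the accumulated shares at the final price. The purchase schedule and the prices below are assumptions for illustration (they are not actual S&P 500 quotes, and the schedule behind the -7.31% figure is not specified).

```python
def dca_roi(prices, total_investment):
    """ROI of investing equal installments at each price, valued at the last price."""
    if not prices:
        raise ValueError("need at least one price")
    installment = total_investment / len(prices)
    shares = sum(installment / price for price in prices)
    final_value = shares * prices[-1]
    return (final_value - total_investment) / total_investment


# Hypothetical index levels on four purchase dates.
hypothetical_prices = [100.0, 110.0, 90.0, 95.0]
print(f"ROI = {dca_roi(hypothetical_prices, 520_000):.2%}")
```

Spreading the purchases out means the result depends on the whole price path over the period, not just the first and last quotes.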
Keep in mind that we’re using a simple betting rule that’s based on a single insight. The random forest model, which optimizes over many insights, should predict better and be applicable to more fights.
Please note that I’m just poking fun at stocks – I’m not saying betting on UFC fights with this rule is a more sound investment strategy (risk should be thoroughly examined – the variance of the performance of the rule should be evaluated over many periods of time).
The main goal here is to demonstrate the effectiveness of data-driven approaches for better understanding the patterns in a sport like MMA. The UFC could leverage these data mining approaches to come up with fairer matches (for example, by dismissing fights that match obvious winning and losing biases). I don’t favor this, but given that many fans want to see knockouts, the UFC could even use these approaches to design fights that will likely avoid decisions or submissions.
Anyways, there’s so much more analysis I’ve done (and haven’t done) over this data. Will post more results when cycles permit. Stay tuned.