For something a bit different, this blog post either marks the start of an incredible get-rich-quick scheme, or the start of my descent into a hopeless spiral of gambling addiction.

Recently I’ve been getting interested in prediction models, and particularly in 538’s World Cup predictions. Seeing as I couldn’t find anyone trying to algorithmically predict rugby games in Europe, I thought I’d give it a crack myself. The result is a system called Baldo’s Crystal Egg (TM) which I’m going to be tinkering with and using throughout the upcoming rugby season to predict the result of every match in the Pro 12 and English Premiership (and maybe the I-can’t-believe-it’s-not-the-Heineken Cup too).

There are some numbers to follow – if you don’t want to think about stats, don’t worry, I’ll be talking rugby soon.

I’ll write a post with the full gory details of how the system works in R and Python soon, but for the stats-shy, it’s pretty simple in principle (if not in practice!). I started by collecting data from the ESPN Scrum website, which reports on every match and, for most of them, gives stats at an individual player level. Using this data set, I was able to calculate a measure of a team’s performance in any given game, based on both the outcome of the game and the strength of the opposition (and other factors such as home advantage). I was also able to roughly score each player’s individual contribution, based on how well the team did while they were on the pitch and on their own actions – tries, assists, tackles, turnovers, etc. The result is an approximate score for each team and player, for both offence and defence.
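To make the idea of an opposition-adjusted game score a bit more concrete, here is a minimal Python sketch. It is not the Crystal Egg’s actual formula, which isn’t spelled out here: the league-average score (`LEAGUE_AVG_POINTS`), the home-advantage bonus (`HOME_ADVANTAGE`), the even split of that bonus between attack and defence, and the `GameResult`/`game_ratings` names are all hypothetical. The idea is simply to give a team extra credit for scoring against a tight defence, forgive it for conceding against a strong attack, and strip out the boost from playing at home.

```python
from dataclasses import dataclass

# Hypothetical constants for illustration only; the real values aren't given.
LEAGUE_AVG_POINTS = 20.0  # assumed points an average team scores per game
HOME_ADVANTAGE = 3.0      # assumed points boost from playing at home


@dataclass
class GameResult:
    points_for: float
    points_against: float
    opp_offence: float  # opponent's rating: points it scores vs an average team
    opp_defence: float  # opponent's rating: points it concedes vs an average team
    at_home: bool


def game_ratings(g: GameResult) -> tuple[float, float]:
    """Return (offence, defence) for one game, rescaled to what the team
    would be expected to score and concede against an average opponent
    at a neutral ground."""
    # Scoring against a stingy defence (below-average opp_defence) earns extra credit.
    offence = g.points_for + (LEAGUE_AVG_POINTS - g.opp_defence)
    # Conceding against a strong attack (above-average opp_offence) is forgiven.
    defence = g.points_against - (g.opp_offence - LEAGUE_AVG_POINTS)
    # Remove home advantage, split evenly between attack and defence (assumption).
    shift = HOME_ADVANTAGE / 2 if g.at_home else -HOME_ADVANTAGE / 2
    return offence - shift, defence + shift
```

For instance, under these made-up constants, beating Leinster (25.5 offence, 16.5 defence) 24-18 at home comes out as an offence of 26.0 and a defence of 14.0 for that game.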

These scores are weighted for both time on the pitch and how long ago the game was, with a limit of four years – so Munster’s glory days of 2006-08 won’t unfairly inform their current scores.
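A minimal sketch of that weighting, assuming a simple exponential decay with age. Only the four-year cut-off comes from the post; the one-year half-life (`HALF_LIFE_YEARS`), the 80-minute cap and the function names are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple

MAX_AGE_YEARS = 4.0       # from the post: older games are ignored entirely
HALF_LIFE_YEARS = 1.0     # hypothetical: a game's weight halves every year
FULL_GAME_MINUTES = 80.0  # regulation length of a rugby union match


def game_weight(minutes_played: float, age_years: float) -> float:
    """Weight for one game: share of the match played times a recency decay."""
    if age_years >= MAX_AGE_YEARS or minutes_played <= 0:
        return 0.0
    time_share = min(minutes_played, FULL_GAME_MINUTES) / FULL_GAME_MINUTES
    recency = 0.5 ** (age_years / HALF_LIFE_YEARS)
    return time_share * recency


def weighted_rating(samples: Iterable[Tuple[float, float, float]]) -> float:
    """Weighted average of per-game ratings.

    Each sample is (rating, minutes_played, age_years). Returns NaN if no
    game falls inside the four-year window.
    """
    total = 0.0
    total_weight = 0.0
    for rating, minutes, age in samples:
        w = game_weight(minutes, age)
        total += w * rating
        total_weight += w
    return total / total_weight if total_weight else float("nan")
```

With these settings a full game played today counts twice as much as one from a year ago, and anything older than four years counts for nothing: `weighted_rating([(30, 80, 0), (10, 80, 1), (50, 80, 5)])` gives about 23.3, with the five-year-old 50 ignored.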

Using some fancy statistical analysis (multinomial logistic regression, if you’re interested), I can build a model that estimates the probability of each possible result, based on how teams with similar offensive and defensive scores have fared when playing each other in the past.

For example, at the moment Munster have an offensive score of 22.5 and a defensive score of 17 (roughly signifying how many points you would expect them to score and concede against an “average” team at a neutral ground), while the equivalent scores for Leinster are 25.5 and 16.5. The model estimates that if the two teams met at Thomond Park, Munster would have a 50% chance of winning the game. On the other hand, the same game in the RDS would give them just a 24% chance of victory. Unfortunately, this seems about right!
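For anyone curious what a multinomial logistic regression looks like in practice, here is a minimal Python sketch of the prediction step. It makes several assumptions that aren’t in the post: three outcomes (home win, draw, away win), a single input feature (the gap between the two teams’ net ratings, i.e. offence minus defence), and the numbers in `COEFFICIENTS`, which are invented rather than fitted, so they won’t reproduce the 50% and 24% figures above. In a real model the coefficients would be estimated from past games, and the inputs could include all four ratings separately.

```python
import math

# Invented coefficients (intercept, slope on net-rating gap) for each outcome.
# A real model would fit these to past games. "away_win" is the reference
# class, so its coefficients are pinned at zero.
COEFFICIENTS = {
    "home_win": (0.3, 0.15),  # positive intercept stands in for home advantage
    "draw": (-2.5, 0.0),      # draws are rare in rugby, hence the low intercept
    "away_win": (0.0, 0.0),
}


def net_rating_gap(home_off: float, home_def: float,
                   away_off: float, away_def: float) -> float:
    """Home net rating minus away net rating (net = offence - defence)."""
    return (home_off - home_def) - (away_off - away_def)


def outcome_probabilities(home_off: float, home_def: float,
                          away_off: float, away_def: float) -> dict:
    """Multinomial logistic (softmax) probabilities for each outcome."""
    x = net_rating_gap(home_off, home_def, away_off, away_def)
    # One linear score per outcome.
    scores = {k: b0 + b1 * x for k, (b0, b1) in COEFFICIENTS.items()}
    # Softmax; subtracting the max score avoids overflow in exp().
    top = max(scores.values())
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Munster (22.5, 17) v Leinster (25.5, 16.5), using the ratings quoted above.
at_thomond = outcome_probabilities(22.5, 17, 25.5, 16.5)  # Munster at home
at_rds = outcome_probabilities(25.5, 16.5, 22.5, 17)      # Munster away
print(round(at_thomond["home_win"], 2), round(at_rds["away_win"], 2))
```

Swapping the home and away arguments shows the home-advantage intercept at work: even with these made-up coefficients, Munster’s chance of winning is higher at Thomond Park than at the RDS, the same pattern as the example above if not the same figures.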

All in all, the model currently predicts the correct outcome in about 75% of past games. It is most effective when the full lineups for each team are known, but it still fares reasonably well (>70%) based solely on team ratings. It can also predict results in terms of bonus-point wins or losses, though naturally with some loss in accuracy. Of course, it’s easy to predict the past – the real challenge is to predict the future!
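One plausible way to arrive at a figure like “about 75%” is a hit rate: the share of games where the outcome the model rated most likely is the one that actually happened. The sketch below assumes that definition (how accuracy is measured isn’t spelled out here), and the function names are hypothetical.

```python
from typing import Dict, Sequence


def most_likely(probs: Dict[str, float]) -> str:
    """Outcome with the highest predicted probability."""
    return max(probs, key=probs.get)


def hit_rate(predictions: Sequence[Dict[str, float]], results: Sequence[str]) -> float:
    """Fraction of games where the most likely predicted outcome occurred."""
    if len(predictions) != len(results) or not results:
        raise ValueError("need one prediction per result, and at least one game")
    hits = sum(most_likely(p) == r for p, r in zip(predictions, results))
    return hits / len(results)
```

Since predicting the past is the easy part, a fairer check is to compute this only on games played after the data the model was fitted on, for example fitting on earlier seasons and scoring the most recent one.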

Which is why I’m going to place bets every week of the rugby season this year based on suggestions from this model. The bets won’t always be on the favourites, whether the bookies’ or my model’s. The aim will be to find where the best value lies – if my model gives a team a 30% chance of winning, but they’re at five to one odds, then they’re still a good bet, even if I expect them to lose more often than not. Hopefully over the course of a year this averages out…
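The value-betting arithmetic in that example is worth spelling out. Fractional odds of 5/1 mean a £1 stake wins £5 profit (plus the stake back), so the bookies are implying a win probability of 1/(5+1), about 17%. If the true chance is 30%, the expected profit is 0.3 × £5 − 0.7 × £1 = £0.80 per £1 staked. A minimal Python sketch of that calculation follows; the function names are hypothetical, and it ignores the bookmaker’s margin, which makes the implied probabilities across all outcomes of a game add up to more than 100%.

```python
from fractions import Fraction


def implied_probability(num: int, den: int) -> Fraction:
    """Win probability implied by fractional odds num/den (e.g. 5/1)."""
    # A stake of `den` wins `num` profit, so the bet breaks even when
    # p * num == (1 - p) * den, i.e. p == den / (num + den).
    return Fraction(den, num + den)


def expected_profit(p_win: Fraction, num: int, den: int,
                    stake: Fraction = Fraction(1)) -> Fraction:
    """Expected profit of a single bet at fractional odds num/den."""
    profit_if_win = stake * Fraction(num, den)
    return p_win * profit_if_win - (1 - p_win) * stake


def is_value_bet(p_win: Fraction, num: int, den: int) -> bool:
    """True when the model's probability beats the odds' implied probability."""
    return p_win > implied_probability(num, den)


# The example from the post: a 30% chance at five to one.
p = Fraction(3, 10)
print(implied_probability(5, 1))              # 1/6
print(expected_profit(p, 5, 1))               # 4/5, i.e. £0.80 per £1
print(expected_profit(p, 5, 1, Fraction(5)))  # 4, i.e. £4 on a £5 bet
print(is_value_bet(p, 5, 1))                  # True
```

Because this is only an expected value, a bet like this still loses 70% of the time; the edge only shows up over many bets, which is exactly the hope that it averages out over a season.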

Every week I’ll make my bets (~£5, depending on how much value there is to be had) and post my predictions for each game (Pro 12 and Premiership only at the moment, but hopefully I’ll incorporate internationals by the time they roll around too). At the end of the year I’ll either be the next Nate Silver or broke, but let’s see!

The first round of predictions is due out in two weeks, ahead of the opening games. In the meantime I’ll be tinkering with the model and trying to stop it saying that Riki Flutey is a better centre than Brian O’Driscoll.

frank frenett

As an ex-player born in Munster, a gambler and a maths lover, you have me hooked!

Where are you posting your predictions?

Baldo

Cheers Frank. I’ll be posting them up here, probably of a Friday. I don’t expect to be making much money on it any time soon mind!

Lorcan

I’m very interested in the gory details, looking forward to seeing that!

Baldo

Will try and get working on it – in the meantime, it’s basically a shoddy rip-off of this, so it’s worth checking out:

http://espn.go.com/soccer/worldcup/news/_/id/4447078/GuideToSPI

John Rowland

Have signed up for future posts

Ronan

Sounds interesting

Predicting the leagues : Baldo

[…] was glad to see some interest in the Crystal Egg model when I made my first post during the week. There seemed to particularly be questions around how the model worked, so I have […]

Stephen

As a postgrad student blissfully nearing the end of a stat-heavy dissertation, it’s nice to see that stats can be put to good use!

Baldo

Glad to hear it – I’m sure almost any other use of stats including whatever you were doing is of more value to the world than this!

Round 4 Review - Baldo

[…] and those who need a reminder, an introductory blog post about the Crystal Egg model can be found here, along with full details of how the system works […]

J

Added to my pulse. Great idea!!

Rod

Given the ‘interesting’ autumn international series so far, any thoughts about going international before the six nations starts?