This week’s post will be a little different from others, mainly because it’s not looking at statistics per se, but rather at some of the work I’ve been doing with the statistics I’ve traditionally focused on.
When I first started this site I knew I wanted to eventually develop a statistical model that would attempt to predict something in the world of sports. Whether that was which football team was going to win a game, total number of strikeouts by a pitcher, or number of three pointers by an NBA player – I wasn’t sure.
The other week I decided to try to predict the total number of goals a hockey team would score on a given night (the gambling world calls this the team total). This post looks at the early results of the two models I built to do that.
A goal of mine is to always speak in layman’s terms on this site. As I’ve said before, you shouldn’t have to have a college degree in statistics or economics to understand statistical models. Below I’ll explain as simply as I can what a model is.
A “model” is just a fancy way of saying a mathematical equation. When you run a bunch of data through statistical software, it creates an equation that estimates the thing you want to predict based on the data provided. In this case, the models’ output is goals for (GF), which is the total number of goals scored by a single team.
The technical term for the models is “multivariate linear regression models”. It’s super boring to go into this more so I’ll leave it at that (the Wikipedia page for it can be found here).
Some of the variables in the models include: shots for a team, opponent save percentage, high danger scoring chances, and a few others.
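To make the “a model is just an equation” idea concrete, here is a minimal sketch in Python. The coefficients below are made up purely for illustration; they are not the numbers from either of my models, and the function name predict_goals_for is hypothetical. The point is only to show the shape of a linear regression equation: a starting value plus each variable multiplied by its own weight.

```python
# Hypothetical weights, chosen only to illustrate the shape of the equation.
# A real model gets these numbers from the statistical software.
INTERCEPT = 10.0
WEIGHT_SHOTS_FOR = 0.05        # more shots -> more goals
WEIGHT_OPP_SAVE_PCT = -10.0    # better opposing goalie -> fewer goals
WEIGHT_HIGH_DANGER = 0.1       # more high danger chances -> more goals


def predict_goals_for(shots_for, opp_save_pct, high_danger_chances):
    """Plug one game's numbers into the equation and return estimated GF."""
    return (
        INTERCEPT
        + WEIGHT_SHOTS_FOR * shots_for
        + WEIGHT_OPP_SAVE_PCT * opp_save_pct
        + WEIGHT_HIGH_DANGER * high_danger_chances
    )


# Example: 30 shots, opposing goalie stopping 91% of shots, 10 high danger chances.
estimate = predict_goals_for(30, 0.91, 10)
print(round(estimate, 2))  # roughly 3.4 with these made-up weights
```

The statistical software’s job is to find the weights that best fit past games; once it has them, making a prediction is nothing more than this arithmetic.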
The graphic below shows the initial results of the models. You’ll see the date of the hockey game, the team, the outcome of the pick (did it win or lose), and the model’s estimated score for the team.
A win or loss is based on whether the model correctly picked over or under the Vegas betting line. I’ll go into this more later. Note that Model 2 did not play on 11/15.
On 11/15 Model 1 went 4-6 on its picks. The next day, however, Model 1 and Model 2 each went 23-7, picking 77% of their bets correctly. What’s interesting is the differences between Model 1 and 2. Model 1 incorrectly picked the Edmonton Oilers’ total, but Model 2 picked it correctly. The reverse is true with the Vancouver Canucks.
The next two graphics show the models’ bets, the betting line, and the outcome of the bet.
A team total bet works like this:
Vegas sets the “line” for a team’s total goals, which is almost always either 2.5 or 3.5. Betting the over means you think the team will score more than the given line; betting the under means you think the team will score less.
In the graphics below, the betting line is represented by an asterisk. The first set of bars shows the models’ bets: blue for over the betting line, orange for under. The second set of bars shows the actual number of goals scored by that team; wins are in green, losses are in red.
A model “won” when it picked the under (over) and the team actually scored fewer (more) goals than the line.
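For anyone who likes to see the rules spelled out, here is a small Python sketch of how a team total pick could be made and graded. It assumes the model bets the over whenever its estimated goals are above the line and the under otherwise, which is my simplification for this example; the function names are hypothetical. Because the lines are 2.5 or 3.5, a team can never land exactly on the line, so every pick is either a win or a loss.

```python
def pick_side(estimated_goals, line):
    """Assumed rule: bet the over if the estimate is above the line, else the under."""
    return "over" if estimated_goals > line else "under"


def grade_pick(side, actual_goals, line):
    """Return 'win' if the actual goals landed on the side that was bet."""
    went_over = actual_goals > line
    if (side == "over" and went_over) or (side == "under" and not went_over):
        return "win"
    return "loss"


def win_rate(wins, losses):
    """Share of picks that won, as a fraction between 0 and 1."""
    return wins / (wins + losses)


# Example: the model estimates 3.4 goals against a line of 2.5, and the team scores 3.
side = pick_side(3.4, 2.5)
result = grade_pick(side, 3, 2.5)
print(side, result)

# A 23-7 record works out to about 77% of picks correct.
print(round(win_rate(23, 7) * 100))
```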
The final graphic shows one way the two models differ. While I won’t go into the specifics, the two models have a different number of variables in them. Below is a representation of how this can alter an output.
I’ve plotted the projected output of both models against just one of the variables, shots for. Model 1 is in purple, Model 2 is in green. The linear trend lines for the two models are included too.
While only slightly different, you can see the two models do predict a different number of goals on average. When shots are below 29, Model 1 predicts a higher number of goals scored, but once a team goes over 29 shots Model 2 predicts a higher number of goals scored.
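The crossover at 29 shots is simply the point where the two trend lines meet. Here is a short Python sketch of that idea. The intercepts and slopes are invented so that the lines cross at 29 shots, matching what the graphic shows; they are not the actual trend lines from my models.

```python
# Hypothetical trend lines of the form: goals = intercept + slope * shots.
# Numbers are made up so the lines cross at 29 shots, as in the graphic.
MODEL_1 = {"intercept": 1.55, "slope": 0.05}
MODEL_2 = {"intercept": 0.97, "slope": 0.07}


def trend_goals(model, shots):
    """Goals predicted by a straight trend line at a given shot count."""
    return model["intercept"] + model["slope"] * shots


def crossover_shots(model_a, model_b):
    """Shot count where two trend lines predict the same number of goals."""
    return (model_a["intercept"] - model_b["intercept"]) / (
        model_b["slope"] - model_a["slope"]
    )


print(round(crossover_shots(MODEL_1, MODEL_2), 1))  # about 29 shots

# Below the crossover Model 1 is higher; above it Model 2 is higher.
for shots in (25, 33):
    print(shots, round(trend_goals(MODEL_1, shots), 2), round(trend_goals(MODEL_2, shots), 2))
```

The model with the steeper line (Model 2 here) gives more credit for each extra shot, so it starts lower but eventually overtakes the other.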
Again, this is just one of the variables used in both models. Really, all of the variables the two models share differ slightly like this, and it’s these small differences that cause the picks to differ.
My goal is to continue to create/edit these models in the hopes of predicting hockey scores with a reasonable level of accuracy. Once I have more data points and they have run for a longer period of time I’ll post another article summarizing my results.