For the upcoming NHL season I wanted to take a stab at creating a model that predicts a particular outcome of any given hockey game. This is something that most other hockey analytics sites have, but I wanted to make my own and see how I compare.
Ultimately, I created two models, both of which will be used before (almost) every NHL game this season.
One model predicts the winner of each game while the other predicts the total number of goals scored. For those who are into sports gambling, this means the models will pick not only a team, but also the over/under.
What makes this fun is that you can measure success in two ways:
1) are the models accurate – do they pick the correct winner & predict the correct number of total goals
2) are the models profitable – if the models could bet on what they think will happen, would they make money
It is important to note that these two measures are not the same. If they were, then betting the favorite in every matchup would be a profitable gambling strategy, but the odds are set so that this does not happen (generally speaking, of course).
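The gap between these two measures is easy to demonstrate with a few hypothetical bets. A minimal sketch (invented games and American odds, not real results) showing that a model can pick winners well and still lose units:

```python
# Hypothetical demo: betting every favorite can be accurate yet unprofitable
# once the odds are applied. All games and odds below are invented.

def unit_profit(american_odds: int) -> float:
    """Profit on a winning 1-unit bet at the given American odds."""
    return american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)

# (won_bet, odds_taken) for five invented favorite bets
bets = [(True, -200), (True, -250), (True, -180), (False, -220), (False, -150)]

wins = sum(won for won, _ in bets)
units = sum(unit_profit(odds) if won else -1.0 for won, odds in bets)

accuracy = wins / len(bets)   # 3 of 5 winners picked = 60% accurate...
print(f"accuracy: {accuracy:.0%}, return: {units:+.2f} units")  # ...but negative units
```

At heavy favorite prices, three wins pay less than two losses cost, so the 60% accurate strategy still loses money.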
I don’t have names for my models yet, but I do intend to keep tuning these and make changes in the future, at which point a name and version would be helpful. For now though let’s keep it simple and call them the matchup and scoring model (please provide suggestions for future model names in the comments).
So what are the models and how do they work?
Both models follow a sub-model structure; that is, each is made up of two separate models that feed the final model. These sub-models predict the home and away team's outcomes on their own. The final model uses these predictions, as well as other variables, to predict the final outcome.
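The sub-model structure can be sketched roughly like this. Everything here – the feature names, weights, and home-ice term – is an invented placeholder for illustration, not the actual fitted model:

```python
# Illustrative two-stage structure: one sub-model scores the home side, one
# scores the away side, and a final model combines them with extra variables.
# All weights and feature names are invented placeholders.

def home_submodel(features: dict) -> float:
    # Scores the home team on its own (placeholder linear blend)
    return 0.6 * features["home_xg"] + 0.4 * features["home_xg_pct"]

def away_submodel(features: dict) -> float:
    # Scores the away team on its own
    return 0.6 * features["away_xg"] + 0.4 * features["away_xg_pct"]

def final_model(features: dict) -> float:
    """Combine sub-model outputs plus other variables into one prediction."""
    home = home_submodel(features)
    away = away_submodel(features)
    home_ice_bonus = 0.05               # placeholder "other variable"
    return (home - away) + home_ice_bonus   # > 0 means pick the home team

game = {"home_xg": 3.1, "home_xg_pct": 0.54,
        "away_xg": 2.7, "away_xg_pct": 0.46}
pick = "home" if final_model(game) > 0 else "away"
```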
The matchup model’s goal is to predict the winner of a hockey game. It uses a mix of logistic regression as well as gradient boosting to determine the winner.
Most of the inputs to the model are based on the expected goal stats from my xGoal model. Some of these are in total terms (such as the total xGoals of the previous game), but most are in percent terms (as in xGoals For Percent). This combination allows me to have both the team's offensive and defensive numbers in the model with relative ease.
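A toy sketch of blending a logistic-regression-style score with a gradient-boosting-style score on the xGoals-percent difference. The real matchup model uses fitted versions of both; the coefficient and the step functions below are invented stand-ins:

```python
# Toy blend of the two model families the matchup model mixes. The logistic
# coefficient and the boosting-style steps are invented for illustration.
import math

def logistic_score(xg_pct_diff: float) -> float:
    """Logistic-regression-style win probability from the home/away
    xGoals-percent difference (placeholder coefficient of 8.0)."""
    return 1 / (1 + math.exp(-8.0 * xg_pct_diff))

def boosted_score(xg_pct_diff: float) -> float:
    """Stand-in for a gradient-boosted model: a sum of small step functions,
    the shape a tree ensemble produces."""
    score = 0.5
    for threshold, step in [(-0.05, 0.1), (0.0, 0.15), (0.05, 0.1)]:
        score += step if xg_pct_diff > threshold else -step
    return min(max(score, 0.0), 1.0)

def home_win_prob(xg_pct_diff: float) -> float:
    # Simple average of the two model families
    return 0.5 * (logistic_score(xg_pct_diff) + boosted_score(xg_pct_diff))

# Home team with a 54% xGoals share (diff = 0.54 - 0.46)
p = home_win_prob(0.08)
```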
The scoring model’s goal is to predict the total number of goals scored in a game. It is a gradient boosted model, but unlike the matchup model it does not use any logistic regression because this is not a binary classification problem.
It has similar inputs to the matchup model, but leans more heavily on the offensive numbers since it is focused on the number of goals scored.
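As a rough illustration of the regression framing, here is a hypothetical total-goals predictor that leans on the combined offensive (xGoals) numbers. The weights and the league-average figure are invented; the actual model is a fitted gradient-boosted regressor:

```python
# Hypothetical regression-style total-goals prediction. Unlike the matchup
# model this outputs a continuous goal total, not a win probability.
# The 0.7/0.3 weights and 6.1 league average are invented for illustration.

def predicted_total(home_xg: float, away_xg: float, league_avg: float = 6.1) -> float:
    # Lean on the offensive numbers: shrink the teams' combined xGoals
    # toward a league-average game total.
    combined = home_xg + away_xg
    return 0.7 * combined + 0.3 * league_avg

total = predicted_total(3.4, 3.0)   # 0.7 * 6.4 + 0.3 * 6.1 ≈ 6.31
```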
Before doing any sort of training or testing of the model I pulled out a random selection of games from the last 8 seasons to be able to fully test how the models might do.
These sample games were never seen by the models in either the training or testing process, so the models had no bias towards any of the results.
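The holdout procedure can be sketched as follows, assuming games are identified by (season, game) pairs; the season range, games-per-season count, and 10% split size are illustrative, not the actual sample:

```python
# Sketch of carving out a holdout sample before any training or testing.
# Season range, games per season, and the 10% split are illustrative.
import random

random.seed(42)
games = [(season, gid) for season in range(2015, 2023) for gid in range(1312)]

random.shuffle(games)
n_holdout = int(0.1 * len(games))            # games never seen during modeling
holdout, remainder = games[:n_holdout], games[n_holdout:]
# remainder is then split again into the usual train/test sets
```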
The matchup model picked the favorite to win 89.54% of the time, which means it can still find some value in the gambling sense of the word – that is, in the remaining games it gets to bet what it believes to be the favorite at an underdog price.
The matchup model correctly predicted the winner 60.17% of the time in the sample games. Its best season was 2021-22 where it got 66.0% of the winners right. Its worst season was 2015-16 with only 53.52% of the winners right.
In terms of profits the matchup model was more variable. In moneyline bets (simply picking the winner) its best season would have returned 26.72 units, while its worst would have been a -26.98 unit season. A graphic of the season returns is below.
Winning puckline bets were less common but more profitable, which is understandable for hockey. Overall the predicted winner covered 44.07% of the time for a total return of 91.10 units.
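Grading a puckline bet (the standard -1.5 goal spread on the favorite) can be sketched like this; the scores and the +180 odds are hypothetical:

```python
# Sketch of grading a 1-unit puckline (-1.5) bet on the predicted winner.
# Scores and odds below are hypothetical.

def grade_puckline(winner_goals: int, loser_goals: int, american_odds: int) -> float:
    """Unit return for a 1-unit bet on the winner at a -1.5 goal spread."""
    covered = (winner_goals - loser_goals) > 1.5
    if not covered:
        return -1.0
    # Puckline favorites usually pay plus money, hence fewer but bigger wins
    return american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)

print(grade_puckline(4, 2, 180))   # wins by 2, covers: 1.8
print(grade_puckline(3, 2, 180))   # wins the game but not the spread: -1.0
```

This is why puckline wins are rarer but each one returns more than a moneyline win on the same team.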
Returns from the scoring model were much harder to come by, but I managed. Overall it picked the correct side of the over/under line 53.85% of the time (this excludes bets that pushed).
This percentage is lower because there are many more points at which the total line could be set. In hockey betting the total line is generally set at either 5.5, 6.0, or 6.5. So a predicted score of 6.2 would be considered an OVER if the line is at 5.5 or 6.0, but an UNDER if the line is at 6.5. Because of this the oddsmakers have a lot more room for movement, as they can move not only the odds but also the line.
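Turning a predicted total into a graded over/under bet, including the pushes excluded above, might look like this sketch (illustrative numbers):

```python
# Sketch of picking and grading an over/under bet against a posted line,
# including pushes when the final total lands exactly on the line.

def pick_side(predicted_total: float, line: float) -> str:
    return "over" if predicted_total > line else "under"

def grade_total(pick: str, actual_total: int, line: float) -> str:
    if actual_total == line:
        return "push"            # bet refunded; excluded from the record
    result = "over" if actual_total > line else "under"
    return "win" if pick == result else "loss"

# A 6.2-goal prediction is an OVER at 5.5 or 6.0 but an UNDER at 6.5
assert pick_side(6.2, 5.5) == "over"
assert pick_side(6.2, 6.5) == "under"
print(grade_total(pick_side(6.2, 6.0), 5, 6.0))   # loss
print(grade_total("over", 6, 6.0))                # push
```

Note a push is only possible on whole-number lines like 6.0; half-goal lines like 5.5 always settle as a win or a loss.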
The model predicted the OVER 51.02% of the time.
The best season for the scoring model was the 2015-16 season with a return of 19.9 units and 56.23% correctly picked. The worst season was 2021-22 with a return of -38.79 units and 51.13% correctly picked. Ironically, these are the exact opposite of the matchup model's best and worst seasons.
It is pretty clear from above that the 2021-22 season was an outlier. This could be due to the odds, the lines, or even some sample selection bias in the games chosen. Whatever the reason, it's clear that the model was simply unable to find a winner during that small 5-day window around the 50th day of the season.
So will the models work?
I do not know, but am excited to find out. I’ve tested thousands of combinations of inputs, parameters, model types, etc. to find what I think will provide the best predictions in both the matchup and scoring cases. As shown above, when testing they both had their successes and failures in different areas.
My goal is to post my model predictions daily on this site's Twitter. All the predictions and returns will be posted there so those who are interested can follow along.