NBA Predictions

This repo contains the code and weights for an AI model that predicts the outcome of NBA games. Its output represents the chance that a given point spread will occur.

The model requires 8 players from each of the home and away teams, plus their ages, as input. It then outputs a probability for each point spread between -20 and +20 points, from the home team's point of view.
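As a rough illustration of how such an output can be read, here is a minimal sketch that turns a spread distribution into a home win probability and a blowout probability. It assumes the output has been converted to a dict mapping each home-team margin to a probability, and that the +20 bucket covers wins by 20 or more. Both are assumptions for illustration, as are the function name and the toy numbers, which are not model output.

```python
def summarize_spread_probs(spread_probs):
    """Summarize a spread distribution from the home team's point of view.

    spread_probs: dict mapping home-team margin (int) -> probability.
    This format is an assumption for illustration, not the repo's API.
    """
    total = sum(spread_probs.values())
    # Positive margins mean the home team wins.
    home_win = sum(p for margin, p in spread_probs.items() if margin > 0) / total
    # Assumes the +20 bucket absorbs every win by 20 or more points.
    home_blowout = sum(p for margin, p in spread_probs.items() if margin >= 20) / total
    return home_win, home_blowout


# Toy distribution with made-up values, only to show the call.
toy_probs = {-20: 0.05, -5: 0.20, 5: 0.50, 20: 0.25}
home_win, home_blowout = summarize_spread_probs(toy_probs)
print(f"home win: {home_win:.0%}, home win by 20+: {home_blowout:.0%}")
```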

For example, the following text and chart show the model predicting the home team with a 77% chance to win and a 14% chance of winning by 20 or more points. This kind of chart is indicative of a dominant team playing at home. Most games will have a shape closer to a bell curve.

Installation

I recommend installing Python 3.11.8, since that is the version the repo was written and tested with. The code will likely work with most recent versions of Python, though.

Once you have Python installed, run pip install -r requirements.txt. It will take a while to install dependencies if you don't already have PyTorch cached.

Usage

The example.ipynb notebook shows how to use the model to predict the final game of the 2023-24 NBA season - a game between the Dallas Mavericks and Boston Celtics. It will output the chart above.

To change the players and their ages, you must reference the player_tokens.csv and age_tokens.csv files.

For example, suppose you wanted to remove Kristaps Porzingis from Boston's team and swap which team is home and away. You would take the token representing Porzingis (4416) out of the home_team_tokens list and replace it with the token for, say, Payton Pritchard (4999). You would then look up Pritchard's age (26), find the corresponding age token in age_tokens.csv (11), and use it to replace Porzingis' age token, which is the second-to-last token.
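The substitution can be sketched as a small helper. This assumes a token list laid out as 8 player tokens followed by their 8 age tokens in the same order, which is a guess consistent with the age token being second to last but is not confirmed by the repo. The helper name and every placeholder value other than 4416, 4999, and 11 are hypothetical; check example.ipynb for the real layout.

```python
def replace_player(tokens, old_player, new_player, new_age_token, n_players=8):
    """Return a copy of tokens with one player and his age token replaced.

    Assumed layout (hypothetical): n_players player tokens, then the
    matching age tokens in the same order.
    """
    idx = tokens[:n_players].index(old_player)  # position of the player to remove
    updated = list(tokens)                      # leave the original list untouched
    updated[idx] = new_player
    updated[n_players + idx] = new_age_token
    return updated


# Placeholder list: only 4416 (Porzingis) is a real token from this README.
example_tokens = [1, 2, 3, 4, 5, 6, 4416, 7,          # player tokens
                  10, 10, 10, 10, 10, 10, 12, 10]     # age tokens (placeholders)

# Swap in Pritchard (4999) with his age token (11).
updated_tokens = replace_player(example_tokens, 4416, 4999, 11)
print(updated_tokens)
```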

To swap home and away, you could exchange the contents of the variables holding the player and age tokens, or just set the swap_home_away variable to True. The results are as follows:

As you can see, Dallas' win probability improved from 23% to 35%, and their chance of being blown out by 20+ points decreased from 14% to 10%. Clearly, the model thinks Porzingis is important to the Celtics' chances, but still considers Boston to be the superior team without him.

Training Process

I downloaded data from stats.nba.com using the nba_api package (https://github.com/swar/nba_api) to get minutes played, game outcomes, and a few other dimensional fields needed to tie everything together. Then, I ran a custom PyTorch training loop to train each model on its chosen loss objective (spread, money line, or spread probability).