How to Analyze NBA Data Using Python and the NBA API
If you’re a fan of basketball, you probably love watching NBA games and following your favorite players. But did you know that you can also analyze NBA data using Python and a powerful API? In this blog post, I’ll show you how to use the nba_api package to access NBA data, perform statistical analysis, and create visualizations.
The NBA has become a staple of American culture, and as technology has progressed, accessing NBA data has become increasingly easy. There are several NBA APIs available on the web, but for this analysis we will focus on the nba_api package. nba_api is an API client for www.nba.com. The package's stated goal is to make the APIs of NBA.com easily accessible and to provide accessible documentation. It is open source.
Using the nba_api package, it is simple to write a script or program that requests and parses NBA data. The package provides static helpers such as players.get_players() and endpoint classes such as playercareerstats.PlayerCareerStats. These can be used to pull data from NBA.com and turn it into a pandas DataFrame.
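To make that concrete, here is a minimal sketch of looking up a player and requesting their career stats. The function names find_player_id and career_stats are my own, chosen for illustration, and are not part of nba_api. The sketch assumes that players.get_players() returns a list of dictionaries with 'id' and 'full_name' keys. The nba_api imports sit inside career_stats so the lookup helper can be tried without the package installed or a network connection.

```python
def find_player_id(player_list, full_name):
    """Return the 'id' of the first entry whose 'full_name' matches, or None.

    player_list is assumed to look like the output of players.get_players():
    a list of dicts that each carry 'id' and 'full_name' keys.
    """
    wanted = full_name.casefold()
    for entry in player_list:
        if entry["full_name"].casefold() == wanted:
            return entry["id"]
    return None


def career_stats(full_name):
    """Fetch career stats for one player as a pandas DataFrame.

    Hypothetical helper: it needs nba_api installed and makes a network
    request to NBA.com when called.
    """
    # Imported here so find_player_id stays usable without nba_api.
    from nba_api.stats.static import players
    from nba_api.stats.endpoints import playercareerstats

    player_id = find_player_id(players.get_players(), full_name)
    if player_id is None:
        raise ValueError(f"No player named {full_name!r} found")

    # get_data_frames() returns a list; the first frame is used here.
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    return career.get_data_frames()[0]
```

Calling career_stats("LeBron James") should then hand back a DataFrame you can explore with the usual pandas tools, provided the request to NBA.com succeeds.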
Let's take a look at an example. The following script pulls the league leaders in points (PTS), keeps the top 500 rows, groups them by the player_name and player_id columns, and calculates their averages for the min, fgm, fga, ftm, fta, pts, fg3m, fg3a, and gp columns.
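from nba_api.stats.endpoints import leagueleaders
import pandas as pd

# Pull the league leaders ranked by points per game.
# per_mode48 is the keyword I know from nba_api releases; check your installed version.
leaders = leagueleaders.LeagueLeaders(
    per_mode48='PerGame',
    season='2020-21',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0]  # get_data_frames() returns a list; the first frame is the leaders table

# Assumption: the endpoint returns uppercase column names (PLAYER, PLAYER_ID, PTS, ...).
# Lowercase them and rename the name column so the rest of the script can use player_name.
leaders.columns = leaders.columns.str.lower()
leaders = leaders.rename(columns={'player': 'player_name'})

# Keep the top 500 rows
top_500 = leaders.head(500)

# Group players by name and player ID and calculate average stats.
# Selecting the numeric columns first keeps text columns such as team out of mean().
stat_cols = ['min', 'fgm', 'fga', 'ftm', 'fta', 'pts', 'fg3m', 'fg3a', 'gp']
top_500_avg = top_500.groupby(['player_name', 'player_id'])[stat_cols].mean()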
The leagueleaders.LeagueLeaders endpoint class takes parameters such as the season and the stat category abbreviation, and pulls the league leaders in points per game for the specified season. The get_data_frames() method returns a list of data frames, and the first item in the list contains the data we want. We then keep the top 500 rows and group them by name and player ID using the groupby() method. Finally, we calculate the average values for the desired columns.

Once we have our data in a pandas DataFrame, we can perform additional analysis on it. For example, we can plot points scored against the number of three-pointers made using plotly.
import plotly.express as px

fig = px.scatter(top_500_avg.reset_index(), x='pts', y='fg3m', hover_name='player_name')
fig.show()
This displays an interactive scatter plot of three-pointers made (fg3m) against points (pts) for each player in the top 500. Hovering over a point shows the player's name.
The scatter plot I generated shows total PTS against total threes made as of 2/19/2023, partway through the season. To get a better sense of the top 5 players in scoring, I’ve included a table below, along with the Python line that produces it.
top_5 = top_500_avg.sort_values(by='pts', ascending=False).head(5)
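If you want to try the grouping and top-5 steps without calling the API, here is a minimal sketch on a small hand-made frame. The player names, team codes, and numbers are made up purely for illustration and are not real NBA data. The column names follow the lowercase ones used in this post.

```python
import pandas as pd

# Hypothetical rows standing in for the league leaders table.
leaders = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C",
                    "Player D", "Player E", "Player F"],
    "player_id": [1, 2, 3, 4, 5, 6],
    "team": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "pts": [30.1, 27.4, 25.0, 31.2, 18.9, 22.3],
    "fg3m": [2.5, 4.1, 1.0, 3.3, 2.2, 0.4],
})

stat_cols = ["pts", "fg3m"]

# Pick the numeric columns before averaging so the text column
# "team" is never passed to mean().
avg = leaders.groupby(["player_name", "player_id"])[stat_cols].mean()

# Sort by points, highest first, and keep the first five rows.
top_5 = avg.sort_values(by="pts", ascending=False).head(5)
print(top_5)
```

Because each made-up player appears only once, the averages equal the input values, and the result is simply the five highest pts rows indexed by name and ID.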
In conclusion, the nba_api package is a great tool for accessing and analyzing NBA data. With its many endpoints, it’s a great way to grab fun data for your own analysis. With powerful tools such as pandas and plotly, it is easy to gain insights and produce visualizations. Whether you are a casual NBA fan or a seasoned analyst, the nba_api package is worth checking out.