
NBA Data Science: Analyzing Player Stats Using Python and the NBA API

Ben Ballard

Updated: 1 day ago

This post is a walk-through of how I built a process in Python to pull NBA data through the NBA API and analyze player career stats like field goal percentage, points, rebounds, and assists. In particular, I’ll focus on how I pulled the data from the NBA API and worked with it in pandas.


Basketball and Data Science using NBA API!

NBA API Basketball Analytics

Gathering the Basketball Data

The cornerstone of this process is pulling the player career data. To make this easier, I recommend using my NBA API Buddy. I designed this CustomGPT with an in-depth understanding of the NBA API, including its endpoints and the Player and Team IDs. If your query deviates from the examples provided, don’t hesitate to test it out! The NBA API Buddy is equipped to handle all kinds of API inquiries, making your data gathering both efficient and user-friendly.


The first step is identifying which player you’d like to analyze. Then use the NBA API Buddy to find that player’s ID, and replace the ID ‘202681’ in the code below with it.

The code uses the playercareerstats endpoint from the nba_api.stats.endpoints module. Its PlayerCareerStats class fetches the career statistics of your chosen player.


from nba_api.stats.endpoints import playercareerstats

# Fetching career statistics for Player of Choice using his player ID
player_career = playercareerstats.PlayerCareerStats(player_id='202681')
player_career_df = player_career.get_data_frames()[0]

# Extracting the seasons of player of choice
seasons_played = player_career_df['SEASON_ID'].unique()
print(seasons_played.tolist())

The above code prints the seasons your chosen player appeared in, as a list like this: ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']. Copy that output and use it to replace the list of seasons below.


import matplotlib.pyplot as plt
import pandas as pd
from nba_api.stats.endpoints import playergamelog

# Initialize an empty DataFrame to store all game logs
all_seasons_logs_df = pd.DataFrame()

# List of seasons to loop through (update this list as needed)
seasons = ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

# Fetch game logs for each season and add a 'SEASON' column
for season in seasons:
    player_logs = playergamelog.PlayerGameLog(player_id='202681', season=season)
    season_logs_df = player_logs.get_data_frames()[0]
    season_logs_df['SEASON'] = season  
    all_seasons_logs_df = pd.concat([all_seasons_logs_df, season_logs_df], ignore_index=True)

After running the above, the all_seasons_logs_df DataFrame will include all game data for the player of interest. It contains many more stats, but the table below is a small sample of what is in the data.

Sample of the Season Logs for Player from NBA API

Data Processing

After pulling the data, the next steps involve processing it for analysis and visualization. While the data comes pretty clean from the API pull, I still did some processing. First, ensure the GAME_DATE column is in a datetime format, which you can do with the pandas to_datetime function.

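# Convert GAME_DATE to a datetime
all_seasons_logs_df['GAME_DATE'] = pd.to_datetime(all_seasons_logs_df['GAME_DATE'])

# Create MONTH_YEAR to facilitate month-level analysis
all_seasons_logs_df['MONTH_YEAR'] = all_seasons_logs_df['GAME_DATE'].dt.to_period('M')

# Assumption: YEAR (used in the snippets below) is the season's start year, e.g. 2023 for '2023-24'
all_seasons_logs_df['YEAR'] = all_seasons_logs_df['SEASON'].str[:4].astype(int)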
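To make the date handling concrete, here is a minimal sketch on a made-up three-row frame. The rows and the 'Oct 19, 2021' date style are assumptions for illustration, not actual API output, and so is the idea that the YEAR column used later holds the season's start year. The sketch also shows why the calendar year of GAME_DATE would be a poor grouping key: one season spans two calendar years.

```python
import pandas as pd

# Made-up rows for illustration only; the date format is an assumption
sample_df = pd.DataFrame({
    'GAME_DATE': ['Oct 19, 2021', 'Dec 25, 2021', 'Apr 10, 2022'],
    'SEASON': ['2021-22', '2021-22', '2021-22'],
})

sample_df['GAME_DATE'] = pd.to_datetime(sample_df['GAME_DATE'], format='%b %d, %Y')
sample_df['MONTH_YEAR'] = sample_df['GAME_DATE'].dt.to_period('M')

# Calendar year splits this single season into two groups
sample_df['CALENDAR_YEAR'] = sample_df['GAME_DATE'].dt.year

# Season start year keeps every game of the season in one group
sample_df['YEAR'] = sample_df['SEASON'].str[:4].astype(int)

print(sample_df)
```

All three games belong to the 2021-22 season, but only the start-year column gives them the same value.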

Creating summary tables is important for understanding the data and building narratives for storytelling. I used the following code to aggregate the game-level stats by year and tell the story of a player’s career.

# Aggregate game-level data to yearly totals
yearly_stats = all_seasons_logs_df.groupby('YEAR').agg({
    'FGM': 'sum',
    'FGA': 'sum',
    'FG3M': 'sum',
    'FG3A': 'sum',
    'FTM': 'sum',
    'FTA': 'sum',
}).reset_index()

# Divide makes by attempts to get each shooting percentage
yearly_stats['FG_PCT'] = yearly_stats['FGM'] / yearly_stats['FGA']
yearly_stats['FG3_PCT'] = yearly_stats['FG3M'] / yearly_stats['FG3A']
yearly_stats['FT_PCT'] = yearly_stats['FTM'] / yearly_stats['FTA']

# Count the games played in each year
yearly_stats['GAMES'] = all_seasons_logs_df.groupby('YEAR')['Game_ID'].count().values
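One detail worth spelling out is why the percentages are computed from summed makes and attempts rather than by averaging each game's percentage. The sketch below uses two made-up games (the numbers are assumptions for illustration) to show that the two approaches can disagree.

```python
import pandas as pd

# Two made-up games in one season: 1-for-2, then 3-for-10
games = pd.DataFrame({
    'YEAR': [2023, 2023],
    'FGM': [1, 3],
    'FGA': [2, 10],
})

# Season percentage from summed makes and attempts: 4 / 12
totals = games.groupby('YEAR')[['FGM', 'FGA']].sum()
season_pct = (totals['FGM'] / totals['FGA']).loc[2023]

# Averaging per-game percentages weights both games equally: (0.5 + 0.3) / 2
mean_of_game_pcts = (games['FGM'] / games['FGA']).mean()

print(round(season_pct, 3), round(mean_of_game_pcts, 3))
```

The summed version gives 4/12, about 0.333, while the average of the per-game percentages gives 0.4, because the low-volume game counts as much as the high-volume one.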

Data Visualization

The final part of this script is visualizing the data. There are many options, from line charts to histograms to bar charts. I used seaborn and matplotlib to create the visualizations.

Example 1 of Visualizations. Line Graphs showing Points, Assists, and Rebounds

Example 2 of Visualizations. Histograms showing Field Goal Percentages by Category.

Finally, I’ll present a visualization that I enjoyed creating. This graph tracks a specific stat — in this case, the number of three-pointers made per season — and compares it across different seasons. For a specific example, we’ll look at Dirk Nowitzki’s career. The visualization plots the cumulative count of three-pointers Dirk made each season, with the game number on the X-axis. To highlight a specific season, like 2009, I’ve used a red line. Interestingly, while 2009 was a career-high in three-point percentage for Dirk, it featured one of his lowest totals for made threes. This type of visualization offers a nuanced view of a player’s performance over time.


Cumulative Three Pointers Made

In order to get that graphic to work, we must first create the cumulative sum of threes (FG3M) for each season. The code below does that for us.

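# Sort chronologically so the running totals accumulate in game order
all_seasons_logs_df = all_seasons_logs_df.sort_values('GAME_DATE').reset_index(drop=True)

# Calculate cumulative sums within each season
all_seasons_logs_df['FG3M_CUMSUM'] = all_seasons_logs_df.groupby('YEAR')['FG3M'].cumsum()
all_seasons_logs_df['FGM_CUMSUM'] = all_seasons_logs_df.groupby('YEAR')['FGM'].cumsum()

# Assumption: Game_Number (the X-axis below) is the game's position within its season
all_seasons_logs_df['Game_Number'] = all_seasons_logs_df.groupby('YEAR').cumcount() + 1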
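To see what a grouped cumulative sum produces, here is a small sketch on made-up logs. The rows are assumptions for illustration and are deliberately listed newest-first, since the running total only makes sense once the games are in chronological order. The Game_Number column is likewise an assumed definition: the game's position within its season.

```python
import pandas as pd

# Made-up logs for illustration, listed newest-first within each season
logs = pd.DataFrame({
    'YEAR': [2022, 2022, 2022, 2023, 2023],
    'GAME_DATE': pd.to_datetime(['2023-01-05', '2022-11-02', '2022-10-20',
                                 '2023-11-01', '2023-10-25']),
    'FG3M': [4, 1, 2, 3, 5],
})

# Sort chronologically so each running total accumulates in game order
logs = logs.sort_values('GAME_DATE').reset_index(drop=True)

# cumsum() restarts at the first game of every season
logs['FG3M_CUMSUM'] = logs.groupby('YEAR')['FG3M'].cumsum()

# cumcount() numbers rows within each group from 0, so add 1 for a game number
logs['Game_Number'] = logs.groupby('YEAR').cumcount() + 1

print(logs)
```

After sorting, the 2022 season accumulates 2, 3, 7 and the 2023 season starts over at 5, 8.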

Finally, this snippet plots a separate line for each season, showing the player’s cumulative field goals made. Color coding does most of the work: the 2023 season is drawn in red so it stands out against the other seasons, which are drawn in grey. To keep the legend readable, only 2023 is labeled.

# Plotting a line for each season
for year in all_seasons_logs_df['YEAR'].unique():
    season_data = all_seasons_logs_df[all_seasons_logs_df['YEAR'] == year]

    # Highlight the year of interest (2023) in red; draw every other season in silver
    color = 'red' if year == 2023 else 'silver'

    # Label only for the year 2023
    label = f'{year}' if year == 2023 else None

    plt.plot(season_data['Game_Number'], 
             season_data['FGM_CUMSUM'], 
             label=label, 
             color=color)

plt.title('Cumulative Field Goals Made by Player Over Each Season')
plt.xlabel('Game Number')
plt.ylabel('Cumulative FG Made')
plt.legend()
plt.show()


Wrap Up


I plan to revisit this and work on a few areas. Specifically:

  • Dynamic Season List: Instead of running the first block of code to output the seasons a player played and then hardcoding the seasons list, I would like to generate it dynamically from the player’s career data. This would make the script more flexible and reduce the need for manual updates.

  • Modularization: Right now I do some things by hand, like updating graph titles and running cells in a specific order in my notebook. I’d like to break the notebook down into functions for specific tasks (i.e. fetching data, processing data, and plotting). This will make the code more organized and reusable.

  • Of course, Documentation: Adding more comments and explanations throughout the script will improve its readability and make it easier for me and others to use the code. Once I clean it up, I plan to post it to my GitHub.
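As a rough sketch of where the first two items could go, the helpers below derive the season list from a column of season labels and wrap the fetch loop in a function. The names unique_seasons, fetch_all_seasons, and fake_fetch are hypothetical, and fake_fetch returns made-up numbers so the example runs without calling the API. Treat this as one possible shape under those assumptions, not a finished design.

```python
import pandas as pd


def unique_seasons(season_ids):
    """Return the distinct season labels in chronological order."""
    # Labels such as '2019-20' begin with a four-digit year, so a plain
    # string sort is also a chronological sort
    return sorted(set(season_ids))


def fetch_all_seasons(fetch_one, seasons):
    """Call fetch_one(season) for each season and stack the results."""
    frames = []
    for season in seasons:
        # Tag each season's rows before stacking them
        frames.append(fetch_one(season).assign(SEASON=season))
    # One concat at the end avoids re-copying the growing frame on every pass
    return pd.concat(frames, ignore_index=True)


def fake_fetch(season):
    """Hypothetical stand-in for an API call; returns made-up rows."""
    return pd.DataFrame({
        'Game_ID': [f'{season}-g1', f'{season}-g2'],
        'PTS': [20, 30],
    })


seasons = unique_seasons(['2020-21', '2019-20', '2020-21'])
all_logs = fetch_all_seasons(fake_fetch, seasons)
print(seasons)
print(all_logs)
```

In the real script, the season labels would come from player_career_df['SEASON_ID'], and fetch_one would wrap the playergamelog.PlayerGameLog call from the loop earlier in the post.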


If you’re interested, here’s an example of a story where I utilized the above workflow to write about Luka Doncic’s NBA MVP case.


Hope you found value in this. Please let me know. I’m always interested in other points of view and learning from others on how to improve my analysis process.
