Computing 'per 90' metrics
This guide shows how to use Kloppy and Polars to compute per-90-minute (p90) passing metrics for each player in a match. It covers extracting minutes played, filtering pass events, and computing two metrics: successful passes p90 and total passes p90.
Setup
Start by loading some event data with Kloppy. This demonstration uses StatsBomb open event data.
In [1]:
from kloppy import statsbomb
import polars as pl

dataset = statsbomb.load_open_data(
    match_id=15946,
    # Optional arguments
    coordinates="statsbomb",
)
/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/kloppy/kloppy/_providers/statsbomb.py:83: UserWarning: You are about to use StatsBomb public data. By using this data, you are agreeing to the user agreement. The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf warnings.warn(
Extract Minutes Played
In [2]:
mins_played = dataset.aggregate("minutes_played")
for item in mins_played:
    print(f"{item.player} - {item.duration.total_seconds() / 60:.1f} minutes played")
Philippe Coutinho Correia - 47.5 minutes played
Sergio Busquets i Burgos - 84.3 minutes played
Jordi Alba Ramos - 92.6 minutes played
Gerard Piqué Bernabéu - 92.6 minutes played
Luis Alberto Suárez Díaz - 92.6 minutes played
Ivan Rakitić - 92.6 minutes played
Ousmane Dembélé - 76.3 minutes played
Samuel Yves Umtiti - 92.6 minutes played
Lionel Andrés Messi Cuccittini - 92.6 minutes played
Nélson Cabral Semedo - 45.1 minutes played
Sergi Roberto Carnicer - 92.6 minutes played
Arturo Erasmo Vidal Pardo - 8.3 minutes played
Arthur Henrique Ramos de Oliveira Melo - 16.3 minutes played
Marc-André ter Stegen - 92.6 minutes played
Borja González Tomás - 24.6 minutes played
Jonathan Rodríguez Menéndez - 68.0 minutes played
Rubén Duarte Sánchez - 92.6 minutes played
Rubén Sobrino Pozuelo - 70.6 minutes played
Víctor Laguardia Cisneros - 92.6 minutes played
Ibai Gómez Pérez - 92.6 minutes played
Martín Aguirregabiria Padilla - 92.6 minutes played
Jorge Franco Alviz - 22.0 minutes played
Mubarak Wakaso - 92.6 minutes played
Fernando Pacheco Flores - 92.6 minutes played
Manuel Alejandro García Sánchez - 92.6 minutes played
Daniel Alejandro Torres Rojas - 67.9 minutes played
Guillermo Alfonso Maripán Loaysa - 92.6 minutes played
Adrián Marín Gómez - 24.7 minutes played
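The print statement above converts each duration to minutes by dividing its total seconds by 60. The following is a minimal sketch of that conversion, assuming `item.duration` behaves like a standard `datetime.timedelta` (the code above only relies on its `total_seconds()` method). The helper name `minutes_from_duration` and the example duration are hypothetical and not part of Kloppy.

```python
from datetime import timedelta


def minutes_from_duration(duration: timedelta) -> float:
    """Convert a timedelta-like duration to minutes as a float."""
    return duration.total_seconds() / 60


# Hypothetical duration: 47 minutes and 31 seconds on the pitch
example_duration = timedelta(minutes=47, seconds=31)
example_minutes = minutes_from_duration(example_duration)

# Same formatting as in the loop above (one decimal place)
print(f"{example_minutes:.1f} minutes played")
```

Keeping minutes as a float, rather than rounding to whole minutes, avoids introducing rounding error into any rate computed from them later.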
Compute Passes Per 90 Minutes
First, we filter the dataset to keep only pass events and convert it to a Polars DataFrame. Then, we collect the minutes played by each player, extracted above, into a second DataFrame so that the two can be joined on player_id.
In [3]:
# Only keep passes and convert to a Polars DataFrame
passes_polar = dataset.filter("pass").to_df(
    "player_id",
    lambda event: {
        "player_name": str(event.player),
        # None when the event has no recorded result
        "success": event.result.is_success if event.result is not None else None,
    },
    engine="polars",
)
# Build a DataFrame with the minutes played by each player
mins_played_pl = pl.DataFrame(
    [
        {
            "player_id": item.player.player_id,
            "minutes_played": item.duration.total_seconds() / 60,
        }
        for item in mins_played
    ]
)
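A note on the `success` column built above: it is `None` whenever a pass has no recorded result. When such a column is summed in Polars, null values are skipped, while a row count still includes them. The following pure-Python sketch mirrors that behaviour on a hypothetical list of flags; the variable names and values are illustrative only and do not come from the dataset.

```python
# Hypothetical success flags for one player's passes.
# None marks a pass without a recorded result.
success_flags = [True, False, None, True, True]

# Only passes explicitly marked as successful are counted
successful_passes = sum(1 for flag in success_flags if flag is True)

# Every pass counts towards the total, including those without a result
total_passes = len(success_flags)

# Passes whose outcome is actually known
known_results = sum(1 for flag in success_flags if flag is not None)

print(successful_passes, total_passes, known_results)
```

If passes without a result should not count towards the total, they would need to be filtered out before aggregating.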
Now, to calculate the p90 metrics:
- Group by `player_id` and `player_name` to aggregate pass statistics.
- Compute successful passes and total passes.
- Join with `mins_played_pl` to include minutes played.
- Calculate per 90 metrics:
  - `success_p90`: Successful passes per 90 minutes.
  - `total_p90`: Total passes per 90 minutes.
In [4]:
# Calculate p90 metrics
passes_p90 = (
    passes_polar.group_by("player_id", "player_name")
    .agg(successful_passes=pl.sum("success"), total_passes=pl.len())
    .join(mins_played_pl, on="player_id")
    .with_columns(
        success_p90=pl.col("successful_passes") / pl.col("minutes_played") * 90,
        total_p90=pl.col("total_passes") / pl.col("minutes_played") * 90,
    )
)
passes_p90
Out[4]:
shape: (28, 7)
| player_id | player_name | successful_passes | total_passes | minutes_played | success_p90 | total_p90 |
|---|---|---|---|---|---|---|
| str | str | u32 | u32 | f64 | f64 | f64 |
| "3501" | "Philippe Coutinho Correia" | 46 | 52 | 47.517683 | 87.12546 | 98.48965 |
| "5203" | "Sergio Busquets i Burgos" | 77 | 83 | 84.319283 | 82.187606 | 88.591835 |
| "5211" | "Jordi Alba Ramos" | 117 | 128 | 92.618717 | 113.691923 | 124.380907 |
| "5213" | "Gerard Piqué Bernabéu" | 76 | 81 | 92.618717 | 73.851164 | 78.709793 |
| "5246" | "Luis Alberto Suárez Díaz" | 28 | 39 | 92.618717 | 27.208323 | 37.897308 |
| … | … | … | … | … | … | … |
| "6629" | "Fernando Pacheco Flores" | 12 | 26 | 92.618717 | 11.66071 | 25.264872 |
| "6632" | "Manuel Alejandro García Sánche… | 16 | 21 | 92.618717 | 15.547613 | 20.406243 |
| "6839" | "Daniel Alejandro Torres Rojas" | 12 | 16 | 67.909867 | 15.903433 | 21.204577 |
| "6855" | "Guillermo Alfonso Maripán Loay… | 11 | 16 | 92.618717 | 10.688984 | 15.547613 |
| "6935" | "Adrián Marín Gómez" | 5 | 7 | 24.70885 | 18.212098 | 25.496937 |
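As a sanity check, the per 90 formula (count / minutes played × 90) can be reproduced by hand for a few rows of the table above. The sketch below uses plain Python; the `per_90` helper and the `MIN_MINUTES` threshold are hypothetical additions for illustration, not part of Kloppy or Polars. The threshold reflects a common practice of excluding players with very few minutes, whose per 90 values are based on little data.

```python
def per_90(count: float, minutes_played: float) -> float:
    """Scale a raw count to a per-90-minutes rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return count / minutes_played * 90


# (player_name, successful_passes, total_passes, minutes_played), copied from Out[4]
rows = [
    ("Philippe Coutinho Correia", 46, 52, 47.517683),
    ("Jordi Alba Ramos", 117, 128, 92.618717),
    ("Adrián Marín Gómez", 5, 7, 24.70885),
]

# Hypothetical cut-off: ignore players with fewer minutes than this
MIN_MINUTES = 30

qualified = [
    (name, per_90(successful, minutes), per_90(total, minutes))
    for name, successful, total, minutes in rows
    if minutes >= MIN_MINUTES
]

for name, success_p90, total_p90 in qualified:
    print(f"{name}: {success_p90:.2f} successful p90, {total_p90:.2f} total p90")
```

With these inputs, the values for Philippe Coutinho Correia and Jordi Alba Ramos agree with the `success_p90` and `total_p90` columns in the table, while Adrián Marín Gómez is dropped by the hypothetical 30-minute threshold.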