Event data
One of the main benefits of working with kloppy is that it loads metadata together with the event data. This metadata covers teams (name, ground and provider id) and players (name, jersey number, optional position and provider id). With it, you can produce analyses that are readable by humans, because results carry names instead of only numeric ids.
This section shows how the metadata is organized and walks through some use cases.
Loading StatsBomb data
The datasets module of kloppy makes it easy to load StatsBomb data. Keep in mind that by using the data you accept the license of the open-data project.
from kloppy import datasets
dataset = datasets.load("statsbomb", options={"event_types": ["pass", "shot"]})
/Users/koen/PycharmProjects/kloppy/.venv/lib/python3.7/site-packages/kloppy-1.0.0-py3.7.egg/kloppy/infra/datasets/event/statsbomb.py:12: UserWarning: You are about to use StatsBomb public data. By using this data, you are agreeing to the user agreement. The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf "\n\nYou are about to use StatsBomb public data."
Exploring metadata
kloppy always loads the metadata for you and makes it available through the metadata property.
metadata = dataset.metadata
home_team, away_team = metadata.teams
After loading the data, the metadata can be used to iterate over teams and players. By default, metadata.teams contains [HomeTeam, AwayTeam]. Team and Player entities implement the __str__ magic method, so they can be cast to a string directly.
print(f"{home_team.ground} - {home_team}")
print(f"{away_team.ground} - {away_team}")
home - Barcelona
away - Deportivo Alavés
[f"{player} ({player.jersey_no})" for player in home_team.players]
['Malcom Filipe Silva de Oliveira (14)', 'Philippe Coutinho Correia (7)', 'Sergio Busquets i Burgos (5)', 'Jordi Alba Ramos (18)', 'Gerard Piqué Bernabéu (3)', 'Luis Alberto Suárez Díaz (9)', 'Ivan Rakitić (4)', 'Ousmane Dembélé (11)', 'Samuel Yves Umtiti (23)', 'Lionel Andrés Messi Cuccittini (10)', 'Nélson Cabral Semedo (2)', 'Sergi Roberto Carnicer (20)', 'Clément Lenglet (15)', 'Rafael Alcântara do Nascimento (12)', 'Arturo Erasmo Vidal Pardo (22)', 'Jasper Cillessen (13)', 'Arthur Henrique Ramos de Oliveira Melo (8)', 'Marc-André ter Stegen (1)']
# get provider id for team
f"statsbomb team id: {home_team.team_id} - {away_team.team_id}"
'statsbomb team id: 217 - 206'
# same for the players
[f"{player} id={player.player_id}" for player in metadata.teams[0].players]
['Malcom Filipe Silva de Oliveira id=3109', 'Philippe Coutinho Correia id=3501', 'Sergio Busquets i Burgos id=5203', 'Jordi Alba Ramos id=5211', 'Gerard Piqué Bernabéu id=5213', 'Luis Alberto Suárez Díaz id=5246', 'Ivan Rakitić id=5470', 'Ousmane Dembélé id=5477', 'Samuel Yves Umtiti id=5492', 'Lionel Andrés Messi Cuccittini id=5503', 'Nélson Cabral Semedo id=6374', 'Sergi Roberto Carnicer id=6379', 'Clément Lenglet id=6826', 'Rafael Alcântara do Nascimento id=6998', 'Arturo Erasmo Vidal Pardo id=8206', 'Jasper Cillessen id=8652', 'Arthur Henrique Ramos de Oliveira Melo id=11392', 'Marc-André ter Stegen id=20055']
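Since every player carries both a provider id and a jersey number, a small lookup table is often handy. The sketch below does not call kloppy: PlayerStub is a hypothetical stand-in that only mirrors the player_id, jersey_no and name values shown above, and storing the ids as strings is an assumption. It runs without downloading any data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerStub:
    """Hypothetical stand-in for a kloppy Player (not the library class)."""

    player_id: str  # assumption: ids are kept as strings in this sketch
    name: str
    jersey_no: int

    def __str__(self) -> str:
        return self.name


# Three players copied from the outputs above.
players = [
    PlayerStub("5503", "Lionel Andrés Messi Cuccittini", 10),
    PlayerStub("5470", "Ivan Rakitić", 4),
    PlayerStub("20055", "Marc-André ter Stegen", 1),
]

# Index the players once, then look them up by provider id or jersey number.
by_id = {player.player_id: player for player in players}
by_jersey = {player.jersey_no: player for player in players}

print(by_id["5503"])
print(by_jersey[1])
```

With a loaded dataset, the same dictionary comprehensions should work on home_team.players, since they rely only on the player_id and jersey_no attributes used above.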
# get player from first event
player = dataset.events[0].player
print(player)
print(player.team)
print(f"Teams are comparable? {player.team == away_team}")
Jonathan Rodríguez Menéndez
Deportivo Alavés
Teams are comparable? True
The Team and Player entities also implement the magic methods needed to use them as dictionary keys or as members of a set. This makes it easy to run calculations and show the results without mapping a player_id back to a name.
from collections import defaultdict
passes_per_player = defaultdict(list)
for event in dataset.events:
    if event.event_name == "pass":
        passes_per_player[event.player].append(event)
for player, passes in passes_per_player.items():
    print(f"{player} has {len(passes)} passes")
Jonathan Rodríguez Menéndez has 14 passes
Guillermo Alfonso Maripán Loaysa has 18 passes
Sergio Busquets i Burgos has 79 passes
Ivan Rakitić has 138 passes
Ousmane Dembélé has 65 passes
Jordi Alba Ramos has 121 passes
Víctor Laguardia Cisneros has 11 passes
Marc-André ter Stegen has 23 passes
Gerard Piqué Bernabéu has 79 passes
Nélson Cabral Semedo has 31 passes
Sergi Roberto Carnicer has 85 passes
Samuel Yves Umtiti has 63 passes
Lionel Andrés Messi Cuccittini has 92 passes
Rubén Duarte Sánchez has 25 passes
Ibai Gómez Pérez has 35 passes
Mubarak Wakaso has 23 passes
Manuel Alejandro García Sánchez has 23 passes
Rubén Sobrino Pozuelo has 17 passes
Luis Alberto Suárez Díaz has 38 passes
Fernando Pacheco Flores has 16 passes
Martín Aguirregabiria Padilla has 20 passes
Daniel Alejandro Torres Rojas has 16 passes
Philippe Coutinho Correia has 51 passes
Jorge Franco Alviz has 11 passes
Adrián Marín Gómez has 6 passes
Arthur Henrique Ramos de Oliveira Melo has 18 passes
Borja González Tomás has 7 passes
Arturo Erasmo Vidal Pardo has 7 passes
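To make the role of those magic methods concrete, here is a minimal sketch of why hashable, comparable entities work as dictionary keys. TeamStub and PlayerStub are hypothetical stand-ins, not kloppy classes, and the list of pass makers is invented for illustration. A frozen dataclass is used because Python then generates both __eq__ and __hash__; how kloppy implements them internally is not assumed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamStub:
    """Hypothetical stand-in for a kloppy Team."""

    name: str


@dataclass(frozen=True)
class PlayerStub:
    """Hypothetical stand-in for a kloppy Player."""

    name: str
    team: TeamStub


barcelona = TeamStub("Barcelona")
alaves = TeamStub("Deportivo Alavés")
rakitic = PlayerStub("Ivan Rakitić", barcelona)
busquets = PlayerStub("Sergio Busquets i Burgos", barcelona)
jony = PlayerStub("Jonathan Rodríguez Menéndez", alaves)

# Invented event stream: each entry is the player who made a pass.
pass_makers = [jony, busquets, rakitic, rakitic, busquets, rakitic]

# Players and teams are hashable, so they can be counted directly.
passes_per_player = Counter(pass_makers)
passes_per_team = Counter(player.team for player in pass_makers)

# An equal object built separately finds the same dictionary entry.
same_rakitic = PlayerStub("Ivan Rakitić", TeamStub("Barcelona"))
print(passes_per_player[same_rakitic])
print(passes_per_team[barcelona])
```

Counter is used instead of defaultdict(list) only because the sketch needs counts rather than the events themselves.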
Now let's filter on home_team.
for player, passes in passes_per_player.items():
    if player.team == home_team:
        print(f"{player} has {len(passes)} passes")
Sergio Busquets i Burgos has 79 passes
Ivan Rakitić has 138 passes
Ousmane Dembélé has 65 passes
Jordi Alba Ramos has 121 passes
Marc-André ter Stegen has 23 passes
Gerard Piqué Bernabéu has 79 passes
Nélson Cabral Semedo has 31 passes
Sergi Roberto Carnicer has 85 passes
Samuel Yves Umtiti has 63 passes
Lionel Andrés Messi Cuccittini has 92 passes
Luis Alberto Suárez Díaz has 38 passes
Philippe Coutinho Correia has 51 passes
Arthur Henrique Ramos de Oliveira Melo has 18 passes
Arturo Erasmo Vidal Pardo has 7 passes
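A natural next step is ranking the filtered players. The sketch below uses a plain dictionary holding a subset of the pass counts printed above, keyed by name, so it runs on its own; passes_by_name is a name introduced here for illustration.

```python
# Pass counts copied from the home-team output above (a subset).
passes_by_name = {
    "Sergio Busquets i Burgos": 79,
    "Ivan Rakitić": 138,
    "Ousmane Dembélé": 65,
    "Jordi Alba Ramos": 121,
    "Lionel Andrés Messi Cuccittini": 92,
}

# Sort by pass count, highest first, and keep the top three.
top_three = sorted(
    passes_by_name.items(), key=lambda item: item[1], reverse=True
)[:3]

for name, count in top_three:
    print(f"{name}: {count} passes")
```

When working with the passes_per_player dictionary from the earlier cell, where each value is a list of events, the sort key would be lambda item: len(item[1]) instead.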
Use metadata when transforming to a pandas dataframe
The metadata can also be used when transforming a dataset to a pandas dataframe. Pass the additional_columns argument to to_pandas, mapping each new column name to a function that receives an event and returns the value for that column.
from kloppy import to_pandas
dataframe = to_pandas(dataset, additional_columns={
    'player_name': lambda event: str(event.player),
    'team_name': lambda event: str(event.player.team)
})
dataframe[[
    'event_id', 'event_type', 'result', 'timestamp', 'player_id',
    'player_name', 'team_name'
]].head()
| | event_id | event_type | result | timestamp | player_id | player_name | team_name |
|---|---|---|---|---|---|---|---|
| 0 | 34208ade-2af4-45c3-970e-655937cad938 | PASS | COMPLETE | 0.098 | 6581 | Jonathan Rodríguez Menéndez | Deportivo Alavés |
| 1 | d1cccb73-c7ef-4b02-8267-ebd7f149904b | PASS | INCOMPLETE | 3.497 | 6855 | Guillermo Alfonso Maripán Loaysa | Deportivo Alavés |
| 2 | f1cc47d6-4b19-45a6-beb9-33d67fc83f4b | PASS | COMPLETE | 6.785 | 5203 | Sergio Busquets i Burgos | Barcelona |
| 3 | f774571f-4b65-43a0-9bfc-6384948d1b82 | PASS | COMPLETE | 8.431 | 5470 | Ivan Rakitić | Barcelona |
| 4 | 46f0e871-3e72-4817-9a53-af27583ba6c1 | PASS | COMPLETE | 10.433 | 5477 | Ousmane Dembélé | Barcelona |
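Once names are available as columns, ordinary pandas grouping gives readable summaries. The following sketch rebuilds only the five rows shown above by hand, so it runs without kloppy, and counts passes and completed passes per team. The column name completed and the variable summary are introduced here for illustration.

```python
import pandas as pd

# The five rows from the table above, reduced to the columns used here.
dataframe = pd.DataFrame({
    "result": ["COMPLETE", "INCOMPLETE", "COMPLETE", "COMPLETE", "COMPLETE"],
    "player_name": [
        "Jonathan Rodríguez Menéndez",
        "Guillermo Alfonso Maripán Loaysa",
        "Sergio Busquets i Burgos",
        "Ivan Rakitić",
        "Ousmane Dembélé",
    ],
    "team_name": [
        "Deportivo Alavés",
        "Deportivo Alavés",
        "Barcelona",
        "Barcelona",
        "Barcelona",
    ],
})

# Flag completed passes, then aggregate per team.
summary = (
    dataframe
    .assign(completed=dataframe["result"] == "COMPLETE")
    .groupby("team_name")["completed"]
    .agg(passes="count", completed="sum")
)

print(summary)
```

On a full dataset that also contains shots, the frame would first need to be filtered on event_type before counting passes.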