Broadcast Tracking Data
Unlike tracking data from permanently installed camera systems in a stadium, broadcast tracking captures player locations directly from video feeds.
Because broadcast feeds usually show only part of the pitch, most frames contain coordinates for only some of the players, and some frames have no data at all because of replays or close-up camera views.
For more information about this data please go to https://github.com/SkillCorner/opendata.
A note from SkillCorner:
"if you use the data, we kindly ask that you credit SkillCorner and hope you'll notify us on Twitter so we can follow the great work being done with this data."
Available matches in the SkillCorner open data repository:
"ID: 4039 - Manchester City vs Liverpool on 2020-07-02"
"ID: 3749 - Dortmund vs Bayern Munchen on 2020-05-26"
"ID: 3518 - Juventus vs Inter on 2020-03-08"
"ID: 3442 - Real Madrid vs FC Barcelona on 2020-03-01"
"ID: 2841 - FC Barcelona vs Real Madrid on 2019-12-18"
"ID: 2440 - Liverpool vs Manchester City on 2019-11-10"
"ID: 2417 - Bayern Munchen vs Dortmund on 2019-11-09"
"ID: 2269 - Paris vs Marseille on 2019-10-27"
"ID: 2068 - Inter vs Juventus on 2019-10-06"
Metadata is available for this data.
Loading SkillCorner data
from kloppy import skillcorner
# there is one example match for testing purposes in kloppy that we use here
# for other matches change the filenames to the location of your downloaded skillcorner opendata files
matchdata_file = '../../kloppy/tests/files/skillcorner_match_data.json'
tracking_file = '../../kloppy/tests/files/skillcorner_structured_data.json'
dataset = skillcorner.load(meta_data=matchdata_file,
                           raw_data=tracking_file,
                           limit=100)
df = dataset.to_df()
Exploring the data
When you want to show the name of a player, you are advised to use str(player). This calls the magic __str__ method, which handles fallbacks for missing data. By default it returns full_name, and it falls back to 1) first_name last_name, then 2) player_id.
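The fallback order described above can be illustrated with a minimal sketch. PlayerSketch is a hypothetical stand-in written for this example, not kloppy's actual Player class, and the exact rule for joining partial names is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerSketch:
    """Hypothetical stand-in for a player entity (not kloppy's Player)."""

    player_id: str
    full_name: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None

    def __str__(self) -> str:
        # Preferred: the full name, when it is available.
        if self.full_name:
            return self.full_name
        # Fallback 1: first and last name, joining whichever parts exist
        # (assumption: partial names are still worth showing).
        name_parts = [part for part in (self.first_name, self.last_name) if part]
        if name_parts:
            return " ".join(name_parts)
        # Fallback 2: the identifier, which is always present.
        return self.player_id


print(PlayerSketch("home_9", full_name="Robert Lewandowski"))
print(PlayerSketch("home_9", first_name="Robert", last_name="Lewandowski"))
print(PlayerSketch("home_anon_75"))
```

The last call prints the identifier because no name fields are set, which is the situation for unidentified players in broadcast tracking data.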
metadata = dataset.metadata
home_team, away_team = metadata.teams
[f"{player} ({player.jersey_no})" for player in home_team.players]
['Thiago Alcantara (6)', 'Lukas Mai (33)', 'David Alaba (27)', 'Manuel Neuer (1)', 'Mickael Cuisance (11)', 'Corentin Tolisso (24)', 'Thomas Müller (25)', 'Joshua Kimmich (32)', 'Sven Ulreich (26)', 'Robert Lewandowski (9)', 'Kingsley Coman (29)', 'Leon Goretzka (18)', 'Alphonso Davies (19)', 'Javier Martinez (8)', 'Benjamin Pavard (5)', 'Philippe Coutinho (10)', 'Ivan Perisic (14)', 'Serge Gnabry (22)']
print(f"{home_team.ground} - {home_team}")
print(f"{away_team.ground} - {away_team}")
home - FC Bayern Munchen
away - Borussia Dortmund
Working with tracking data
The actual tracking data is available at dataset.frames. This list holds all frames. Each frame has a players_coordinates dictionary that is indexed by Player entities and has values of the Point type.
Player identities are not always known. In that case only the team affiliation is known, and a track_id that is part of the player_id identifies the same (unknown) player across multiple frames.
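As a rough illustration of how identified and unidentified players might be told apart, the sketch below splits a player_id string. It assumes the pattern seen in this notebook's output, where identified players look like home_22 and unidentified ones look like home_anon_75. The helper parse_player_id and the ParsedPlayerId type are hypothetical and not part of kloppy.

```python
from typing import NamedTuple


class ParsedPlayerId(NamedTuple):
    ground: str       # "home" or "away"
    identifier: str   # track_id for anonymous players, otherwise the remaining ID part
    anonymous: bool   # True when the player's identity is unknown


def parse_player_id(player_id: str) -> ParsedPlayerId:
    """Split an ID such as 'home_anon_75' (assumed format) into its parts."""
    ground, _, rest = player_id.partition("_")
    anon_prefix = "anon_"
    if rest.startswith(anon_prefix):
        # Only the team affiliation is known; the rest is the track_id.
        return ParsedPlayerId(ground, rest[len(anon_prefix):], True)
    return ParsedPlayerId(ground, rest, False)


print(parse_player_id("home_anon_75"))
print(parse_player_id("home_22"))
```

Keeping the track_id lets you follow the same unidentified player across frames even though no name or jersey number is attached.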
from pprint import pprint
# Index 88 is an arbitrary frame within the 100 frames loaded above
frame = dataset.frames[88]
print(f"Number of players in the frame: {len(frame.players_coordinates)}")
print("List home team players coordinates")
pprint([
    (player.player_id, player_coordinates)
    for player, player_coordinates
    in frame.players_coordinates.items()
    if player.team == home_team
])
Number of players in the frame: 18
List home team players coordinates
[('home_anon_75', Point(x=0.651772292307619, y=0.058380954035294086)),
 ('home_22', Point(x=0.5811316475035238, y=0.010450905444117642)),
 ('home_5', Point(x=0.6586107308438095, y=0.028452003889705924)),
 ('home_29', Point(x=0.5017706101333714, y=0.6315331512472059)),
 ('home_18', Point(x=0.5846745990571428, y=0.09758438650000001)),
 ('home_25', Point(x=0.5817775484115238, y=0.35925220307852945)),
 ('home_8', Point(x=0.7467174002371428, y=0.13082327161470597)),
 ('home_27', Point(x=0.7323288673171429, y=0.3728190142901471)),
 ('home_19', Point(x=0.7071614695780952, y=0.507377931049897)),
 ('home_32', Point(x=0.635410294564762, y=0.11935376853823532))]
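The coordinates in this output lie roughly between 0 and 1, which suggests they are normalized to the pitch dimensions. Under that assumption, a position can be scaled to meters as sketched below. The helper to_meters is hypothetical, and the 105 x 68 meter pitch is only an illustrative default, since the real pitch size is not stated here.

```python
from typing import Tuple


def to_meters(
    x: float,
    y: float,
    pitch_length: float = 105.0,  # assumed length in meters
    pitch_width: float = 68.0,    # assumed width in meters
) -> Tuple[float, float]:
    """Scale normalized (0..1) coordinates to meters on an assumed pitch size."""
    return x * pitch_length, y * pitch_width


# The center of the normalized pitch maps to the center of the assumed pitch.
print(to_meters(0.5, 0.5))  # (52.5, 34.0)
```

Values slightly outside the 0 to 1 range, which can occur for the ball or for players near the touchline, simply map to positions just outside the assumed pitch.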
df.head()
| | period_id | timestamp | frame_id | ball_state | ball_owning_team_id | ball_x | ball_y | ball_z | away_23_x | away_23_y | ... | away_14_d | away_14_s | home_9_x | home_9_y | home_9_d | home_9_s | home_anon_75_x | home_anon_75_y | home_anon_75_d | home_anon_75_s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 11.2 | 1523 | None | NaN | NaN | NaN | NaN | 0.747489 | 0.098509 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 11.3 | 1524 | None | NaN | 0.791347 | -0.020033 | 2.243712 | 0.745323 | 0.099367 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1 | 11.4 | 1525 | None | NaN | 0.772630 | -0.009469 | 2.534799 | 0.742956 | 0.099743 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1 | 11.5 | 1526 | None | NaN | 0.754625 | 0.001612 | 2.659813 | 0.740386 | 0.099638 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1 | 11.6 | 1527 | None | NaN | 0.737330 | 0.013210 | 2.618755 | 0.737875 | 0.096646 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 92 columns
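With 92 columns, it can help to group the column names by the entity they describe. The sketch below does this with plain string handling, assuming the naming pattern visible in the table above: an entity prefix such as ball, away_23 or home_anon_75, followed by one of the suffixes x, y, z, d or s. The helper group_columns is hypothetical and not part of kloppy or pandas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Suffixes seen in the DataFrame columns above (assumed to be the full set).
VALUE_SUFFIXES = {"x", "y", "z", "d", "s"}


def group_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names such as 'away_23_x' by their entity prefix."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for column in columns:
        prefix, _, suffix = column.rpartition("_")
        # Skip general columns like 'period_id', 'timestamp' or 'ball_state'.
        if prefix and suffix in VALUE_SUFFIXES:
            grouped[prefix].append(column)
    return dict(grouped)


example_columns = ["period_id", "timestamp", "ball_x", "ball_y", "away_23_x", "home_anon_75_s"]
print(group_columns(example_columns))
```

Calling group_columns(df.columns) on the DataFrame above should give one entry per tracked entity, which makes it easier to select all columns for a single player or for the ball.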