Getting started

This chapter is here to help you get started quickly with kloppy. It gives a 10-minute introduction on how kloppy can be used to load, filter, transform, and export soccer match data. If you're already familiar with kloppy, feel free to skip ahead to the next chapters for a more in-depth discussion of all functionality.

Installing kloppy

The recommended and easiest way to install kloppy is through pip.

pip install kloppy

The installation guide provides an overview of alternative options.

Loading data

In soccer analytics, data typically comes in two main formats: event data and tracking data. Each format has unique advantages and they are often used together.

Yet, within these main formats, there are notable differences between data providers in terms of structure, naming conventions, and the depth of information available. As a result, significant patchwork and data plumbing are needed to repeat the same analysis for different data sources. Kloppy aims to eliminate these challenges by providing a set of data parsers that can load data from all common data providers into a common standardized format.

Kloppy standardizes soccer data

We'll illustrate how to load a dataset in this standardized format using a sample of publicly available data provided by Sportec. For examples on loading data from other data providers, see the Loading Data section of the user guide.

Event data

The public Sportec dataset contains both event data and tracking data. We'll first load the event data.

from kloppy import sportec

event_dataset = sportec.load_open_event_data(match_id="J03WN1")

The resulting EventDataset contains a list of Event entities that represent actions such as passes, tackles, and shots. Each event is mapped to a specific subclass—e.g., shots become a ShotEvent, passes become a PassEvent, and so on. These events are annotated with contextual information, including the players involved, the outcome of the action, and the location on the pitch where it occurred.

A standardized event data model

Event data providers like Stats Perform, StatsBomb, and Wyscout have each developed their own event data catalogs, with unique definitions and categorizations for various event types. To address this lack of standardization, kloppy introduces its own event data model, which acts as a common denominator for event types and attributes used across data providers. This model facilitates the integration of data from diverse event data catalogs. If you need event types or data attributes that are not included in kloppy's data model, you can easily extend it.

You can retrieve specific events by their index or by their unique ID (as given by the data provider) using the .get_event_by_id() method. Below, we illustrate this by retrieving the opening goal in the match.

>>> goal_event = event_dataset.get_event_by_id("18226900000272")
>>> print(goal_event)
<ShotEvent event_id='18226900000272' time='P1T18:04' team='VfL Bochum 1848' player='P. Förster' result='GOAL'>

Often, you will not know the exact index or ID of an event. In that case, you can use the .find() and .find_all() methods for finding the right events. You can pass a string or a function. In case of a string, it must be either '<event_type>', '<event_type>.<result>' or '.<result>'. Some examples: 'shot.goal', 'pass' or '.complete'. Let's look at how this works by finding all shots in the dataset.

shot_events = event_dataset.find_all("shot")
goal_events = event_dataset.find_all("shot.goal")
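
To make the filter-string format concrete, the sketch below splits a string such as 'shot.goal' into an event type and a result. The helper parse_filter is hypothetical and is not kloppy's actual implementation; it only illustrates the three accepted shapes described above.

```python
from typing import Optional, Tuple


def parse_filter(filter_string: str) -> Tuple[Optional[str], Optional[str]]:
    """Split '<event_type>', '<event_type>.<result>' or '.<result>' into its parts."""
    event_type, _, result = filter_string.partition(".")
    # An empty part means "no constraint" on that side of the dot
    return (event_type or None, result or None)


print(parse_filter("shot.goal"))  # ('shot', 'goal')
print(parse_filter("pass"))       # ('pass', None)
print(parse_filter(".complete"))  # (None, 'complete')
```

Read this way, 'pass' constrains only the event type, while '.complete' constrains only the result.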

At the event level, there are also some useful methods for navigating: the .prev() and .next() methods allow you to quickly find the previous or next event. These two methods accept the same filter argument as the .find() and .find_all() methods, which is useful for finding a certain type of event instead of just the one directly before or after. For example, we can use .prev() to find the assist for a goal.

>>> assist_event = goal_event.prev("pass.complete")
>>> print(assist_event)
<PassEvent event_id='18226900000271' time='P1T18:00' team='VfL Bochum 1848' player='T. Asano' result='COMPLETE'>
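
Conceptually, a filtered .prev() walks backwards from an event until it meets one that satisfies the filter. The sketch below shows that idea on a plain list of dictionaries. The function previous_matching and the sample events are made up for illustration and do not reflect how kloppy stores or links events internally.

```python
from typing import Callable, Optional, Sequence


def previous_matching(
    events: Sequence[dict], index: int, predicate: Callable[[dict], bool]
) -> Optional[dict]:
    """Return the closest event before `index` that satisfies `predicate`."""
    for candidate in reversed(events[:index]):
        if predicate(candidate):
            return candidate
    return None  # nothing before `index` matched


# Made-up events, ordered by time
events = [
    {"type": "pass", "result": "complete", "player": "player A"},
    {"type": "carry", "result": "complete", "player": "player B"},
    {"type": "shot", "result": "goal", "player": "player B"},
]

assist = previous_matching(
    events, 2, lambda e: e["type"] == "pass" and e["result"] == "complete"
)
print(assist["player"])  # player A
```

Without the predicate, the event directly before the shot (the carry) would be returned instead of the pass.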

Using the wonderful mplsoccer package we can now plot the goal and its assist.

import matplotlib.pyplot as plt
from mplsoccer import VerticalPitch
from kloppy.domain import PositionType

# Setup the pitch
pitch = VerticalPitch(
    half=True, goal_type='box', pad_bottom=-.2,
    line_color="#cfcfcf",
    line_zorder=1,
    pitch_type="metricasports",
    pitch_length=event_dataset.metadata.pitch_dimensions.pitch_length,
    pitch_width=event_dataset.metadata.pitch_dimensions.pitch_width,
)

# We will use mplsoccer's grid function to plot a pitch with a title axis.
fig, axs = pitch.grid(
    figheight=4, endnote_height=0,  # no endnote
    title_height=0.1, title_space=0.02,
    # Turn off the endnote/title axis. I usually do this after
    # I am happy with the chart layout and text placement
    axis=False,
    grid_height=0.83
)

# Plot the goal angle
pitch.goal_angle(
    1-assist_event.receiver_coordinates.x, 1-assist_event.receiver_coordinates.y,
    alpha=0.2, zorder=1.1, color='#cb5a4c', goal='right', ax=axs['pitch']
)
# Plot the assist
pitch.lines(
    1-assist_event.coordinates.x, 1-assist_event.coordinates.y,
    1-assist_event.receiver_coordinates.x, 1-assist_event.receiver_coordinates.y,
    lw=5, transparent=True, comet=True, cmap='Blues', zorder=1.2,
    label='Pass leading to shot', ax=axs['pitch']
)
# Plot the shot
pitch.scatter(
    1-assist_event.receiver_coordinates.x, 1-assist_event.receiver_coordinates.y,
    s=600, marker="football", zorder=1.3, label='Shot', ax=axs['pitch']
)

# Add a legend and title
legend = axs['pitch'].legend(loc='lower right', labelspacing=1.5)
for text in legend.get_texts():
    text.set_fontsize(10)
    text.set_va('center')

# Add a title
axs['title'].text(
    0.5, 0.5,
    f'{goal_event.time} - Goal by {goal_event.player} ({goal_event.team})',
    va='center', ha='center', color='black', fontsize=12
)

Tracking data

Unlike event data, which focuses on on-the-ball actions, tracking data provides continuous spatial information about all players and the ball. Below we load the tracking data of the same game.

from kloppy import sportec

tracking_dataset = sportec.load_open_tracking_data(
    match_id="J03WN1",
    limit=30000  # optional: for efficiency, we only load the first 30 000 frames
)

This will create a TrackingDataset, which contains a sequence of Frame entities.

>>> first_frame = tracking_dataset[0]
>>> print(first_frame)
<Frame frame_id='10000' time='P1T00:00'>

Each frame has a .ball_coordinates attribute that stores the coordinates of the ball and a .players_coordinates attribute that stores the coordinates of each player.

>>> print(f"Ball coordinates: (x={first_frame.ball_coordinates.x:.2f}, y={first_frame.ball_coordinates.y:.2f})")
>>> for player, coordinates in first_frame.players_coordinates.items():
...     print(f"{player} ({player.team}): (x={coordinates.x:.2f}, y={coordinates.y:.2f})")
Ball coordinates: (x=0.51, y=0.51)

A tracking data frame can provide useful context to an event, as it shows the locations of all off-the-ball players. For a pass event, for example, it can show which alternative passing options a player had. Unfortunately, matching the right tracking frame to an event can be challenging, as the timestamps recorded in event data are not always very precise. Luckily, Sportec has already done this matching for all shot events. Let's revisit the opening goal that we looked at earlier and see what additional context the tracking data can provide.

Event-tracking synchronization

Implementing automated event to tracking data synchronization is on kloppy's roadmap. See #61.

# Match the goal event with its corresponding tracking frame
matched_frame_idx = goal_event.raw_event["CalculatedFrame"]
goal_frame = tracking_dataset.get_record_by_id(int(matched_frame_idx))
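
When a provider does not supply a matched frame, a naive baseline is to pick the frame whose timestamp is closest to the event's timestamp. The sketch below does this with a binary search over sorted frame timestamps. The function nearest_frame_index is hypothetical, the 40 ms frame spacing is an assumption for illustration, and the approach assumes the event and tracking clocks are aligned, which the imprecise event timestamps mentioned above often prevent in practice.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_frame_index(frame_times_ms: Sequence[int], event_time_ms: int) -> int:
    """Index of the frame closest in time to the event.

    `frame_times_ms` must be sorted in ascending order.
    """
    if not frame_times_ms:
        raise ValueError("frame_times_ms must not be empty")
    pos = bisect_left(frame_times_ms, event_time_ms)
    if pos == 0:
        return 0  # event happens before the first frame
    if pos == len(frame_times_ms):
        return len(frame_times_ms) - 1  # event happens after the last frame
    before, after = frame_times_ms[pos - 1], frame_times_ms[pos]
    # Ties go to the earlier frame
    return pos if (after - event_time_ms) < (event_time_ms - before) else pos - 1


# Assumed sampling: one frame every 40 ms
frame_times_ms = [i * 40 for i in range(100)]
print(nearest_frame_index(frame_times_ms, 1010))  # 25
```

This is only a starting point: it ignores clock offsets between the two data streams.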

With mplsoccer we can plot the frame.

import matplotlib.pyplot as plt
from mplsoccer import VerticalPitch
from kloppy.domain import PositionType

# Setup the pitch
pitch = VerticalPitch(
    half=True, goal_type='box', pad_bottom=-0.2,
    line_color="#cfcfcf",
    line_zorder=1,
    pitch_type="metricasports",
    pitch_length=event_dataset.metadata.pitch_dimensions.pitch_length,
    pitch_width=event_dataset.metadata.pitch_dimensions.pitch_width,
)

# We will use mplsoccer's grid function to plot a pitch with a title axis.
fig, axs = pitch.grid(
    figheight=4, endnote_height=0,  # no endnote
    title_height=0.1, title_space=0.02,
    # Turn off the endnote/title axis. I usually do this after
    # I am happy with the chart layout and text placement
    axis=False,
    grid_height=0.83
)

# Plot the players
goal_scorer = goal_event.player
coordinates = {
  "shooter": {"x": [], "y": []},
  "attacker": {"x": [], "y": []},
  "defender": {"x": [], "y": []},
  "goalkeeper": {"x": [], "y": []},
}
for player, player_coordinates in goal_frame.players_coordinates.items():
    if player == goal_scorer:
        coordinates["shooter"]["x"].append(1-player_coordinates.x)
        coordinates["shooter"]["y"].append(1-player_coordinates.y)
    elif player.starting_position == PositionType.Goalkeeper:
        coordinates["goalkeeper"]["x"].append(1-player_coordinates.x)
        coordinates["goalkeeper"]["y"].append(1-player_coordinates.y)
    elif player.team == goal_scorer.team:
        coordinates["attacker"]["x"].append(1-player_coordinates.x)
        coordinates["attacker"]["y"].append(1-player_coordinates.y)
    else:
        coordinates["defender"]["x"].append(1-player_coordinates.x)
        coordinates["defender"]["y"].append(1-player_coordinates.y)

    # plot the jersey numbers
    pitch.annotate(
        player.jersey_no, (1-player_coordinates.x, 1-player_coordinates.y),
        va='center', ha='center', color='white',
        fontsize=10, ax=axs['pitch']
    )

# Plot the angle to the goal
pitch.goal_angle(
    coordinates["shooter"]["x"], coordinates["shooter"]["y"],
    alpha=0.2, zorder=1.1, color='#cb5a4c', goal='right', ax=axs['pitch']
)

# Plot the player coordinates
pitch.scatter(coordinates["shooter"]["x"], coordinates["shooter"]["y"], s=300, marker="football", label='Shooter', ax=axs['pitch'])
pitch.scatter(coordinates["attacker"]["x"], coordinates["attacker"]["y"], s=300, c='#727cce', label='Attacker', ax=axs['pitch'])
pitch.scatter(coordinates["defender"]["x"], coordinates["defender"]["y"], s=300, c='#5ba965', label='Defender', ax=axs['pitch'])
pitch.scatter(coordinates["goalkeeper"]["x"], coordinates["goalkeeper"]["y"], s=300, c='#c15ca5', label='Goalkeeper', ax=axs['pitch'])

# Add a legend
legend = axs['pitch'].legend(loc='lower left', labelspacing=1.5)
for text in legend.get_texts():
    text.set_fontsize(10)
    text.set_va('center')

# Add a title
axs['title'].text(
    0.5, 0.5,
    f'{goal_event.time} - Goal by {goal_event.player} ({goal_event.team})',
    va='center', ha='center', color='black', fontsize=12
)

Metadata

One of the main benefits of working with kloppy is that it loads metadata with each (event and tracking) dataset and makes it available in the dataset's .metadata property.

metadata = event_dataset.metadata

This metadata includes teams (name, ground, tactical formation, and provider ID) and players (name, jersey number, position, and provider ID). By default, the teams are stored in metadata.teams as a tuple where the first item is the home team and the second one is the away team.

>>> home_team, away_team = metadata.teams
>>> print(f"{home_team} vs {away_team}")
VfL Bochum 1848 vs Bayer 04 Leverkusen

From each Team entity, you can then retrieve the line-up as a list of Player entities.

home_team_players = []
for player in home_team.players:
    position = (
        player.starting_position.code
        if player.starting_position is not None
        else 'SUB'
    )
    description = f"{position}:{player} (#{player.jersey_no})"
    home_team_players.append(description)

print(home_team_players)
['GK:Manuel Riemann (#1)', 'RW:S. Zoller (#9)', 'UNK:Cristian Gamboa (#2)', 'UNK:M. Esser (#21)', 'LDM:K. Stöger (#7)', 'ST:P. Hofmann (#33)', 'LB:D. Heintz (#30)', 'RDM:A. Losilla (#8)', 'UNK:Danilo Soares (#3)', 'RCB:I. Ordets (#20)', 'UNK:G. Holtmann (#17)', 'RB:S. Janko (#23)', 'RW:T. Asano (#11)', 'CAM:P. Förster (#10)', 'LW:C. Antwi-Adjei (#22)', 'LB:K. Schlotterbeck (#31)', 'LDM:Vasileios Lampropoulos (#24)', 'ST:M. Broschinski (#29)', 'LCB:E. Mašović (#4)', 'LDM:P. Osterhage (#6)']

To select individual players, you can use the .get_player_by_id(), .get_player_by_jersey_number() or .get_player_by_position() methods. Below, we select Florian Wirtz by his Sportec ID ("DFL-OBJ-002GBK").

>>> player = away_team.get_player_by_id("DFL-OBJ-002GBK")
>>> print(player.name)
Florian Wirtz

The Team and Player entities also implement the magic methods needed to use them as dictionary keys or as members of a set. This makes it easy to do some calculations and show the results without mapping a player_id to a name.

from collections import defaultdict

passes_per_player = defaultdict(list)
for event in event_dataset.find_all("pass"):
    passes_per_player[event.player].append(event)

print("\n".join(
    f"{player} has {len(passes)} passes"
    for player, passes in passes_per_player.items()
))
K. Stöger has 58 passes
Manuel Riemann has 38 passes
P. Förster has 22 passes
Florian Wirtz has 56 passes
M. Diaby has 20 passes
N. Amiri has 33 passes
A. Adli has 2 passes
T. Asano has 18 passes
M. Bakker has 44 passes
P. Hofmann has 27 passes
D. Heintz has 17 passes
Jeremie Frimpong has 30 passes
Exequiel Palacios has 56 passes
S. Janko has 32 passes
L. Hrádecký has 28 passes
C. Antwi-Adjei has 39 passes
Edmond Tapsoba has 46 passes
I. Ordets has 33 passes
A. Losilla has 25 passes
E. Mašović has 30 passes
Jonathan Tah has 30 passes
S. Azmoun has 21 passes
P. Osterhage has 7 passes
K. Schlotterbeck has 6 passes
A. Hložek has 3 passes
M. Broschinski has 2 passes
S. Zoller has 5 passes
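
Using entities as dictionary keys relies on Python's __eq__ and __hash__ protocol. The sketch below shows the mechanism with a hypothetical stand-in class, SketchPlayer, which is not kloppy's Player: a frozen dataclass gets both methods generated, so two instances with the same field values count as the same key.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPlayer:
    """Hypothetical stand-in for a player entity (not kloppy's Player)."""

    player_id: str
    name: str

    def __str__(self) -> str:
        return self.name


wirtz = SketchPlayer("DFL-OBJ-002GBK", "Florian Wirtz")
same_wirtz = SketchPlayer("DFL-OBJ-002GBK", "Florian Wirtz")

# Equal instances hash to the same key, so all three entries are counted together
pass_counts = Counter([wirtz, same_wirtz, wirtz])
for player, count in pass_counts.items():
    print(f"{player} has {count} passes")
```

Because __str__ returns the name, the result can be printed directly, just as in the per-player pass count above.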

The metadata contains much more than the players and teams. Later in this quick start guide, we will come across some more metadata attributes. The Reference Guide gives a complete overview of everything that is available.

Filtering data

Oftentimes, not all data in a match is relevant. The goal of the analysis might be to investigate a certain time window, set of events, game phase, or tactical pattern.

Selecting events or frames

To select a subset of events or frames, kloppy provides the filter, find and find_all methods. We've already introduced the find and find_all methods above for finding events. The filter method works similarly, the only difference being that it returns a new dataset while the other two methods return a list of events or frames. With these methods we can easily create a dataset that only contains a specific type of event.

# Create a new dataset which contains all goals
goals_dataset = event_dataset.filter('shot.goal')

We can do slightly more complicated things by providing a (lambda) function. This works for both event and tracking datasets.

# Create a new dataset with all frames where the ball is in the final third
pitch_max_x = tracking_dataset.metadata.pitch_dimensions.x_dim.max
f3_dataset = tracking_dataset.filter(
    lambda frame: frame.ball_coordinates.x > (2 / 3) * pitch_max_x
)

Pattern matching

For finding patterns in a game (that is, groups of events), you can use kloppy's event_pattern_matching module. This module implements a versatile domain-specific language for finding patterns in event data, inspired by regular expressions. We won't get into detail here but rather show how it can be used to create movement chains to illustrate its versatility.

Movement chains describe the pattern of four consecutive player involvements in an uninterrupted passage of play by displaying the locations of the first touches of the players involved, where a player can be involved more than once within the chain. In kloppy, you can define this pattern as follows:

from kloppy import event_pattern_matching as pm

pattern = (
    # match all successful passes
    pm.match_pass(
        success=True,
        capture="first_touch"
    )
    # ... that are followed by 3 successful passes by the same team
    + pm.match_pass(
        success=True,
        team=pm.same_as("first_touch.team"),
    ) * 3
    # ... ending with a shot by the same team
    + pm.match_shot(
        team=pm.same_as("first_touch.team")
    )
)

Now, we can search for this pattern in an event dataset.

>>> shot_ending_chains = pm.search(event_dataset, pattern)
>>> print(f"Found {len(shot_ending_chains)} matches")
Found 2 matches

We've only found two matches, one for the home team and one for the away team. Let's take a closer look at the players involved in those shot-ending movement chains.

>>> for match in shot_ending_chains:
...     print(" -> ".join([e.player.name for e in match.events]))
Jonathan Tah -> Jeremie Frimpong -> Florian Wirtz -> Jeremie Frimpong -> M. Diaby
P. Förster -> A. Losilla -> P. Hofmann -> K. Stöger -> C. Antwi-Adjei
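
To see what such a pattern expresses, the same idea can be written as a plain sliding-window search over simplified events. The sketch is not how event_pattern_matching works internally: the tuple layout and the function find_shot_ending_chains are assumptions for illustration, and unlike a regular-expression-style search a sliding window may report overlapping chains.

```python
from typing import List, Tuple

# Simplified event: (team, event_type, success)
Event = Tuple[str, str, bool]


def find_shot_ending_chains(events: List[Event], n_passes: int = 4) -> List[List[Event]]:
    """Find runs of `n_passes` successful passes by one team followed by that team's shot."""
    chains = []
    window = n_passes + 1
    for start in range(len(events) - window + 1):
        *passes, last = events[start:start + window]
        team = passes[0][0]
        all_successful_passes = all(e == (team, "pass", True) for e in passes)
        if all_successful_passes and last[0] == team and last[1] == "shot":
            chains.append(events[start:start + window])
    return chains


# Made-up sequence: five home passes, a home shot, then an away pass
events = [("home", "pass", True)] * 5 + [("home", "shot", True), ("away", "pass", True)]
chains = find_shot_ending_chains(events)
print(f"Found {len(chains)} matches")  # Found 1 matches
```

The declarative pattern keeps the same intent readable without explicit index bookkeeping.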

Transforming data

Apart from the data format and event definitions, another aspect that differs between data providers is how they represent coordinates. These differences can include where the origin of the pitch is placed (e.g., top-left, center, bottom-left), which direction the axes increase (left to right, top to bottom, etc.), and the units used (normalized values, metric dimensions, or imperial dimensions). As a result, even if two datasets describe the same event, the x and y positions may not be directly comparable without converting them into a shared reference frame.

Sportec even uses different coordinate systems for their event and tracking data. For event data, the origin is at the top left, while it is at the center of the pitch for tracking data. The direction of the y-axis is different too.
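
The sketch below shows what converting between two such conventions involves, using metric coordinates. It assumes a centre-origin system with the y-axis pointing up and a top-left-origin system with the y-axis pointing down. It also assumes a 105 m by 68 m pitch. The function name is hypothetical, and the sketch makes no claim about the exact axis directions Sportec uses.

```python
from typing import Tuple


def centre_to_top_left(
    x: float, y: float, pitch_length: float = 105.0, pitch_width: float = 68.0
) -> Tuple[float, float]:
    """Convert centre-origin (y up) metres to top-left-origin (y down) metres."""
    # Shift the origin by half the pitch and flip the y-axis
    return (x + pitch_length / 2, pitch_width / 2 - y)


print(centre_to_top_left(0.0, 0.0))     # (52.5, 34.0): the centre spot
print(centre_to_top_left(-52.5, 34.0))  # (0.0, 0.0): the top-left corner
```

Writing such a conversion by hand for every pair of providers quickly becomes error-prone.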

To avoid issues with differences between coordinate systems, kloppy converts all data to a common default coordinate system when loading a dataset: the KloppyCoordinateSystem.

>>> print(event_dataset.metadata.coordinate_system)
<kloppy.domain.models.common.KloppyCoordinateSystem object at 0x7f02a249a9f0>

In this coordinate system, the pitch is scaled to a unit square: the x-axis runs along the length of the pitch from 0 (left goal line) to 1 (right goal line), and the y-axis runs across its width from 0 to 1 between the two touchlines. All spatial data are expressed relative to this 1×1 pitch.
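
One practical consequence of a unit square is that equal coordinate differences along x and y do not correspond to equal distances on a real pitch, so distances should be computed after scaling. The sketch below assumes a 105 m by 68 m pitch. In practice the dimensions would come from the dataset's metadata, and distance_in_metres is a hypothetical helper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance_in_metres(
    p: Point, q: Point, pitch_length: float = 105.0, pitch_width: float = 68.0
) -> float:
    """Distance between two unit-square points after scaling to metres."""
    dx = (q[0] - p[0]) * pitch_length  # x runs along the pitch length
    dy = (q[1] - p[1]) * pitch_width   # y runs across the pitch width
    return math.hypot(dx, dy)


# From the centre spot to the middle of the line at x = 1
print(distance_in_metres((0.5, 0.5), (1.0, 0.5)))  # 52.5
```

A step of 0.5 along x covers 52.5 m here, whereas the same step along y would cover only 34 m.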

You can convert from this normalized system to any supported provider format using the .transform(to_coordinate_system=...) method, allowing interoperability with other tools or datasets.

>>> from kloppy import sportec
>>> from kloppy.domain import Provider
>>> event_dataset = (
...     sportec.load_open_event_data(match_id="J03WN1")
...     .transform(to_coordinate_system=Provider.SPORTEC)
... )
>>> print(event_dataset.metadata.coordinate_system)
<kloppy.domain.models.common.SportecEventDataCoordinateSystem object at 0x7f02a232bf20>

Alternatively (and more efficiently) you can directly load the data in your preferred coordinate system by setting the coordinates parameter. For example, to load the data with Sportec's coordinate system:

>>> from kloppy import sportec
>>> from kloppy.domain import Provider
>>> event_dataset = sportec.load_open_event_data(
...     match_id="J03WN1",
...     coordinates=Provider.SPORTEC
... )
>>> print(event_dataset.metadata.coordinate_system)
<kloppy.domain.models.common.SportecEventDataCoordinateSystem object at 0x7f02a043e780>

Another aspect of how coordinates are represented is the orientation of the data. For this game, the default orientation setting is "away-home". This means that the away team plays from left to right in the first period and the home team plays from left to right in the second period.

>>> print(metadata.orientation)
Orientation.AWAY_HOME

This orientation reflects the actual playing direction, which switches at half-time. It aligns with how the match appears on broadcast footage, making it convenient when synchronizing tracking or event data with video.

However, for some types of analysis, it can be more convenient to normalize the orientation so that one team (usually the team of interest) always attacks in the same direction (e.g., left-to-right). One concrete example is creating a heatmap of a player's actions. Let's look at an example where we visualize the locations of all of Florian Wirtz's passes, first without transforming the orientation.

from mplsoccer import Pitch
from kloppy.domain import EventType

player = away_team.get_player_by_id("DFL-OBJ-002GBK")

player_events = event_dataset.filter(
    lambda event: event.event_type == EventType.PASS and event.player == player
)

def heatmap(xs, ys):
    pitch = Pitch(
        pitch_type=event_dataset.metadata.coordinate_system.to_mplsoccer(),
        line_zorder=2,
    )
    fig, ax = pitch.draw()
    ax.set_title(f"#{player.jersey_no} - {player.last_name} - {player.team.name}")
    pitch.kdeplot(xs, ys, ax=ax, cmap="YlOrRd", fill=True, levels=100)

xs = [event.coordinates.x for event in player_events if event.coordinates is not None]
ys = [event.coordinates.y for event in player_events if event.coordinates is not None]

heatmap(xs, ys)

The heatmap shows activity spread over the entire pitch. This is because teams switch directions at halftime, and the data reflects that change.

We can transform the data so that the direction of all on-the-ball actions is aligned left-to-right. To do so, we'll use the "action-executing-team" orientation.

>>> transformed_events = player_events.transform(to_orientation="ACTION_EXECUTING_TEAM")
>>> print(transformed_events.metadata.orientation)
Orientation.ACTION_EXECUTING_TEAM
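
Conceptually, aligning the orientation amounts to mirroring a coordinate through the centre of the pitch whenever the acting team attacks from right to left. The sketch below does this for unit-square coordinates. The attacks_left_to_right flag is a hypothetical input standing in for what would be derived from the orientation, period, and team, and the function is an illustration rather than kloppy's implementation.

```python
from typing import Tuple


def align_left_to_right(
    x: float, y: float, attacks_left_to_right: bool
) -> Tuple[float, float]:
    """Mirror a unit-square coordinate so that the acting team attacks left to right."""
    if attacks_left_to_right:
        return (x, y)  # already in the desired direction
    # Rotate by 180 degrees around the centre of the pitch
    return (1.0 - x, 1.0 - y)


print(align_left_to_right(0.25, 0.5, attacks_left_to_right=False))  # (0.75, 0.5)
```

Mirroring both axes, rather than x alone, keeps a player's left and right flanks on the correct side after the flip.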

Now, the heatmap makes a lot more sense.

xs = [
    event.coordinates.x for event in transformed_events if event.coordinates is not None
]
ys = [
    event.coordinates.y for event in transformed_events if event.coordinates is not None
]

heatmap(xs, ys)

Exporting data

Until now, we've worked with kloppy's object-oriented data model. This format is well suited to preprocessing the data. However, for the actual analysis, it is often more convenient and efficient to work with dataframes or Sportscode XML.

To a Polars/Pandas dataframe

Kloppy allows you to export a dataset to a dataframe. Both Polars and Pandas are supported through the following engines: polars, pandas, and pandas[pyarrow].

Note

You'll first have to install Pandas or Polars.

Simply calling dataset.to_df() results in a default output, but we can modify how the resulting dataframe looks as shown in the code below.

df_shots_and_key_passes = (
    event_dataset
    # filter for shots
    .filter("shot")
    # put all shots on the same side of the pitch
    .transform(to_orientation="ACTION_EXECUTING_TEAM")
    # convert to dataframe
    .to_df(
        "player_id",
        lambda event: {
            "player_name": str(event.player),
            "is_goal": event.result.is_success,
        },
        "coordinates_*",
        key_pass=lambda event: str(event.prev("pass").player),
        team=lambda event: str(event.team),
        engine="pandas",
    )
)

player_id       player_name        is_goal  coordinates_x  coordinates_y  key_pass  team
DFL-OBJ-J013O2  S. Azmoun          False    0.862571       0.405588       N. Amiri  Bayer 04 Leverkusen
DFL-OBJ-002G02  M. Diaby           False    0.904571       0.640147       N. Amiri  Bayer 04 Leverkusen
DFL-OBJ-J013O2  S. Azmoun          False    0.897905       0.418824       N. Amiri  Bayer 04 Leverkusen
DFL-OBJ-002G8H  Exequiel Palacios  False    0.844476       0.712353       A. Adli   Bayer 04 Leverkusen
DFL-OBJ-0027LO  P. Förster         True     0.940095       0.465441       A. Adli   VfL Bochum 1848

To Sportscode XML

Sportscode XML is a format associated with Hudl Sportscode, a popular platform for video analysis in sports. It integrates video clips with detailed tagging of game events, making it ideal for coaches and analysts who need synchronized video and event data to dissect team and player performances.

To support this popular data format, kloppy provides a CodeDataset. You can use kloppy to load Sportscode XML files but, perhaps more interestingly, you can also generate these files from another dataset. This allows you to automatically create playlists from event and/or tracking data that can be used by a video analyst. We will illustrate this by creating a playlist with all shots.

from datetime import timedelta

from kloppy.domain import Code, CodeDataset, EventType

code_dataset = (
    CodeDataset
    .from_dataset(
        event_dataset.filter("shot"),
        lambda event: Code(
            code_id=None,  # make it auto increment on write
            code=event.event_name,
            period=event.period,
            timestamp=max(timedelta(seconds=0), event.timestamp - timedelta(seconds=7)),  # start 7s before the shot
            end_timestamp=event.timestamp + timedelta(seconds=5),  # end 5s after the shot
            labels={
                'Player': str(event.player),
                'Team': str(event.team)
            },

            # in the future, the next three won't be needed anymore
            ball_owning_team=None,
            ball_state=None,
            statistics=None
        )
    )
)
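
The clip boundaries used above can be isolated into a small helper to make the clamping explicit: the clip starts a fixed lead-in before the event, but never before the start of the period. The function clip_window is hypothetical and only mirrors the timedelta arithmetic from the lambda.

```python
from datetime import timedelta
from typing import Tuple


def clip_window(
    event_time: timedelta,
    lead_in: timedelta = timedelta(seconds=7),
    lead_out: timedelta = timedelta(seconds=5),
) -> Tuple[timedelta, timedelta]:
    """Return (start, end) of a clip around an event, clamped at the period start."""
    start = max(timedelta(seconds=0), event_time - lead_in)
    end = event_time + lead_out
    return start, end


# A shot three seconds into the period: the start is clamped to zero
early_start, early_end = clip_window(timedelta(seconds=3))
print(early_start, early_end)  # 0:00:00 0:00:08

# A shot at 18:04 in the period
start, end = clip_window(timedelta(minutes=18, seconds=4))
print(start, end)  # 0:17:57 0:18:09
```

Without the max() clamp, an early shot would produce a negative start timestamp.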

You can now export the dataset to an XML file.

from kloppy import sportscode

sportscode.save(code_dataset, "playlist.xml")