Exporting to a dataframe

Kloppy datasets can be exported to either Pandas or Polars dataframes. This connects kloppy with the broader Python data analytics ecosystem, enabling you to:

  • Explore and manipulate tracking or event data using Pandas or Polars.
  • Export data to formats like CSV, Excel, or Parquet.
  • Derive new features on the fly while exporting.
  • Integrate with machine learning and visualization libraries that require tabular input.

Basic usage

Kloppy represents data using structured Python objects (like TrackingDataset and EventDataset). The to_df() method flattens these objects into a tabular form.

df = dataset.to_df()
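
As a rough mental model only (this is not kloppy's implementation), flattening means turning each structured record into one flat row of column/value pairs. The `Point` and `Event` classes below are hypothetical stand-ins, not kloppy types:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for structured records; these are NOT kloppy classes.
@dataclass
class Point:
    x: float
    y: float


@dataclass
class Event:
    event_id: str
    event_type: str
    coordinates: Optional[Point]


def flatten(event: Event) -> dict:
    """Turn one nested record into a flat row (column name -> value)."""
    coords = event.coordinates
    return {
        "event_id": event.event_id,
        "event_type": event.event_type,
        # Nested attributes become separate, prefixed columns.
        "coordinates_x": coords.x if coords else None,
        "coordinates_y": coords.y if coords else None,
    }


events = [
    Event("e1", "PASS", Point(0.5, 0.5)),
    Event("e2", "GENERIC", None),
]
rows = [flatten(e) for e in events]  # one dict per event, ready for a dataframe
```

A list of such dictionaries is the kind of input that dataframe libraries accept directly, which is why a flat row per event or frame is a convenient export shape.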

The default output columns of the DataFrame depend on the type of dataset:

Event data

For an EventDataset, the default columns include:

Column Description
event_id Unique identifier of the event
event_type Type of the event (e.g., pass, shot)
period_id Match period
timestamp Start time of the event
end_timestamp End time of the event (if available)
team_id ID of the team performing the event
player_id ID of the player performing the event
result Result of the action (e.g., COMPLETE)
success Boolean indicating if the event was completed successfully
coordinates_x, coordinates_y Location of the event on the pitch
ball_state Current state of the ball (e.g., alive)
ball_owning_team Which team owns the ball

Other columns are added depending on the event types and qualifiers of the events in the dataset.

Example:

event_id event_type period_id timestamp end_timestamp ball_state ball_owning_team team_id player_id coordinates_x coordinates_y end_coordinates_x end_coordinates_y receiver_player_id set_piece_type result success body_part_type card_type
18226900000006 PASS 1 0 days 00:00:00 None alive None DFL-CLU-00000S DFL-OBJ-000172 0.500000 0.500000 0.745238 0.501471 DFL-OBJ-0001I0 KICK_OFF COMPLETE True None None
18226900000007 PASS 1 0 days 00:00:04.110000 None alive None DFL-CLU-00000S DFL-OBJ-0001I0 0.745238 0.501471 0.275905 0.914853 None None COMPLETE True None None
18226900000008 GENERIC:TacklingGame 1 0 days 00:00:06.365000 None alive None None None 0.275905 0.914853 NaN NaN None None None None None None

Tracking data

For a TrackingDataset, the output columns include:

Column Description
frame_id Frame number
period_id Match period
timestamp Frame timestamp
ball_x, ball_y, ball_z, ball_speed Ball position and speed
<player_id>_x, <player_id>_y, <player_id>_d, <player_id>_s Player coordinates, distance (since previous frame), and speed
ball_state Current state of the ball
ball_owning_team_id ID of the team that owns the ball

Example:

period_id timestamp frame_id ball_state ball_owning_team_id ball_x ball_y ball_z ball_speed
1 0 days 00:00:00 10000 alive DFL-CLU-00000S 0.505810 0.505147 0.01 6.84
1 0 days 00:00:00.040000 10001 alive DFL-CLU-00000S 0.515333 0.507794 0.03 6.84
1 0 days 00:00:00.080000 10002 alive DFL-CLU-00000S 0.524762 0.510441 0.04 6.80
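
The example above shows only the ball columns. The per-player columns follow the `<player_id>_x` naming pattern from the table, which yields one wide row per frame. The sketch below illustrates that layout with made-up data; the `frame` dictionary and the player IDs are hypothetical and do not come from kloppy:

```python
# Hypothetical frame: player positions keyed by player ID (stand-in data, not kloppy objects).
frame = {
    "frame_id": 10000,
    "ball": (0.505810, 0.505147),
    "players": {
        "home_1": (0.10, 0.50),
        "away_7": (0.62, 0.31),
    },
}


def frame_to_row(frame: dict) -> dict:
    """Build one wide row with <player_id>_x / <player_id>_y columns."""
    row = {
        "frame_id": frame["frame_id"],
        "ball_x": frame["ball"][0],
        "ball_y": frame["ball"][1],
    }
    # Each player contributes its own pair of columns, named after the player ID.
    for player_id, (x, y) in frame["players"].items():
        row[f"{player_id}_x"] = x
        row[f"{player_id}_y"] = y
    return row


row = frame_to_row(frame)
```

With this layout the number of columns grows with the number of players, while the number of rows equals the number of frames.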

Code data

For a CodeDataset, the output columns include:

Column Description
code_id Code number
period_id Match period
timestamp Start timestamp
end_timestamp End timestamp
code Name of the code

Furthermore, one column is added for each label used in the dataset.

Example:

code_id period_id timestamp end_timestamp code Player Team
None 1 0 days 00:00:45.374000 0 days 00:00:57.374000 shot S. Azmoun Bayer 04 Leverkusen
None 1 0 days 00:00:50.383000 0 days 00:01:02.383000 shot M. Diaby Bayer 04 Leverkusen
None 1 0 days 00:01:43.052000 0 days 00:01:55.052000 shot S. Azmoun Bayer 04 Leverkusen

Selecting output columns

You can control which attributes are included in the output by passing their names as positional arguments to .to_df(). Wildcard patterns (*) are supported to match multiple fields:

df = event_dataset.to_df("event_type", "team", "coordinates_*")

event_type team coordinates_x coordinates_y
PASS VfL Bochum 1848 0.5 0.5

This lets you export only the columns you need, which keeps the DataFrame small and downstream processing cheaper. Each pattern is matched against the default attributes provided by the internal transformer.
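
To see which columns a pattern would pick up, the matching can be approximated with shell-style wildcards from Python's standard library. This is a sketch under the assumption that kloppy's patterns behave like `fnmatch`; the `select` helper is hypothetical and not part of kloppy:

```python
from fnmatch import fnmatchcase

# A subset of the default event columns listed earlier on this page.
available = [
    "event_id",
    "event_type",
    "coordinates_x",
    "coordinates_y",
    "end_coordinates_x",
]


def select(patterns, columns):
    """Return the columns matching any pattern, keeping the original column order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


selected = select(["event_type", "coordinates_*"], available)
# "end_coordinates_x" is not selected: a pattern has to match the whole column name.
```

This agrees with the output above, where "coordinates_*" yields coordinates_x and coordinates_y but not the end_coordinates_* columns.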

Adding metadata as columns

You can inject constant metadata into your DataFrame by passing keyword arguments:

df = event_dataset.to_df("*", match_id="match_1234", competition="Premier League")

event_id event_type period_id timestamp end_timestamp ball_state ball_owning_team team_id player_id coordinates_x coordinates_y end_coordinates_x end_coordinates_y receiver_player_id set_piece_type result success match_id competition body_part_type card_type
18226900000006 PASS 1 0 days None alive None DFL-CLU-00000S DFL-OBJ-000172 0.5 0.5 0.745238 0.501471 DFL-OBJ-0001I0 KICK_OFF COMPLETE True match_1234 Premier League None None

Here "*" selects all default columns, and each keyword argument is added as a column holding the same value in every row. This is useful for attaching identifiers or context, for example before combining the exports of several matches.

Attribute transformers

Attribute transformers let you derive new attributes on the fly during the conversion. Kloppy includes a small set of built-in transformers:

  • DistanceToGoalTransformer: Compute the distance to the opponent's goal.
  • DistanceToOwnGoalTransformer: Compute the distance to the team's own goal.
  • AngleToGoalTransformer: Compute the angle between the current location and the center of the goal.
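
The distance_to_goal values shown in the example outputs on this page are consistent with a plain Euclidean distance in normalized coordinates, with the goal centre at (1.0, 0.5). The sketch below makes that geometry explicit; it is an illustration under that assumption, not the library's implementation, and `distance_to_goal` is a hypothetical helper:

```python
import math

# Assumption: normalized pitch coordinates, with the attacked goal centred at (1.0, 0.5).
GOAL_X, GOAL_Y = 1.0, 0.5


def distance_to_goal(x: float, y: float) -> float:
    """Euclidean distance from (x, y) to the assumed goal centre."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


# Kick-off spot: half a pitch length away from the goal centre.
d_kickoff = distance_to_goal(0.5, 0.5)

# End location of the first pass in the example data.
d_pass = distance_to_goal(0.745238, 0.501471)
```

Because the coordinates are normalized, the result is a fraction of the pitch length rather than a distance in meters.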

Example usage:

from kloppy.domain.services.transformers.attribute import DistanceToGoalTransformer

df = (
    event_dataset
    .to_df("event_type", "team", "coordinates_*", DistanceToGoalTransformer())
)

event_type team coordinates_x coordinates_y distance_to_goal
PASS VfL Bochum 1848 0.5 0.5 0.5

You can also define your own custom transformer. A transformer is a function that takes an Event (or Frame) and returns a dictionary of derived values; each key in the dictionary becomes a column:

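def my_transformer(event):
    # Each key in the returned dictionary becomes a column in the output.
    return {
        "is_shot": event.event_type.name == "SHOT",
        # Some events have no coordinates; fall back to None for those.
        "x": event.coordinates.x if event.coordinates else None,
    }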

df = event_dataset.to_df("event_type", "team", "coordinates_*", my_transformer)

event_type team coordinates_x coordinates_y is_shot x
PASS VfL Bochum 1848 0.5 0.5 False 0.5
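
Since a transformer is an ordinary function, it can be checked in isolation before running a full export. The sketch below tests a hypothetical `final_third_transformer` against stand-in objects that mimic only the `coordinates` attribute; it assumes normalized x coordinates with play directed towards x = 1:

```python
from types import SimpleNamespace


def final_third_transformer(event):
    """Hypothetical transformer: flag events located in the final third of the pitch."""
    coords = event.coordinates
    # Assumption: normalized x in [0, 1], attacking towards x = 1.
    # Events without coordinates get None instead of raising an AttributeError.
    return {"in_final_third": (coords.x >= 2 / 3) if coords else None}


# Stand-in objects that mimic only the attribute the transformer reads.
with_coords = SimpleNamespace(coordinates=SimpleNamespace(x=0.745238, y=0.501471))
without_coords = SimpleNamespace(coordinates=None)

result_a = final_third_transformer(with_coords)
result_b = final_third_transformer(without_coords)
```

Such a function would then be passed to .to_df() in the same way as my_transformer above.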

Alternatively, you can pass a callable as a keyword argument. The keyword becomes the column name, and the callable is evaluated for each event to produce that column's value:

df = event_dataset.to_df(
    "event_type", "team", "coordinates_*",
    is_pass=lambda event: event.event_type.name == "PASS",
)

event_type team coordinates_x coordinates_y is_pass
PASS VfL Bochum 1848 0.5 0.5 True

This flexibility allows you to calculate exactly the attributes you need at export time.
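
Keyword arguments therefore come in two flavors: plain values, which are repeated in every row, and callables, which are evaluated per record. The sketch below models that distinction with a hypothetical `resolve_extra_columns` helper; it describes the behavior documented on this page and is not kloppy's internal code:

```python
def resolve_extra_columns(event, **named):
    """Evaluate callables per event; pass every other value through as a constant."""
    return {
        name: value(event) if callable(value) else value
        for name, value in named.items()
    }


# Plain dict standing in for an event (hypothetical; kloppy passes Event objects).
event = {"event_type": "PASS"}

row = resolve_extra_columns(
    event,
    match_id="match_1234",                        # constant: same in every row
    is_pass=lambda e: e["event_type"] == "PASS",  # callable: computed per event
)
```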

Combine it all

The following example combines column selection, a built-in transformer, a constant metadata column, and a callable keyword argument:

from kloppy import sportec
from kloppy.domain.services.transformers.attribute import DistanceToGoalTransformer

event_dataset = sportec.load_open_event_data(match_id="J03WN1")

df = event_dataset.to_df(
    "event_type", "team", "coordinates_*",
    DistanceToGoalTransformer(),
    match_id="match_5678",
    is_pass=lambda e: e.event_type.name == "PASS"
)

event_type team coordinates_x coordinates_y distance_to_goal match_id is_pass
PASS VfL Bochum 1848 0.500000 0.500000 0.500000 match_5678 True
PASS VfL Bochum 1848 0.745238 0.501471 0.254766 match_5678 True
GENERIC:TacklingGame None 0.275905 0.914853 0.834516 match_5678 False