Exporting to a dataframe
Kloppy datasets can be easily exported to either Pandas or Polars dataframes. This functionality connects kloppy with the broader Python data analytics ecosystem, enabling you to:
- Explore and manipulate tracking or event data using Pandas or Polars.
- Export data to formats like CSV, Excel, or Parquet.
- Perform advanced feature engineering on-the-fly.
- Integrate with machine learning and visualization libraries that require tabular input.
Basic usage
Kloppy represents data using structured Python objects (like `TrackingDataset` and `EventDataset`). The `to_df()` method flattens these objects into a tabular form, with one row per record (event, frame, or code).
The default output columns of the DataFrame depend on the type of dataset:
Event data
For an `EventDataset`, the default columns include:
Column | Description |
---|---|
event_id | Unique identifier of the event |
event_type | Type of the event (e.g., pass, shot) |
period_id | Match period |
timestamp | Start time of the event |
end_timestamp | End time of the event (if available) |
team_id | ID of the team performing the event |
player_id | ID of the player performing the event |
result | Result of the action (e.g., COMPLETE) |
success | Boolean indicating if the event was completed successfully |
coordinates_x/y | Location of the event on the pitch |
ball_state | Current state of the ball (e.g., alive) |
ball_owning_team | Which team owns the ball |
Other columns are added depending on the event types and qualifiers of the events in the dataset.
Example:
event_id | event_type | period_id | timestamp | end_timestamp | ball_state | ball_owning_team | team_id | player_id | coordinates_x | coordinates_y | end_coordinates_x | end_coordinates_y | receiver_player_id | set_piece_type | result | success | body_part_type | card_type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18226900000006 | PASS | 1 | 0 days 00:00:00 | None | alive | None | DFL-CLU-00000S | DFL-OBJ-000172 | 0.500000 | 0.500000 | 0.745238 | 0.501471 | DFL-OBJ-0001I0 | KICK_OFF | COMPLETE | True | None | None |
18226900000007 | PASS | 1 | 0 days 00:00:04.110000 | None | alive | None | DFL-CLU-00000S | DFL-OBJ-0001I0 | 0.745238 | 0.501471 | 0.275905 | 0.914853 | None | None | COMPLETE | True | None | None |
18226900000008 | GENERIC:TacklingGame | 1 | 0 days 00:00:06.365000 | None | alive | None | None | None | 0.275905 | 0.914853 | NaN | NaN | None | None | None | None | None | None |
Tracking data
For a `TrackingDataset`, the output columns include:
Column | Description |
---|---|
frame_id | Frame number |
period_id | Match period |
timestamp | Frame timestamp |
ball_x, ball_y, ball_z, ball_speed | Ball position and speed |
<player_id>_x, <player_id>_y, <player_id>_d, <player_id>_s | Player coordinates, distance (since previous frame), and speed |
ball_state | Current state of the ball |
ball_owning_team_id | ID of the team in possession of the ball |
Example:
period_id | timestamp | frame_id | ball_state | ball_owning_team_id | ball_x | ball_y | ball_z | ball_speed |
---|---|---|---|---|---|---|---|---|
1 | 0 days 00:00:00 | 10000 | alive | DFL-CLU-00000S | 0.505810 | 0.505147 | 0.01 | 6.84 |
1 | 0 days 00:00:00.040000 | 10001 | alive | DFL-CLU-00000S | 0.515333 | 0.507794 | 0.03 | 6.84 |
1 | 0 days 00:00:00.080000 | 10002 | alive | DFL-CLU-00000S | 0.524762 | 0.510441 | 0.04 | 6.80 |
Code data
For a `CodeDataset`, the output columns include:
Column | Description |
---|---|
code_id | Identifier of the code |
period_id | Match period |
timestamp | Start timestamp |
end_timestamp | End timestamp |
code | Name of the code |
Furthermore, one column is added for each label used in the dataset (e.g., Player and Team in the example below).
Example:
code_id | period_id | timestamp | end_timestamp | code | Player | Team |
---|---|---|---|---|---|---|
None | 1 | 0 days 00:00:45.374000 | 0 days 00:00:57.374000 | shot | S. Azmoun | Bayer 04 Leverkusen |
None | 1 | 0 days 00:00:50.383000 | 0 days 00:01:02.383000 | shot | M. Diaby | Bayer 04 Leverkusen |
None | 1 | 0 days 00:01:43.052000 | 0 days 00:01:55.052000 | shot | S. Azmoun | Bayer 04 Leverkusen |
Selecting output columns
You can control which attributes are included in the output by passing column names as positional arguments to `.to_df()`. Wildcard patterns (`*`) are supported to match multiple fields:
event_type | team | coordinates_x | coordinates_y |
---|---|---|---|
PASS | VfL Bochum 1848 | 0.5 | 0.5 |
This lets you include only the data you need, making downstream processing more efficient. Wildcard patterns are matched against the default columns for the dataset type, as listed in the tables above.
Adding metadata as columns
You can inject constant metadata into your DataFrame by passing keyword arguments:
event_id | event_type | period_id | timestamp | end_timestamp | ball_state | ball_owning_team | team_id | player_id | coordinates_x | coordinates_y | end_coordinates_x | end_coordinates_y | receiver_player_id | set_piece_type | result | success | match_id | competition | body_part_type | card_type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18226900000006 | PASS | 1 | 0 days | None | alive | None | DFL-CLU-00000S | DFL-OBJ-000172 | 0.5 | 0.5 | 0.745238 | 0.501471 | DFL-OBJ-0001I0 | KICK_OFF | COMPLETE | True | match_1234 | Premier League | None | None |
These keyword arguments will be added as constant columns across all rows. This is useful for adding identifiers or context to the exported data.
Attribute transformers
Attribute transformers let you derive new attributes on the fly during the conversion process. Kloppy includes a small set of built-in transformers:
- `DistanceToGoalTransformer`: computes the distance to the goal.
- `DistanceToOwnGoalTransformer`: computes the distance to the team's own goal.
- `AngleToGoalTransformer`: computes the angle between the event location and the center of the goal.
Example usage:
event_type | team | coordinates_x | coordinates_y | distance_to_goal |
---|---|---|---|---|
PASS | VfL Bochum 1848 | 0.5 | 0.5 | 0.5 |
You can also define your own custom transformer: a function that takes an `Event` (or `Frame`) and returns a dictionary of derived values, where each key becomes a column:
event_type | team | coordinates_x | coordinates_y | is_shot | x |
---|---|---|---|---|---|
PASS | VfL Bochum 1848 | 0.5 | 0.5 | False | 0.5 |
Alternatively, pass a keyword argument whose value is a callable returning a single value; the keyword is used as the column name:
event_type | team | coordinates_x | coordinates_y | is_pass |
---|---|---|---|---|
PASS | VfL Bochum 1848 | 0.5 | 0.5 | True |
This flexibility allows you to calculate exactly the attributes you need at export time.
Combine it all
Here's an example that shows everything working together:
event_type | team | coordinates_x | coordinates_y | distance_to_goal | match_id | is_pass |
---|---|---|---|---|---|---|
PASS | VfL Bochum 1848 | 0.500000 | 0.500000 | 0.500000 | match_5678 | True |
PASS | VfL Bochum 1848 | 0.745238 | 0.501471 | 0.254766 | match_5678 | True |
GENERIC:TacklingGame | None | 0.275905 | 0.914853 | 0.834516 | match_5678 | False |