{
"cells": [
{
"cell_type": "markdown",
"id": "eb7a4c5f",
"metadata": {},
"source": [
"# PFF FC\n",
"PFF FC released their broadcast tracking data from the FIFA Men's World Cup 2022. The datasets can be requested via [this link](https://www.blog.fc.pff.com/blog/pff-fc-release-2022-world-cup-data).\n",
"\n",
"An overview of matches and related match identifiers can be found in the [Game Id Overview Section](#pff-open-data-game-id-overview) below.\n",
"\n",
"## Key Data Points:\n",
"- Tracking Data: The tracking data is stored separately per game: `{game_id}.jsonl.bz2`\n",
"- Event Data: The event data for all games is stored in a single file: `events.json`\n",
"- Metadata: The metadata (home team, away team, date of the game, etc.) information. Each game's metadata is stored seperately as `{game_id}.json`.\n",
"- Rosters: The rosters contain information on the team sheets. Each game's roster is stored seperately as `{game_id}.json`.\n",
"\n",
"## Load local files\n",
"To load the tracking data as a TrackingDataset use the `load_tracking()` function from the `pff` module.\n",
"\n",
"Required parameters are:\n",
"- `meta_data`: Path containing metadata about the tracking data.\n",
"- `players_meta_data`: Path containing roster metadata, such as player details.\n",
"- `raw_data`: Path containing the raw tracking data.\n",
"\n",
"Optional parameters are:\n",
"- `coordinates`: The coordinate system to use for the tracking data (e.g., \"pff\").\n",
"- `sample_rate`: The sampling rate to downsample the data. If None, no downsampling is applied.\n",
"- `limit`: The maximum number of records to process. If None, all records are processed.\n",
"- `only_alive`: Whether to include only sequences when the ball is in play.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71e23535",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>period_id</th>\n",
" <th>timestamp</th>\n",
" <th>frame_id</th>\n",
" <th>ball_state</th>\n",
" <th>ball_owning_team_id</th>\n",
" <th>ball_x</th>\n",
" <th>ball_y</th>\n",
" <th>ball_z</th>\n",
" <th>ball_speed</th>\n",
" <th>10715_x</th>\n",
" <th>...</th>\n",
" <th>8058_d</th>\n",
" <th>8058_s</th>\n",
" <th>5097_x</th>\n",
" <th>5097_y</th>\n",
" <th>5097_d</th>\n",
" <th>5097_s</th>\n",
" <th>3878_x</th>\n",
" <th>3878_y</th>\n",
" <th>3878_d</th>\n",
" <th>3878_s</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0 days 00:00:00.000821</td>\n",
" <td>4630</td>\n",
" <td>alive</td>\n",
" <td>363</td>\n",
" <td>0.42</td>\n",
" <td>1.59</td>\n",
" <td>0.39</td>\n",
" <td>None</td>\n",
" <td>4.987</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0 days 00:00:00.034188</td>\n",
" <td>4631</td>\n",
" <td>alive</td>\n",
" <td>363</td>\n",
" <td>0.83</td>\n",
" <td>1.63</td>\n",
" <td>0.00</td>\n",
" <td>None</td>\n",
" <td>4.955</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0 days 00:00:00.067555</td>\n",
" <td>4632</td>\n",
" <td>alive</td>\n",
" <td>363</td>\n",
" <td>1.23</td>\n",
" <td>1.66</td>\n",
" <td>0.00</td>\n",
" <td>None</td>\n",
" <td>4.923</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0 days 00:00:00.100921</td>\n",
" <td>4633</td>\n",
" <td>alive</td>\n",
" <td>363</td>\n",
" <td>1.64</td>\n",
" <td>1.70</td>\n",
" <td>0.02</td>\n",
" <td>None</td>\n",
" <td>4.892</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0 days 00:00:00.134288</td>\n",
" <td>4634</td>\n",
" <td>alive</td>\n",
" <td>363</td>\n",
" <td>2.04</td>\n",
" <td>1.73</td>\n",
" <td>0.02</td>\n",
" <td>None</td>\n",
" <td>4.861</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 149 columns</p>\n",
"</div>"
],
"text/plain": [
" period_id timestamp frame_id ball_state ball_owning_team_id \\\n",
"0 1 0 days 00:00:00.000821 4630 alive 363 \n",
"1 1 0 days 00:00:00.034188 4631 alive 363 \n",
"2 1 0 days 00:00:00.067555 4632 alive 363 \n",
"3 1 0 days 00:00:00.100921 4633 alive 363 \n",
"4 1 0 days 00:00:00.134288 4634 alive 363 \n",
"\n",
" ball_x ball_y ball_z ball_speed 10715_x ... 8058_d 8058_s 5097_x \\\n",
"0 0.42 1.59 0.39 None 4.987 ... None None NaN \n",
"1 0.83 1.63 0.00 None 4.955 ... None None NaN \n",
"2 1.23 1.66 0.00 None 4.923 ... None None NaN \n",
"3 1.64 1.70 0.02 None 4.892 ... None None NaN \n",
"4 2.04 1.73 0.02 None 4.861 ... None None NaN \n",
"\n",
" 5097_y 5097_d 5097_s 3878_x 3878_y 3878_d 3878_s \n",
"0 NaN None None NaN NaN None None \n",
"1 NaN None None NaN NaN None None \n",
"2 NaN None None NaN NaN None None \n",
"3 NaN None None NaN NaN None None \n",
"4 NaN None None NaN NaN None None \n",
"\n",
"[5 rows x 149 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from kloppy import pff\n",
"\n",
"dataset = pff.load_tracking(\n",
" meta_data=\"../../kloppy/tests/files/pff_metadata_10517.json\",\n",
" roster_meta_data=\"../../kloppy/tests/files/pff_rosters_10517.json\",\n",
" raw_data=\"../../kloppy/tests/files/pff_10517.jsonl.bz2\",\n",
" # Optional Parameters\n",
" coordinates=\"pff\",\n",
" sample_rate=None,\n",
" limit=None,\n",
")\n",
"\n",
"dataset.to_df().head()"
]
},
{
"cell_type": "markdown",
"id": "9683f074",
"metadata": {},
"source": [
"### PFF Open World Cup 2022 Data Game Id Overview\n",
"\n",
"| game_id | home_team | away_team | date |\n",
"|-----------:|:--------------|:--------------|:--------------------|\n",
"| 10517 | Argentina | France | 2022-12-18T15:00:00 |\n",
"| 10516 | Croatia | Morocco | 2022-12-17T15:00:00 |\n",
"| 10515 | France | Morocco | 2022-12-14T19:00:00 |\n",
"| 10514 | Argentina | Croatia | 2022-12-13T19:00:00 |\n",
"| 10513 | England | France | 2022-12-10T19:00:00 |\n",
"| 10512 | Morocco | Portugal | 2022-12-10T15:00:00 |\n",
"| 10511 | Netherlands | Argentina | 2022-12-09T19:00:00 |\n",
"| 10510 | Croatia | Brazil | 2022-12-09T15:00:00 |\n",
"| 10509 | Portugal | Switzerland | 2022-12-06T19:00:00 |\n",
"| 10508 | Morocco | Spain | 2022-12-06T15:00:00 |\n",
"| 10507 | Brazil | South Korea | 2022-12-05T19:00:00 |\n",
"| 10506 | Japan | Croatia | 2022-12-05T15:00:00 |\n",
"| 10505 | England | Senegal | 2022-12-04T19:00:00 |\n",
"| 10504 | France | Poland | 2022-12-04T15:00:00 |\n",
"| 10503 | Argentina | Australia | 2022-12-03T19:00:00 |\n",
"| 10502 | Netherlands | United States | 2022-12-03T15:00:00 |\n",
"| 3859 | Cameroon | Brazil | 2022-12-02T19:00:00 |\n",
"| 3858 | Serbia | Switzerland | 2022-12-02T19:00:00 |\n",
"| 3857 | South Korea | Portugal | 2022-12-02T15:00:00 |\n",
"| 3856 | Ghana | Uruguay | 2022-12-02T15:00:00 |\n",
"| 3855 | Costa Rica | Germany | 2022-12-01T19:00:00 |\n",
"| 3854 | Japan | Spain | 2022-12-01T19:00:00 |\n",
"| 3852 | Croatia | Belgium | 2022-12-01T15:00:00 |\n",
"| 3853 | Canada | Morocco | 2022-12-01T15:00:00 |\n",
"| 3851 | Saudi Arabia | Mexico | 2022-11-30T19:00:00 |\n",
"| 3850 | Poland | Argentina | 2022-11-30T19:00:00 |\n",
"| 3849 | Tunisia | France | 2022-11-30T15:00:00 |\n",
"| 3848 | Australia | Denmark | 2022-11-30T15:00:00 |\n",
"| 3847 | Iran | United States | 2022-11-29T19:00:00 |\n",
"| 3846 | Wales | England | 2022-11-29T19:00:00 |\n",
"| 3845 | Netherlands | Qatar | 2022-11-29T15:00:00 |\n",
"| 3844 | Ecuador | Senegal | 2022-11-29T15:00:00 |\n",
"| 3843 | Portugal | Uruguay | 2022-11-28T19:00:00 |\n",
"| 3842 | Brazil | Switzerland | 2022-11-28T16:00:00 |\n",
"| 3841 | South Korea | Ghana | 2022-11-28T13:00:00 |\n",
"| 3840 | Cameroon | Serbia | 2022-11-28T10:00:00 |\n",
"| 3839 | Spain | Germany | 2022-11-27T19:00:00 |\n",
"| 3838 | Croatia | Canada | 2022-11-27T16:00:00 |\n",
"| 3837 | Belgium | Morocco | 2022-11-27T13:00:00 |\n",
"| 3836 | Japan | Costa Rica | 2022-11-27T10:00:00 |\n",
"| 3835 | Argentina | Mexico | 2022-11-26T19:00:00 |\n",
"| 3834 | France | Denmark | 2022-11-26T16:00:00 |\n",
"| 3833 | Poland | Saudi Arabia | 2022-11-26T13:00:00 |\n",
"| 3832 | Tunisia | Australia | 2022-11-26T10:00:00 |\n",
"| 3831 | England | United States | 2022-11-25T19:00:00 |\n",
"| 3830 | Netherlands | Ecuador | 2022-11-25T16:00:00 |\n",
"| 3829 | Qatar | Senegal | 2022-11-25T13:00:00 |\n",
"| 3828 | Wales | Iran | 2022-11-25T10:00:00 |\n",
"| 3827 | Brazil | Serbia | 2022-11-24T19:00:00 |\n",
"| 3826 | Portugal | Ghana | 2022-11-24T16:00:00 |\n",
"| 3825 | Uruguay | South Korea | 2022-11-24T13:00:00 |\n",
"| 3824 | Switzerland | Cameroon | 2022-11-24T10:00:00 |\n",
"| 3823 | Belgium | Canada | 2022-11-23T19:00:00 |\n",
"| 3822 | Spain | Costa Rica | 2022-11-23T16:00:00 |\n",
"| 3821 | Germany | Japan | 2022-11-23T13:00:00 |\n",
"| 3820 | Morocco | Croatia | 2022-11-23T10:00:00 |\n",
"| 3819 | France | Australia | 2022-11-22T19:00:00 |\n",
"| 3818 | Mexico | Poland | 2022-11-22T16:00:00 |\n",
"| 3817 | Denmark | Tunisia | 2022-11-22T13:00:00 |\n",
"| 3816 | Argentina | Saudi Arabia | 2022-11-22T10:00:00 |\n",
"| 3815 | United States | Wales | 2022-11-21T19:00:00 |\n",
"| 3812 | Senegal | Netherlands | 2022-11-21T16:00:00 |\n",
"| 3813 | England | Iran | 2022-11-21T13:00:00 |\n",
"| 3814 | Qatar | Ecuador | 2022-11-20T16:00:00 |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}