{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "eb7a4c5f",
   "metadata": {},
   "source": [
    "# PFF FC\n",
    "PFF FC released their broadcast tracking data from the FIFA Men's World Cup 2022. The datasets can be requested via [this link](https://www.blog.fc.pff.com/blog/pff-fc-release-2022-world-cup-data).\n",
    "\n",
    "An overview of matches and related match identifiers can be found in the [Game Id Overview Section](#pff-open-data-game-id-overview) below.\n",
    "\n",
    "## Key Data Points:\n",
    "- Tracking Data: The tracking data is stored separately per game: `{game_id}.jsonl.bz2`\n",
    "- Event Data: The event data for all games is stored in a single file: `events.json`\n",
    "- Metadata: The metadata (home team, away team, date of the game, etc.) information. Each game's metadata is stored seperately as `{game_id}.json`.\n",
    "- Rosters:  The rosters contain information on the team sheets. Each game's roster is stored seperately as `{game_id}.json`.\n",
    "\n",
    "## Load local files\n",
    "To load the tracking data as a TrackingDataset use the `load_tracking()` function from the `pff` module.\n",
    "\n",
    "Required parameters are:\n",
    "- `meta_data`: Path containing metadata about the tracking data.\n",
    "- `players_meta_data`: Path containing roster metadata, such as player details.\n",
    "- `raw_data`: Path containing the raw tracking data.\n",
    "\n",
    "Optional parameters are:\n",
    "- `coordinates`: The coordinate system to use for the tracking data (e.g., \"pff\").\n",
    "- `sample_rate`: The sampling rate to downsample the data. If None, no downsampling is applied.\n",
    "- `limit`: The maximum number of records to process. If None, all records are processed.\n",
    "- `only_alive`: Whether to include only sequences when the ball is in play.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71e23535",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>period_id</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>frame_id</th>\n",
       "      <th>ball_state</th>\n",
       "      <th>ball_owning_team_id</th>\n",
       "      <th>ball_x</th>\n",
       "      <th>ball_y</th>\n",
       "      <th>ball_z</th>\n",
       "      <th>ball_speed</th>\n",
       "      <th>10715_x</th>\n",
       "      <th>...</th>\n",
       "      <th>8058_d</th>\n",
       "      <th>8058_s</th>\n",
       "      <th>5097_x</th>\n",
       "      <th>5097_y</th>\n",
       "      <th>5097_d</th>\n",
       "      <th>5097_s</th>\n",
       "      <th>3878_x</th>\n",
       "      <th>3878_y</th>\n",
       "      <th>3878_d</th>\n",
       "      <th>3878_s</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0 days 00:00:00.000821</td>\n",
       "      <td>4630</td>\n",
       "      <td>alive</td>\n",
       "      <td>363</td>\n",
       "      <td>0.42</td>\n",
       "      <td>1.59</td>\n",
       "      <td>0.39</td>\n",
       "      <td>None</td>\n",
       "      <td>4.987</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0 days 00:00:00.034188</td>\n",
       "      <td>4631</td>\n",
       "      <td>alive</td>\n",
       "      <td>363</td>\n",
       "      <td>0.83</td>\n",
       "      <td>1.63</td>\n",
       "      <td>0.00</td>\n",
       "      <td>None</td>\n",
       "      <td>4.955</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0 days 00:00:00.067555</td>\n",
       "      <td>4632</td>\n",
       "      <td>alive</td>\n",
       "      <td>363</td>\n",
       "      <td>1.23</td>\n",
       "      <td>1.66</td>\n",
       "      <td>0.00</td>\n",
       "      <td>None</td>\n",
       "      <td>4.923</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0 days 00:00:00.100921</td>\n",
       "      <td>4633</td>\n",
       "      <td>alive</td>\n",
       "      <td>363</td>\n",
       "      <td>1.64</td>\n",
       "      <td>1.70</td>\n",
       "      <td>0.02</td>\n",
       "      <td>None</td>\n",
       "      <td>4.892</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0 days 00:00:00.134288</td>\n",
       "      <td>4634</td>\n",
       "      <td>alive</td>\n",
       "      <td>363</td>\n",
       "      <td>2.04</td>\n",
       "      <td>1.73</td>\n",
       "      <td>0.02</td>\n",
       "      <td>None</td>\n",
       "      <td>4.861</td>\n",
       "      <td>...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 149 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   period_id              timestamp  frame_id ball_state ball_owning_team_id  \\\n",
       "0          1 0 days 00:00:00.000821      4630      alive                 363   \n",
       "1          1 0 days 00:00:00.034188      4631      alive                 363   \n",
       "2          1 0 days 00:00:00.067555      4632      alive                 363   \n",
       "3          1 0 days 00:00:00.100921      4633      alive                 363   \n",
       "4          1 0 days 00:00:00.134288      4634      alive                 363   \n",
       "\n",
       "   ball_x  ball_y  ball_z ball_speed  10715_x  ...  8058_d 8058_s 5097_x  \\\n",
       "0    0.42    1.59    0.39       None    4.987  ...    None   None    NaN   \n",
       "1    0.83    1.63    0.00       None    4.955  ...    None   None    NaN   \n",
       "2    1.23    1.66    0.00       None    4.923  ...    None   None    NaN   \n",
       "3    1.64    1.70    0.02       None    4.892  ...    None   None    NaN   \n",
       "4    2.04    1.73    0.02       None    4.861  ...    None   None    NaN   \n",
       "\n",
       "   5097_y  5097_d 5097_s 3878_x  3878_y  3878_d 3878_s  \n",
       "0     NaN    None   None    NaN     NaN    None   None  \n",
       "1     NaN    None   None    NaN     NaN    None   None  \n",
       "2     NaN    None   None    NaN     NaN    None   None  \n",
       "3     NaN    None   None    NaN     NaN    None   None  \n",
       "4     NaN    None   None    NaN     NaN    None   None  \n",
       "\n",
       "[5 rows x 149 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from kloppy import pff\n",
    "\n",
    "dataset = pff.load_tracking(\n",
    "    meta_data=\"../../kloppy/tests/files/pff_metadata_10517.json\",\n",
    "    roster_meta_data=\"../../kloppy/tests/files/pff_rosters_10517.json\",\n",
    "    raw_data=\"../../kloppy/tests/files/pff_10517.jsonl.bz2\",\n",
    "    # Optional Parameters\n",
    "    coordinates=\"pff\",\n",
    "    sample_rate=None,\n",
    "    limit=None,\n",
    ")\n",
    "\n",
    "dataset.to_df().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9683f074",
   "metadata": {},
   "source": [
    "### PFF Open World Cup 2022 Data Game Id Overview\n",
    "\n",
    "|   game_id | home_team     | away_team     | date                |\n",
    "|-----------:|:--------------|:--------------|:--------------------|\n",
    "|      10517 | Argentina     | France        | 2022-12-18T15:00:00 |\n",
    "|      10516 | Croatia       | Morocco       | 2022-12-17T15:00:00 |\n",
    "|      10515 | France        | Morocco       | 2022-12-14T19:00:00 |\n",
    "|      10514 | Argentina     | Croatia       | 2022-12-13T19:00:00 |\n",
    "|      10513 | England       | France        | 2022-12-10T19:00:00 |\n",
    "|      10512 | Morocco       | Portugal      | 2022-12-10T15:00:00 |\n",
    "|      10511 | Netherlands   | Argentina     | 2022-12-09T19:00:00 |\n",
    "|      10510 | Croatia       | Brazil        | 2022-12-09T15:00:00 |\n",
    "|      10509 | Portugal      | Switzerland   | 2022-12-06T19:00:00 |\n",
    "|      10508 | Morocco       | Spain         | 2022-12-06T15:00:00 |\n",
    "|      10507 | Brazil        | South Korea   | 2022-12-05T19:00:00 |\n",
    "|      10506 | Japan         | Croatia       | 2022-12-05T15:00:00 |\n",
    "|      10505 | England       | Senegal       | 2022-12-04T19:00:00 |\n",
    "|      10504 | France        | Poland        | 2022-12-04T15:00:00 |\n",
    "|      10503 | Argentina     | Australia     | 2022-12-03T19:00:00 |\n",
    "|      10502 | Netherlands   | United States | 2022-12-03T15:00:00 |\n",
    "|       3859 | Cameroon      | Brazil        | 2022-12-02T19:00:00 |\n",
    "|       3858 | Serbia        | Switzerland   | 2022-12-02T19:00:00 |\n",
    "|       3857 | South Korea   | Portugal      | 2022-12-02T15:00:00 |\n",
    "|       3856 | Ghana         | Uruguay       | 2022-12-02T15:00:00 |\n",
    "|       3855 | Costa Rica    | Germany       | 2022-12-01T19:00:00 |\n",
    "|       3854 | Japan         | Spain         | 2022-12-01T19:00:00 |\n",
    "|       3852 | Croatia       | Belgium       | 2022-12-01T15:00:00 |\n",
    "|       3853 | Canada        | Morocco       | 2022-12-01T15:00:00 |\n",
    "|       3851 | Saudi Arabia  | Mexico        | 2022-11-30T19:00:00 |\n",
    "|       3850 | Poland        | Argentina     | 2022-11-30T19:00:00 |\n",
    "|       3849 | Tunisia       | France        | 2022-11-30T15:00:00 |\n",
    "|       3848 | Australia     | Denmark       | 2022-11-30T15:00:00 |\n",
    "|       3847 | Iran          | United States | 2022-11-29T19:00:00 |\n",
    "|       3846 | Wales         | England       | 2022-11-29T19:00:00 |\n",
    "|       3845 | Netherlands   | Qatar         | 2022-11-29T15:00:00 |\n",
    "|       3844 | Ecuador       | Senegal       | 2022-11-29T15:00:00 |\n",
    "|       3843 | Portugal      | Uruguay       | 2022-11-28T19:00:00 |\n",
    "|       3842 | Brazil        | Switzerland   | 2022-11-28T16:00:00 |\n",
    "|       3841 | South Korea   | Ghana         | 2022-11-28T13:00:00 |\n",
    "|       3840 | Cameroon      | Serbia        | 2022-11-28T10:00:00 |\n",
    "|       3839 | Spain         | Germany       | 2022-11-27T19:00:00 |\n",
    "|       3838 | Croatia       | Canada        | 2022-11-27T16:00:00 |\n",
    "|       3837 | Belgium       | Morocco       | 2022-11-27T13:00:00 |\n",
    "|       3836 | Japan         | Costa Rica    | 2022-11-27T10:00:00 |\n",
    "|       3835 | Argentina     | Mexico        | 2022-11-26T19:00:00 |\n",
    "|       3834 | France        | Denmark       | 2022-11-26T16:00:00 |\n",
    "|       3833 | Poland        | Saudi Arabia  | 2022-11-26T13:00:00 |\n",
    "|       3832 | Tunisia       | Australia     | 2022-11-26T10:00:00 |\n",
    "|       3831 | England       | United States | 2022-11-25T19:00:00 |\n",
    "|       3830 | Netherlands   | Ecuador       | 2022-11-25T16:00:00 |\n",
    "|       3829 | Qatar         | Senegal       | 2022-11-25T13:00:00 |\n",
    "|       3828 | Wales         | Iran          | 2022-11-25T10:00:00 |\n",
    "|       3827 | Brazil        | Serbia        | 2022-11-24T19:00:00 |\n",
    "|       3826 | Portugal      | Ghana         | 2022-11-24T16:00:00 |\n",
    "|       3825 | Uruguay       | South Korea   | 2022-11-24T13:00:00 |\n",
    "|       3824 | Switzerland   | Cameroon      | 2022-11-24T10:00:00 |\n",
    "|       3823 | Belgium       | Canada        | 2022-11-23T19:00:00 |\n",
    "|       3822 | Spain         | Costa Rica    | 2022-11-23T16:00:00 |\n",
    "|       3821 | Germany       | Japan         | 2022-11-23T13:00:00 |\n",
    "|       3820 | Morocco       | Croatia       | 2022-11-23T10:00:00 |\n",
    "|       3819 | France        | Australia     | 2022-11-22T19:00:00 |\n",
    "|       3818 | Mexico        | Poland        | 2022-11-22T16:00:00 |\n",
    "|       3817 | Denmark       | Tunisia       | 2022-11-22T13:00:00 |\n",
    "|       3816 | Argentina     | Saudi Arabia  | 2022-11-22T10:00:00 |\n",
    "|       3815 | United States | Wales         | 2022-11-21T19:00:00 |\n",
    "|       3812 | Senegal       | Netherlands   | 2022-11-21T16:00:00 |\n",
    "|       3813 | England       | Iran          | 2022-11-21T13:00:00 |\n",
    "|       3814 | Qatar         | Ecuador       | 2022-11-20T16:00:00 |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}