{ "cells": [ { "cell_type": "markdown", "id": "3a953d55", "metadata": {}, "source": [ "# Statsbomb\n", "\n", "- [Load local files](#load-local-files)\n", "- [Load remote open data files](#load-remote-open-data-files)\n", "- [Load remote files](#load-remote-files)\n" ] }, { "cell_type": "markdown", "id": "d2d34bd2", "metadata": {}, "source": [ "## Load local files\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "12ec8092", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>event_id</th>\n", " <th>event_type</th>\n", " <th>result</th>\n", " <th>success</th>\n", " <th>period_id</th>\n", " <th>timestamp</th>\n", " <th>end_timestamp</th>\n", " <th>ball_state</th>\n", " <th>ball_owning_team</th>\n", " <th>team_id</th>\n", " <th>player_id</th>\n", " <th>coordinates_x</th>\n", " <th>coordinates_y</th>\n", " <th>end_coordinates_x</th>\n", " <th>end_coordinates_y</th>\n", " <th>receiver_player_id</th>\n", " <th>set_piece_type</th>\n", " <th>body_part_type</th>\n", " <th>pass_type</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>bbc398f7-c784-4958-a504-37b583caf97a</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>0.878</td>\n", " <td>2.788504</td>\n", " <td>alive</td>\n", " <td>909</td>\n", " <td>909</td>\n", " <td>11086</td>\n", " <td>59.95</td>\n", " <td>39.95</td>\n", " <td>32.45</td>\n", " <td>28.75</td>\n", " <td>8963</td>\n", " <td>KICK_OFF</td>\n", " <td>RIGHT_FOOT</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>5c210f79-9714-44a6-b2ec-387f6a117b37</td>\n", " 
<td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>4.288</td>\n", " <td>6.764772</td>\n", " <td>alive</td>\n", " <td>909</td>\n", " <td>909</td>\n", " <td>8963</td>\n", " <td>36.15</td>\n", " <td>30.35</td>\n", " <td>70.65</td>\n", " <td>75.75</td>\n", " <td>8541</td>\n", " <td>NaN</td>\n", " <td>LEFT_FOOT</td>\n", " <td>LONG_BALL</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>8a3e6668-9680-4417-987e-8db0c6ce6a8b</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>12.163</td>\n", " <td>14.093230</td>\n", " <td>alive</td>\n", " <td>914</td>\n", " <td>914</td>\n", " <td>8286</td>\n", " <td>43.05</td>\n", " <td>0.05</td>\n", " <td>15.75</td>\n", " <td>7.45</td>\n", " <td>6954</td>\n", " <td>THROW_IN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>f8e61bb0-b618-4695-9ff9-eaa0584bdbfa</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>16.420</td>\n", " <td>18.242108</td>\n", " <td>alive</td>\n", " <td>914</td>\n", " <td>914</td>\n", " <td>6954</td>\n", " <td>3.25</td>\n", " <td>12.65</td>\n", " <td>7.85</td>\n", " <td>36.15</td>\n", " <td>7036</td>\n", " <td>NaN</td>\n", " <td>RIGHT_FOOT</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1d72ce76-31fd-43e0-a6b2-1f78c8a57a77</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>20.025</td>\n", " <td>21.071484</td>\n", " <td>alive</td>\n", " <td>914</td>\n", " <td>914</td>\n", " <td>7036</td>\n", " <td>9.05</td>\n", " <td>38.85</td>\n", " <td>19.65</td>\n", " <td>47.85</td>\n", " <td>7173</td>\n", " <td>NaN</td>\n", " <td>RIGHT_FOOT</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " event_id event_type result success \\\n", "0 bbc398f7-c784-4958-a504-37b583caf97a PASS COMPLETE True \n", "1 5c210f79-9714-44a6-b2ec-387f6a117b37 PASS 
COMPLETE True \n", "2 8a3e6668-9680-4417-987e-8db0c6ce6a8b PASS COMPLETE True \n", "3 f8e61bb0-b618-4695-9ff9-eaa0584bdbfa PASS COMPLETE True \n", "4 1d72ce76-31fd-43e0-a6b2-1f78c8a57a77 PASS COMPLETE True \n", "\n", " period_id timestamp end_timestamp ball_state ball_owning_team team_id \\\n", "0 1 0.878 2.788504 alive 909 909 \n", "1 1 4.288 6.764772 alive 909 909 \n", "2 1 12.163 14.093230 alive 914 914 \n", "3 1 16.420 18.242108 alive 914 914 \n", "4 1 20.025 21.071484 alive 914 914 \n", "\n", " player_id coordinates_x coordinates_y end_coordinates_x \\\n", "0 11086 59.95 39.95 32.45 \n", "1 8963 36.15 30.35 70.65 \n", "2 8286 43.05 0.05 15.75 \n", "3 6954 3.25 12.65 7.85 \n", "4 7036 9.05 38.85 19.65 \n", "\n", " end_coordinates_y receiver_player_id set_piece_type body_part_type \\\n", "0 28.75 8963 KICK_OFF RIGHT_FOOT \n", "1 75.75 8541 NaN LEFT_FOOT \n", "2 7.45 6954 THROW_IN NaN \n", "3 36.15 7036 NaN RIGHT_FOOT \n", "4 47.85 7173 NaN RIGHT_FOOT \n", "\n", " pass_type \n", "0 NaN \n", "1 LONG_BALL \n", "2 NaN \n", "3 NaN \n", "4 NaN " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from kloppy import statsbomb\n", "\n", "dataset = statsbomb.load(\n", " event_data=\"../../kloppy/tests/files/statsbomb_3788741_event.json\",\n", " lineup_data=\"../../kloppy/tests/files/statsbomb_3788741_lineup.json\",\n", " \n", " # 360 file is optional\n", " three_sixty_data=\"../../kloppy/tests/files/statsbomb_3788741_360.json\",\n", " \n", " # Optional arguments\n", " coordinates=\"statsbomb\",\n", " event_types=[\"pass\", \"shot\"]\n", ")\n", "\n", "dataset.to_df().head()" ] }, { "cell_type": "markdown", "id": "8fa1e495", "metadata": {}, "source": [ "## Load remote open data files\n", "\n", "You can also read files directly from URLs (HTTP or HTTPS) by passing a URL instead of a local path." 
] }, { "cell_type": "code", "execution_count": 2, "id": "f3b9119e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>event_id</th>\n", " <th>event_type</th>\n", " <th>result</th>\n", " <th>success</th>\n", " <th>period_id</th>\n", " <th>timestamp</th>\n", " <th>end_timestamp</th>\n", " <th>ball_state</th>\n", " <th>ball_owning_team</th>\n", " <th>team_id</th>\n", " <th>player_id</th>\n", " <th>coordinates_x</th>\n", " <th>coordinates_y</th>\n", " <th>end_coordinates_x</th>\n", " <th>end_coordinates_y</th>\n", " <th>receiver_player_id</th>\n", " <th>set_piece_type</th>\n", " <th>body_part_type</th>\n", " <th>pass_type</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>34208ade-2af4-45c3-970e-655937cad938</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>0.098</td>\n", " <td>2.007</td>\n", " <td>alive</td>\n", " <td>206</td>\n", " <td>206</td>\n", " <td>6581</td>\n", " <td>60.5</td>\n", " <td>40.5</td>\n", " <td>35.5</td>\n", " <td>25.5</td>\n", " <td>6855</td>\n", " <td>KICK_OFF</td>\n", " <td>LEFT_FOOT</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>d1cccb73-c7ef-4b02-8267-ebd7f149904b</td>\n", " <td>PASS</td>\n", " <td>INCOMPLETE</td>\n", " <td>False</td>\n", " <td>1</td>\n", " <td>3.497</td>\n", " <td>6.785</td>\n", " <td>alive</td>\n", " <td>206</td>\n", " <td>206</td>\n", " <td>6855</td>\n", " <td>35.5</td>\n", " <td>28.5</td>\n", " <td>85.5</td>\n", " <td>72.5</td>\n", " <td>None</td>\n", " <td>NaN</td>\n", " <td>RIGHT_FOOT</td>\n", " <td>LONG_BALL</td>\n", " </tr>\n", 
" <tr>\n", " <th>2</th>\n", " <td>f1cc47d6-4b19-45a6-beb9-33d67fc83f4b</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>6.785</td>\n", " <td>8.431</td>\n", " <td>alive</td>\n", " <td>217</td>\n", " <td>217</td>\n", " <td>5203</td>\n", " <td>34.5</td>\n", " <td>7.5</td>\n", " <td>34.5</td>\n", " <td>20.5</td>\n", " <td>5470</td>\n", " <td>NaN</td>\n", " <td>HEAD</td>\n", " <td>HEAD_PASS</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>f774571f-4b65-43a0-9bfc-6384948d1b82</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>8.431</td>\n", " <td>9.576</td>\n", " <td>alive</td>\n", " <td>217</td>\n", " <td>217</td>\n", " <td>5470</td>\n", " <td>35.5</td>\n", " <td>20.5</td>\n", " <td>35.5</td>\n", " <td>1.5</td>\n", " <td>5477</td>\n", " <td>NaN</td>\n", " <td>HEAD</td>\n", " <td>HEAD_PASS</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>46f0e871-3e72-4817-9a53-af27583ba6c1</td>\n", " <td>PASS</td>\n", " <td>COMPLETE</td>\n", " <td>True</td>\n", " <td>1</td>\n", " <td>10.433</td>\n", " <td>11.150</td>\n", " <td>alive</td>\n", " <td>217</td>\n", " <td>217</td>\n", " <td>5477</td>\n", " <td>33.5</td>\n", " <td>2.5</td>\n", " <td>25.5</td>\n", " <td>1.5</td>\n", " <td>5211</td>\n", " <td>NaN</td>\n", " <td>RIGHT_FOOT</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " event_id event_type result success \\\n", "0 34208ade-2af4-45c3-970e-655937cad938 PASS COMPLETE True \n", "1 d1cccb73-c7ef-4b02-8267-ebd7f149904b PASS INCOMPLETE False \n", "2 f1cc47d6-4b19-45a6-beb9-33d67fc83f4b PASS COMPLETE True \n", "3 f774571f-4b65-43a0-9bfc-6384948d1b82 PASS COMPLETE True \n", "4 46f0e871-3e72-4817-9a53-af27583ba6c1 PASS COMPLETE True \n", "\n", " period_id timestamp end_timestamp ball_state ball_owning_team team_id \\\n", "0 1 0.098 2.007 alive 206 206 \n", "1 1 3.497 6.785 alive 206 206 \n", "2 1 6.785 8.431 alive 217 217 \n", "3 1 8.431 
9.576 alive 217 217 \n", "4 1 10.433 11.150 alive 217 217 \n", "\n", " player_id coordinates_x coordinates_y end_coordinates_x \\\n", "0 6581 60.5 40.5 35.5 \n", "1 6855 35.5 28.5 85.5 \n", "2 5203 34.5 7.5 34.5 \n", "3 5470 35.5 20.5 35.5 \n", "4 5477 33.5 2.5 25.5 \n", "\n", " end_coordinates_y receiver_player_id set_piece_type body_part_type \\\n", "0 25.5 6855 KICK_OFF LEFT_FOOT \n", "1 72.5 None NaN RIGHT_FOOT \n", "2 20.5 5470 NaN HEAD \n", "3 1.5 5477 NaN HEAD \n", "4 1.5 5211 NaN RIGHT_FOOT \n", "\n", " pass_type \n", "0 NaN \n", "1 LONG_BALL \n", "2 HEAD_PASS \n", "3 HEAD_PASS \n", "4 NaN " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from kloppy import statsbomb\n", "\n", "dataset = statsbomb.load(\n", " event_data=\"https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/15946.json\",\n", " lineup_data=\"https://raw.githubusercontent.com/statsbomb/open-data/master/data/lineups/15946.json\",\n", " \n", " # Optional arguments\n", " coordinates=\"statsbomb\",\n", " event_types=[\"pass\", \"shot\"]\n", ")\n", "\n", "dataset.to_df().head()" ] }, { "cell_type": "markdown", "id": "6ee0dc50", "metadata": {}, "source": [ "## Load remote files\n", "Kloppy supports remote files through `fsspec` file systems under the hood. This allows you to work with files in AWS S3, Google Cloud, Azure Blob, HDFS, FTP, and SFTP without extra tools.\n", "For example, you can pass:\n", "- Individual S3 file paths (e.g. `event_data=s3://.../statsbomb_3788741_event.json`)\n", "\n", "Note: Kloppy might throw an error the first time to help you identify missing cloud-specific dependencies like `s3fs`. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "86f9a65b", "metadata": {}, "outputs": [], "source": [ "from kloppy import statsbomb\n", "\n", "dataset = statsbomb.load(\n", " event_data=\"s3://.../statsbomb_3788741_event.json\",\n", " lineup_data=\"s3://.../statsbomb_3788741_lineup.json\",\n", " \n", " # 360 file is optional\n", " three_sixty_data=\"s3://.../statsbomb_3788741_360.json\",\n", " \n", " # Optional arguments\n", " coordinates=\"statsbomb\",\n", " event_types=[\"pass\", \"shot\"]\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }