{ "cells": [ { "cell_type": "markdown", "id": "a4a4a60f", "metadata": {}, "source": [ "# SecondSpectrum\n", "\n", "- [Load local files](#load-local-files)\n", "- [Load remote files](#load-remote-files)\n", "\n", "## Load local files" ] }, { "cell_type": "code", "execution_count": 1, "id": "efbb67de", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>period_id</th>\n", " <th>timestamp</th>\n", " <th>frame_id</th>\n", " <th>ball_state</th>\n", " <th>ball_owning_team_id</th>\n", " <th>ball_x</th>\n", " <th>ball_y</th>\n", " <th>ball_z</th>\n", " <th>20grw_x</th>\n", " <th>20grw_y</th>\n", " <th>...</th>\n", " <th>56zeu_d</th>\n", " <th>56zeu_s</th>\n", " <th>27cl51_x</th>\n", " <th>27cl51_y</th>\n", " <th>27cl51_d</th>\n", " <th>27cl51_s</th>\n", " <th>eh90mu_x</th>\n", " <th>eh90mu_y</th>\n", " <th>eh90mu_d</th>\n", " <th>eh90mu_s</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>160.00</td>\n", " <td>4000</td>\n", " <td>alive</td>\n", " <td>456</td>\n", " <td>48.434473</td>\n", " <td>-16.681311</td>\n", " <td>0.0</td>\n", " <td>46.299561</td>\n", " <td>-24.536171</td>\n", " <td>...</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>46.646914</td>\n", " <td>25.246787</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>5.033404</td>\n", " <td>-21.188707</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>681.72</td>\n", " <td>91600</td>\n", " <td>alive</td>\n", " <td>123</td>\n", " <td>23.364446</td>\n", " <td>-16.856017</td>\n", " <td>0.0</td>\n", " 
<td>8.861703</td>\n", " <td>-33.088368</td>\n", " <td>...</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>-48.850250</td>\n", " <td>-16.447842</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>15.112902</td>\n", " <td>12.965995</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>2 rows × 96 columns</p>\n", "</div>" ], "text/plain": [ " period_id timestamp frame_id ball_state ball_owning_team_id ball_x \\\n", "0 1 160.00 4000 alive 456 48.434473 \n", "1 2 681.72 91600 alive 123 23.364446 \n", "\n", " ball_y ball_z 20grw_x 20grw_y ... 56zeu_d 56zeu_s 27cl51_x \\\n", "0 -16.681311 0.0 46.299561 -24.536171 ... None None 46.646914 \n", "1 -16.856017 0.0 8.861703 -33.088368 ... None None -48.850250 \n", "\n", " 27cl51_y 27cl51_d 27cl51_s eh90mu_x eh90mu_y eh90mu_d eh90mu_s \n", "0 25.246787 None None 5.033404 -21.188707 None None \n", "1 -16.447842 None None 15.112902 12.965995 None None \n", "\n", "[2 rows x 96 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from kloppy import secondspectrum\n", "\n", "dataset = secondspectrum.load(\n", " meta_data=\"../../kloppy/tests/files/second_spectrum_fake_metadata.xml\",\n", " raw_data=\"../../kloppy/tests/files/second_spectrum_fake_data.jsonl\",\n", " \n", " # Optional arguments\n", " additional_meta_data=\"../../kloppy/tests/files/second_spectrum_fake_metadata.json\",\n", " sample_rate=1/25,\n", " limit=100,\n", " coordinates=\"secondspectrum\",\n", " only_alive=True\n", ")\n", "\n", "dataset.to_df().head()" ] }, { "cell_type": "markdown", "id": "897b41e9", "metadata": {}, "source": [ "## Load remote files\n", "\n", "Kloppy supports remote files through `fsspec` FileSystem under the hood. 
This allows you to work with files in AWS S3, Google Cloud, Azure Blob, HDFS, FTP, and SFTP without extra tools.\n", "For example, you can pass:\n", "- Individual S3 file paths (e.g. `raw_data=s3://.../second_spectrum_fake_data.jsonl`)\n", "\n", "Note: Kloppy might raise an error the first time to help you identify missing cloud-specific dependencies such as `s3fs`." ] }, { "cell_type": "code", "execution_count": null, "id": "b4b1353d", "metadata": {}, "outputs": [], "source": [ "from kloppy import secondspectrum\n", "\n", "dataset = secondspectrum.load(\n", " meta_data=\"s3://.../second_spectrum_fake_metadata.xml\",\n", " raw_data=\"s3://.../second_spectrum_fake_data.jsonl\",\n", " \n", " # Optional arguments\n", " additional_meta_data=\"s3://.../second_spectrum_fake_metadata.json\",\n", " sample_rate=1/25,\n", " limit=100,\n", " coordinates=\"secondspectrum\",\n", " only_alive=True\n", ")\n", "\n", "dataset.to_df().head()" ] } ], "metadata": { "kernelspec": { "display_name": "kloppy-venv", "language": "python", "name": "kloppy-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }