{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pattern matching in event data\n", "This guide allows you to perofrm pattern matching, but instead of matching characters, we match sequences of event plays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "Start by loading some event data using the Kloppy module. For the sake of this demonstration, we will use Statsbomb Open Event Data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/kloppy/kloppy/_providers/statsbomb.py:83: UserWarning: \n", "\n", "You are about to use StatsBomb public data.\n", "By using this data, you are agreeing to the user agreement. \n", "The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf\n", "\n", " warnings.warn(\n" ] } ], "source": [ "from kloppy import statsbomb, event_pattern_matching as pm\n", "from datetime import timedelta\n", "from collections import Counter\n", "\n", "import polars as pl\n", "\n", "dataset = statsbomb.load_open_data(\n", " match_id=15946,\n", " # Optional arguments\n", " coordinates=\"statsbomb\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Breakdown of the code\n", "1. Search for a pass. When we found one, lets capture it for later usage.\n", "2. We want to find ball losses. This means the team changes. In this case we want to match 1 or more passes from team B (\"not same as team A\"). The `slice(1, None)` means \"1 or more\"\n", "3. We create a group of events. The groups makes it possible to: \n", " - match all of its children, or none and \n", " - capture it.\n", "\n", "The pattern within the group matches when there is a successful pass of team A within 10 seconds after \"last_pass_of_team_a\" and it's followed by a successful pass OR a shot. The `slice(0, 1)` means the subpattern should match zero or once times. When the subpattern is not found there is no capture." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "recover_ball_within_10_seconds = (\n", " # 1\n", " pm.match_pass(capture=\"last_pass_of_team_a\")\n", " +\n", " # 2\n", " pm.match_pass(team=pm.not_same_as(\"last_pass_of_team_a.team\")) * slice(1, None)\n", " +\n", " # 3\n", " pm.group(\n", " pm.match_pass(\n", " success=True,\n", " team=pm.same_as(\"last_pass_of_team_a.team\"),\n", " timestamp=pm.function(\n", " lambda timestamp, last_pass_of_team_a_timestamp: timestamp\n", " - last_pass_of_team_a_timestamp\n", " < timedelta(seconds=15)\n", " ),\n", " capture=\"recover\",\n", " )\n", " + (\n", " # resulted in possession after 5 seconds\n", " pm.group(\n", " pm.match_pass(\n", " success=True,\n", " team=pm.same_as(\"recover.team\"),\n", " timestamp=pm.function(\n", " lambda timestamp, recover_timestamp, **kwargs: timestamp\n", " - recover_timestamp\n", " < timedelta(seconds=5)\n", " ),\n", " )\n", " * slice(None, None)\n", " + pm.match_pass(\n", " success=True,\n", " team=pm.same_as(\"recover.team\"),\n", " timestamp=pm.function(\n", " lambda timestamp, recover_timestamp, **kwargs: timestamp\n", " - recover_timestamp\n", " > timedelta(seconds=5)\n", " ),\n", " )\n", " )\n", " | pm.group(\n", " pm.match_pass(success=True, team=pm.same_as(\"recover.team\"))\n", " * slice(None, None)\n", " + pm.match_shot(team=pm.same_as(\"recover.team\"))\n", " )\n", " ),\n", " capture=\"success\",\n", " )\n", " * slice(0, 1)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Update the counter\n", "Initialzie a counter to keep track of the total number of recoveries and the number of successful recoveries for each team." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'home_total': 8,\n", " 'away_total': 8,\n", " 'home_success': 0,\n", " 'away_success': 0})" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counter = Counter()\n", "\n", "matches = pm.search(dataset, pattern=recover_ball_within_10_seconds)\n", "for match in matches:\n", " team = match.captures[\"last_pass_of_team_a\"].team\n", " success = \"success\" in match.captures\n", "\n", " counter.update(\n", " {\n", " f\"{team.ground}_total\": 1,\n", " f\"{team.ground}_success\": 1 if success else 0,\n", " }\n", " )\n", "\n", "counter" ] } ], "metadata": { "kernelspec": { "display_name": "/home/pieterr/Jupiter/Projects/kloppy", "language": "python", "name": "kloppy" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 4 }