Pattern matching in event data¶
This guide shows how to perform pattern matching on event data. Instead of matching sequences of characters, as a regular expression does, we match sequences of events.
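To make the analogy with character matching concrete, here is a minimal sketch that uses only Python's standard `re` module and does not use kloppy. The event list and the one-letter encoding are made up for illustration: each event becomes a single character, and an ordinary regular expression then finds "a pass by team A, one or more passes by team B, optionally a pass by team A".

```python
import re

# Hypothetical, simplified event stream: (team, event_type) pairs.
events = [
    ("A", "pass"),
    ("B", "pass"),
    ("B", "pass"),
    ("A", "pass"),
    ("A", "shot"),
]


def encode(team: str, event_type: str) -> str:
    """Encode one event as one character: lowercase for a pass, uppercase for a shot."""
    letter = team.lower()
    return letter.upper() if event_type == "shot" else letter


encoded = "".join(encode(team, kind) for team, kind in events)

# A pass by team A, then one or more passes by team B, then optionally a pass by team A.
pattern = re.compile(r"ab+a?")
match = pattern.search(encoded)

# Map the matched character span back to the original events.
matched_events = events[match.start():match.end()]
print(encoded, match.group(0), matched_events)
```

Event pattern matching follows the same idea, but matches on event attributes (type, team, timestamp) directly, so no lossy character encoding is needed.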
Setup¶
Start by loading some event data with kloppy. For this demonstration, we use StatsBomb open event data.
In [1]:
from kloppy import statsbomb, event_pattern_matching as pm
from datetime import timedelta
from collections import Counter
import polars as pl

dataset = statsbomb.load_open_data(
    match_id=15946,
    # Optional arguments
    coordinates="statsbomb",
)
/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/kloppy/kloppy/_providers/statsbomb.py:83: UserWarning: You are about to use StatsBomb public data. By using this data, you are agreeing to the user agreement. The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf warnings.warn(
Breakdown of the code¶
- Step 1 (# 1): Search for a pass. When we find one, we capture it as "last_pass_of_team_a" for later use.
- Step 2 (# 2): We want to find ball losses, which means the team in possession changes. Here we match 1 or more passes from team B ("not same as team A"). The slice(1, None) means "1 or more".
- Step 3 (# 3): We create a group of events. A group makes it possible to:
  - match all of its children, or none, and
  - capture it.

The pattern within the group matches when team A plays a successful pass within a time window after "last_pass_of_team_a" (the code uses timedelta(seconds=15), although the pattern name says 10 seconds), and that pass is followed either by a successful pass of the same team more than 5 seconds after the recovery, or by a shot. The slice(0, 1) means the subpattern should match zero times or once. When the subpattern is not found, there is no "success" capture.
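The repetition counts described above behave like regular-expression quantifiers. The sketch below is an analogy only, not how kloppy implements them: `slice_to_quantifier` is a hypothetical helper that translates a slice into a regex quantifier, treating the stop value as an inclusive maximum (as the text describes for slice(0, 1)) and assuming that a missing start means zero.

```python
import re


def slice_to_quantifier(repeat: slice) -> str:
    """Translate slice(min, max) into a regex quantifier such as {1,} or {0,1}.

    Hypothetical helper for illustration. Unlike Python sequence slicing,
    the stop value is treated as an inclusive maximum.
    """
    lower = 0 if repeat.start is None else repeat.start
    upper = "" if repeat.stop is None else repeat.stop
    return f"{{{lower},{upper}}}"


one_or_more = slice_to_quantifier(slice(1, None))   # "1 or more"
zero_or_one = slice_to_quantifier(slice(0, 1))      # "zero times or once"

# "a", then one or more "b", then an optional captured "a".
pattern = re.compile("a" + "b" + one_or_more + "(a)" + zero_or_one)

with_recovery = pattern.fullmatch("abba")
without_recovery = pattern.fullmatch("ab")

print(pattern.pattern)
print(with_recovery.group(1))     # the optional group matched
print(without_recovery.group(1))  # the optional group did not match
```

The second match still succeeds, but its optional group is empty. This mirrors the behaviour described above: when the optional subpattern is not found, the overall pattern still matches and the capture is simply absent.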
In [2]:
recover_ball_within_10_seconds = (
    # 1
    pm.match_pass(capture="last_pass_of_team_a")
    +
    # 2
    pm.match_pass(team=pm.not_same_as("last_pass_of_team_a.team")) * slice(1, None)
    +
    # 3
    pm.group(
        pm.match_pass(
            success=True,
            team=pm.same_as("last_pass_of_team_a.team"),
            timestamp=pm.function(
                lambda timestamp, last_pass_of_team_a_timestamp: timestamp
                - last_pass_of_team_a_timestamp
                < timedelta(seconds=15)
            ),
            capture="recover",
        )
        + (
            # resulted in possession after 5 seconds
            pm.group(
                pm.match_pass(
                    success=True,
                    team=pm.same_as("recover.team"),
                    timestamp=pm.function(
                        lambda timestamp, recover_timestamp, **kwargs: timestamp
                        - recover_timestamp
                        < timedelta(seconds=5)
                    ),
                )
                * slice(None, None)
                + pm.match_pass(
                    success=True,
                    team=pm.same_as("recover.team"),
                    timestamp=pm.function(
                        lambda timestamp, recover_timestamp, **kwargs: timestamp
                        - recover_timestamp
                        > timedelta(seconds=5)
                    ),
                )
            )
            | pm.group(
                pm.match_pass(success=True, team=pm.same_as("recover.team"))
                * slice(None, None)
                + pm.match_shot(team=pm.same_as("recover.team"))
            )
        ),
        capture="success",
    )
    * slice(0, 1)
)
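The timestamp conditions in the pattern are plain comparisons between two timestamps and a window. The sketch below isolates that check without kloppy. It assumes timestamps are `timedelta` offsets from the start of the period, which may differ between kloppy versions, and `within_window` and the sample values are hypothetical.

```python
from datetime import timedelta


def within_window(timestamp: timedelta, reference: timedelta, window: timedelta) -> bool:
    """Return True when `timestamp` falls strictly less than `window` after `reference`."""
    return timestamp - reference < window


# Hypothetical timestamps, expressed as offsets from the start of the period.
last_pass = timedelta(minutes=12, seconds=3)
quick_recovery = timedelta(minutes=12, seconds=14)  # 11 seconds later
late_recovery = timedelta(minutes=12, seconds=30)   # 27 seconds later
window = timedelta(seconds=15)

print(within_window(quick_recovery, last_pass, window))
print(within_window(late_recovery, last_pass, window))
# The comparison is strict, so a pass exactly at the window boundary is excluded.
print(within_window(last_pass + window, last_pass, window))
```

Changing `window` to `timedelta(seconds=10)` in a check like this is all it would take to align the threshold with the pattern's name.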
Update the counter¶
Initialize a counter that tracks, for each team, how many times the pattern matched (total) and how many of those matches include the "success" capture (successful recoveries).
In [3]:
counter = Counter()

matches = pm.search(dataset, pattern=recover_ball_within_10_seconds)
for match in matches:
    team = match.captures["last_pass_of_team_a"].team
    success = "success" in match.captures

    counter.update(
        {
            f"{team.ground}_total": 1,
            f"{team.ground}_success": 1 if success else 0,
        }
    )

counter
Out[3]:
Counter({'home_total': 8, 'away_total': 8, 'home_success': 0, 'away_success': 0})
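The counting step can be tried without loading any match data. The sketch below uses made-up outcomes (the ground labels and results are hypothetical, not taken from the match above) to show how `Counter.update` with a dict adds to existing counts, and why keys such as 'home_success' appear in the output even when their count is 0.

```python
from collections import Counter

# Hypothetical outcomes: (ground of the team that lost the ball, recovered successfully?)
outcomes = [("home", False), ("away", True), ("home", True)]

counter = Counter()
for ground, success in outcomes:
    # update() with a mapping adds the given counts to the existing ones.
    counter.update(
        {
            f"{ground}_total": 1,
            f"{ground}_success": 1 if success else 0,
        }
    )

# Adding a count of 0 still creates the key, so it shows up in the result.
zero_only = Counter()
zero_only.update({"home_success": 0})

# Success rate per team, derived from the two counts.
rates = {
    ground: counter[f"{ground}_success"] / counter[f"{ground}_total"]
    for ground in ("home", "away")
}
print(counter, zero_only, rates)
```

Keeping the total and success counts under separate keys makes it straightforward to derive a rate afterwards, as the last step shows. With real data, guard against a total of zero before dividing.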