# Code data

Besides event and tracking data, analysts often work with code data. This type of data can be collected by hand with tools like SportsCode.

Kloppy supports both reading and writing codes in the SportsCode XML format.

## Reading XML file

In [1]:
from kloppy import sportscode

In [2]:
with open("file.xml", "w") as fp:
    fp.write("""<?xml version="1.0"?>
<file>
    <ALL_INSTANCES>
        <instance>
            <ID>P1</ID>
            <start>3.6</start>
            <end>9.7</end>
            <code>PASS</code>
            <label>
                <group>Team</group>
                <text>Henkie</text>
            </label>
            <label>
                <group>Packing.Value</group>
                <text>1</text>
            </label>
            <label>
                <group>Receiver</group>
                <text>Klaas N&#xF8;me</text>
            </label>
        </instance>
        <instance>
            <ID>P2</ID>
            <start>68.3</start>
            <end>74.5</end>
            <code>PASS</code>
            <label>
                <group>Team</group>
                <text>Henkie</text>
            </label>
            <label>
                <group>Packing.Value</group>
                <text>3</text>
            </label>
            <label>
                <group>Receiver</group>
                <text>Piet</text>
            </label>
        </instance>
        <instance>
            <ID>P3</ID>
            <start>103.6</start>
            <end>109.6</end>
            <code>SHOT</code>
            <label>
                <group>Team</group>
                <text>Henkie</text>
            </label>
            <label>
                <group>Expected.Goal.Value</group>
                <text>0.13</text>
            </label>
        </instance>
    </ALL_INSTANCES>
</file>""")
    
code_dataset = sportscode.load("file.xml")


In [3]:
code_dataset.to_df()

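| | code_id | period_id | timestamp | end_timestamp | code | Team | Packing.Value | Receiver | Expected.Goal.Value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | P1 | 1 | 3.6 | 9.7 | PASS | Henkie | 1.0 | Klaas Nøme | |
| 1 | P2 | 1 | 68.3 | 74.5 | PASS | Henkie | 3.0 | Piet | |
| 2 | P3 | 1 | 103.6 | 109.6 | SHOT | Henkie | | | 0.13 |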
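
To make the mapping from the XML to the table above explicit, here is a minimal sketch that flattens a SportsCode-style document using only Python's standard library. It is not how kloppy is implemented; the helper name `instances_to_rows` and the shortened `SAMPLE` document are hypothetical and only illustrate that each `<label>` group ends up as a column.

```python
import xml.etree.ElementTree as ET

# Shortened version of the XML written above (assumption: same element layout).
SAMPLE = """<file>
    <ALL_INSTANCES>
        <instance>
            <ID>P1</ID>
            <start>3.6</start>
            <end>9.7</end>
            <code>PASS</code>
            <label>
                <group>Team</group>
                <text>Henkie</text>
            </label>
        </instance>
        <instance>
            <ID>P3</ID>
            <start>103.6</start>
            <end>109.6</end>
            <code>SHOT</code>
            <label>
                <group>Expected.Goal.Value</group>
                <text>0.13</text>
            </label>
        </instance>
    </ALL_INSTANCES>
</file>"""


def instances_to_rows(xml_text):
    """Flatten each <instance> into a dict (hypothetical helper, not kloppy API)."""
    root = ET.fromstring(xml_text)
    rows = []
    for instance in root.iter("instance"):
        row = {
            "code_id": instance.findtext("ID"),
            "timestamp": float(instance.findtext("start")),
            "end_timestamp": float(instance.findtext("end")),
            "code": instance.findtext("code"),
        }
        # Every <label> contributes one column named after its <group>.
        for label in instance.findall("label"):
            row[label.findtext("group")] = label.findtext("text")
        rows.append(row)
    return rows


rows = instances_to_rows(SAMPLE)
```

A label that is missing from an instance is simply absent from its row, which corresponds to the empty cells in the DataFrame.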


The code dataset can also be filtered, for example to keep only the passes:

In [4]:
passes = code_dataset.filter(lambda code: code.code == 'PASS')
passes.to_df()

| | code_id | period_id | timestamp | end_timestamp | code | Team | Packing.Value | Receiver |
|---|---|---|---|---|---|---|---|---|
| 0 | P1 | 1 | 3.6 | 9.7 | PASS | Henkie | 1 | Klaas Nøme |
| 1 | P2 | 1 | 68.3 | 74.5 | PASS | Henkie | 3 | Piet |


## Writing XML file

In [5]:
sportscode.save(passes, "file.xml")

with open("file.xml", "r") as fp:
    print(fp.read())

<?xml version='1.0' encoding='utf-8'?>
<file>
  <ALL_INSTANCES>
    <instance>
      <ID>P1</ID>
      <start>3.6</start>
      <end>9.7</end>
      <code>PASS</code>
      <label>
        <group>Team</group>
        <text>Henkie</text>
      </label>
      <label>
        <group>Packing.Value</group>
        <text>1</text>
      </label>
      <label>
        <group>Receiver</group>
        <text>Klaas Nøme</text>
      </label>
    </instance>
    <instance>
      <ID>P2</ID>
      <start>68.3</start>
      <end>74.5</end>
      <code>PASS</code>
      <label>
        <group>Team</group>
        <text>Henkie</text>
      </label>
      <label>
        <group>Packing.Value</group>
        <text>3</text>
      </label>
      <label>
        <group>Receiver</group>
        <text>Piet</text>
      </label>
    </instance>
  </ALL_INSTANCES>
</file>



## Converting an event dataset into a code dataset

In [6]:
from kloppy import statsbomb

dataset = statsbomb.load_open_data()


You are about to use StatsBomb public data.
By using this data, you are agreeing to the user agreement. 
The user agreement can be found here: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf



In [7]:
from kloppy.domain import Code, CodeDataset, EventType

dataset_shots = dataset.filter(
    lambda event: event.event_type == EventType.SHOT
)

code_dataset = (
    CodeDataset
    .from_dataset(
        dataset_shots,
        lambda event: Code(
            code_id=None,  # make it auto increment on write
            code=event.event_name,
            period=event.period,
            timestamp=max(0, event.timestamp - 7),
            end_timestamp=event.timestamp + 5,
            labels={
                'Player': str(event.player),
                'Team': str(event.team)
            },
            
            # In the future, the next two arguments won't be needed anymore
            ball_owning_team=None,
            ball_state=None
        )
    )
)
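
The `timestamp` and `end_timestamp` arguments above define a clip window around each shot: it starts 7 seconds before the event, clamped so it never goes below 0, and ends 5 seconds after it. The sketch below isolates that arithmetic. The function name `clip_window` and its parameters are hypothetical, and it assumes timestamps are plain floats in seconds, as in the output shown in this notebook; if your kloppy version represents timestamps differently, the arithmetic needs adapting.

```python
def clip_window(event_seconds, lead=7.0, lag=5.0):
    """Return (start, end) of a clip around an event (hypothetical helper).

    The start is clamped at 0 so events early in a period
    do not produce a negative start time.
    """
    start = max(0.0, event_seconds - lead)
    end = event_seconds + lag
    return start, end


# An event 3 seconds into the period: the start is clamped to 0.
early = clip_window(3.0)  # (0.0, 8.0)

# An event later in the period gets the full 12-second window.
typical = clip_window(149.094)
```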

In [8]:
code_dataset.to_df()

| | code_id | period_id | timestamp | end_timestamp | code | Player | Team |
|---|---|---|---|---|---|---|---|
| 0 | | 1 | 142.094 | 154.094 | shot | Lionel Andrés Messi Cuccittini | Barcelona |
| 1 | | 1 | 332.239 | 344.239 | shot | Jordi Alba Ramos | Barcelona |
| 2 | | 1 | 921.625 | 933.625 | shot | Lionel Andrés Messi Cuccittini | Barcelona |
| 3 | | 1 | 972.616 | 984.616 | shot | Rubén Sobrino Pozuelo | Deportivo Alavés |
| 4 | | 1 | 1088.914 | 1100.914 | shot | Luis Alberto Suárez Díaz | Barcelona |
| 5 | | 1 | 1835.287 | 1847.287 | shot | Ousmane Dembélé | Barcelona |
| 6 | | 1 | 2097.861 | 2109.861 | shot | Ivan Rakitić | Barcelona |
| 7 | | 1 | 2241.168 | 2253.168 | shot | Lionel Andrés Messi Cuccittini | Barcelona |
| 8 | | 1 | 2243.989 | 2255.989 | shot | Gerard Piqué Bernabéu | Barcelona |
| 9 | | 1 | 2301.083 | 2313.083 | shot | Ousmane Dembélé | Barcelona |


In [9]:
sportscode.save(code_dataset, "file.xml")

with open("file.xml", "r") as fp:
    print(fp.read())

<?xml version='1.0' encoding='utf-8'?>
<file>
  <ALL_INSTANCES>
    <instance>
      <ID>1</ID>
      <start>142.094</start>
      <end>154.094</end>
      <code>shot</code>
      <label>
        <group>Player</group>
        <text>Lionel Andrés Messi Cuccittini</text>
      </label>
      <label>
        <group>Team</group>
        <text>Barcelona</text>
      </label>
    </instance>
    <instance>
      <ID>2</ID>
      <start>332.239</start>
      <end>344.239</end>
      <code>shot</code>
      <label>
        <group>Player</group>
        <text>Jordi Alba Ramos</text>
      </label>
      <label>
        <group>Team</group>
        <text>Barcelona</text>
      </label>
    </instance>
    <instance>
      <ID>3</ID>
      <start>921.625</start>
      <end>933.625</end>
      <code>shot</code>
      <label>
        <group>Player</group>
        <text>Lionel Andrés Messi Cuccittini</text>
      </label>
      <label>
        <group>Team</group>
        <text>Barcelona</text>
      

In [10]:
import os
os.unlink("file.xml")

In [14]:
# Chain filter and map operators
new_dataset = (
    code_dataset
    .filter(lambda record: record.labels['Team'] == 'Barcelona')
    .map(lambda record: record.replace(code='Schot Barcelona'))
)
new_dataset.to_df()

| | code_id | period_id | timestamp | end_timestamp | code | Player | Team |
|---|---|---|---|---|---|---|---|
| 0 | | 1 | 142.094 | 154.094 | Schot Barcelona | Lionel Andrés Messi Cuccittini | Barcelona |
| 1 | | 1 | 332.239 | 344.239 | Schot Barcelona | Jordi Alba Ramos | Barcelona |
| 2 | | 1 | 921.625 | 933.625 | Schot Barcelona | Lionel Andrés Messi Cuccittini | Barcelona |
| 3 | | 1 | 1088.914 | 1100.914 | Schot Barcelona | Luis Alberto Suárez Díaz | Barcelona |
| 4 | | 1 | 1835.287 | 1847.287 | Schot Barcelona | Ousmane Dembélé | Barcelona |
| 5 | | 1 | 2097.861 | 2109.861 | Schot Barcelona | Ivan Rakitić | Barcelona |
| 6 | | 1 | 2241.168 | 2253.168 | Schot Barcelona | Lionel Andrés Messi Cuccittini | Barcelona |
| 7 | | 1 | 2243.989 | 2255.989 | Schot Barcelona | Gerard Piqué Bernabéu | Barcelona |
| 8 | | 1 | 2301.083 | 2313.083 | Schot Barcelona | Ousmane Dembélé | Barcelona |
| 9 | | 1 | 2427.592 | 2439.592 | Schot Barcelona | Luis Alberto Suárez Díaz | Barcelona |
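
The chained `filter` and `map` calls above can be read as "keep the records that match, then produce a modified copy of each". The sketch below mimics that pattern with standard-library tools only. `SimpleCode` is a hypothetical stand-in, not kloppy's `Code` class, and the sketch assumes that `record.replace(...)` returns a changed copy rather than mutating the record, which is how `dataclasses.replace` behaves.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SimpleCode:
    """Hypothetical stand-in for a code record; not kloppy's Code class."""
    code: str
    timestamp: float
    labels: dict = field(default_factory=dict)


codes = [
    SimpleCode("shot", 142.094, {"Team": "Barcelona"}),
    SimpleCode("shot", 972.616, {"Team": "Deportivo Alavés"}),
    SimpleCode("shot", 1088.914, {"Team": "Barcelona"}),
]

# Filter step: keep only the records labelled with the team of interest.
barcelona = [c for c in codes if c.labels["Team"] == "Barcelona"]

# Map step: build a copy of every kept record with a new code name.
renamed = [replace(c, code="Schot Barcelona") for c in barcelona]
```

Because each step yields new records, the original `codes` list keeps its `shot` entries and can be reused for other selections.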
