---
license: agpl-3.0
language:
- it
task_categories:
- token-classification
datasets:
- mrovera/eventnet-ita
tags:
- Frame Parsing
- Event Extraction
---
# EventNet-ITA
This model is a full-text frame parser for events in Italian, trained on the [EventNet-ITA](https://huggingface.co/datasets/mrovera/eventnet-ita) dataset.
It can be used for _full-text_ Frame Parsing and Event Extraction.
Please refer to the [paper](https://aclanthology.org/2024.latechclfl-1.9) for a more detailed description.
## Model Details
### Model Description
In its current version, EventNet-ITA is able to recognize and classify 205 semantic frames and their (specific) frame elements. The unit of analysis is the sentence.
### Direct Use
Provided with an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s).
```
La B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
cittadina I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
, O
posta B-BEING_LOCATED
a B-RELATIVE_LOCATION*BEING_LOCATED
est I-RELATIVE_LOCATION*BEING_LOCATED
del I-RELATIVE_LOCATION*BEING_LOCATED
corso I-RELATIVE_LOCATION*BEING_LOCATED
d' I-RELATIVE_LOCATION*BEING_LOCATED
acqua I-RELATIVE_LOCATION*BEING_LOCATED
, O
venne O
conquistata B-CONQUERING
, O
ma O
il B-EXPLOSIVE*DETONATE_EXPLOSIVE
ponte I-EXPLOSIVE*DETONATE_EXPLOSIVE
sul I-EXPLOSIVE*DETONATE_EXPLOSIVE
fiume I-EXPLOSIVE*DETONATE_EXPLOSIVE
era O
già O
stato O
fatto B-DETONATE_EXPLOSIVE
saltare I-DETONATE_EXPLOSIVE
regolarmente O
dai B-AGENT*DETONATE_EXPLOSIVE
genieri I-AGENT*DETONATE_EXPLOSIVE
francesi I-AGENT*DETONATE_EXPLOSIVE
. O
```
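The labels in the example above can be decoded programmatically. The following is a minimal sketch, assuming the label scheme that the example suggests: `O` marks an unlabelled token, `|` separates multiple annotations on the same token, a label of the form `B-ELEMENT*FRAME` marks a frame element of `FRAME`, and a label without `*` marks the frame-evoking token(s). The names `Annotation` and `parse_label` are hypothetical helpers and not part of the model or of MaChAmp.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    bio: str                # "B" (begin) or "I" (inside)
    frame: str              # frame name, e.g. "CONQUERING"
    element: Optional[str]  # frame element name, or None for the frame itself


def parse_label(label: str) -> List[Annotation]:
    """Split one token label into its individual annotations.

    Assumes the scheme inferred from the example output:
    "|" joins several annotations, "*" joins frame element and frame.
    """
    if label == "O":
        return []
    annotations = []
    for part in label.split("|"):
        # Split only on the first hyphen: "B-THEME*CONQUERING" -> "B", "THEME*CONQUERING"
        bio, _, rest = part.partition("-")
        if "*" in rest:
            element, _, frame = rest.partition("*")
            annotations.append(Annotation(bio, frame, element))
        else:
            annotations.append(Annotation(bio, rest, None))
    return annotations


if __name__ == "__main__":
    for token, label in [
        ("La", "B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING"),
        ("conquistata", "B-CONQUERING"),
        ("venne", "O"),
    ]:
        print(token, parse_label(label))
```

With this reading, the token `La` carries two annotations at once: it opens the `ENTITY` element of `BEING_LOCATED` and the `THEME` element of `CONQUERING`.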
## Training Details
The model has been trained using [MaChAmp](https://github.com/machamp-nlp/machamp), a Python toolkit supporting a variety of NLP tasks, by fine-tuning [this Italian BERT pretrained model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased).
Training hyperparameters:
- Batch size: 64
- Learning rate: 1.5e-3
All other hyperparameters have been left unchanged w.r.t. the default MaChAmp configuration for the multi-sequential token classification task.
### Training Data
Please refer to the [dataset repo](https://huggingface.co/datasets/mrovera/eventnet-ita).
### Model Re-training
In order to re-train the model, download the [dataset](https://huggingface.co/datasets/mrovera/eventnet-ita) and follow the instructions for training a [multiseq task](https://github.com/machamp-nlp/machamp/blob/master/docs/multiseq.md) in MaChAmp.
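As a starting point for re-training, the sketch below builds a MaChAmp dataset configuration for a single multiseq task and prints it as JSON. It rests on several assumptions: the key names (`train_data_path`, `dev_data_path`, `word_idx`, `tasks`, `task_type`, `column_idx`) are recalled from the MaChAmp documentation and should be checked against the version you cloned; the dataset name, task name and file paths are hypothetical; and the data is assumed to hold the word in column 0 and the labels in column 1.

```python
import json
from typing import Any, Dict


def build_dataset_config(train_path: str, dev_path: str) -> Dict[str, Any]:
    """Return a dataset configuration for one multiseq task.

    Key names are assumed from the MaChAmp docs; verify them before use.
    """
    return {
        "EVENTNET_ITA": {                 # arbitrary dataset name (hypothetical)
            "train_data_path": train_path,
            "dev_data_path": dev_path,
            "word_idx": 0,                # assumed: tokens are in the first column
            "tasks": {
                "frames": {               # arbitrary task name (hypothetical)
                    "task_type": "multiseq",
                    "column_idx": 1,      # assumed: labels are in the second column
                }
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical paths: point these at the files downloaded from the dataset repo.
    config = build_dataset_config("data/eventnet-ita/train.tsv",
                                  "data/eventnet-ita/dev.tsv")
    print(json.dumps(config, indent=4))
```

The printed JSON can be saved to a file and passed to MaChAmp as described in its multiseq documentation, together with the hyperparameter changes listed under Training Details.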
### Inference
The EventNet-ITA model can be used for Frame Parsing on new texts by following these steps:
1. Clone the GitHub repo: `git clone https://github.com/machamp-nlp/machamp.git`
2. Download the EventNet-ITA model from this repo (450 MB) and move it into the `machamp` folder. The exact location is up to you; by default, MaChAmp saves trained models in the `logs` folder.
3. Save the data you want to use for prediction in a two-column TSV file, one word per line, with the word in the first column and a placeholder in the second (column index 1). Separate sentences with a blank line (without placeholder), like this:
```
This _
is _
the _
first _
sentence _
. _

This _
is _
the _
second _
one _
. _
```
4. Follow the instructions for predicting with [MaChAmp](https://github.com/machamp-nlp/machamp) (see section "Prediction") using a fine-tuned model.
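The input file described in step 3 can be generated with a few lines of Python. This is a minimal sketch, assuming the text has already been split into sentences and tokens; `write_prediction_file` and the output file name are hypothetical and not part of MaChAmp.

```python
from pathlib import Path
from typing import Iterable, List


def write_prediction_file(sentences: Iterable[List[str]], path: str) -> None:
    """Write tokenized sentences as a two-column TSV file.

    Each line holds a token, a tab and the placeholder "_";
    an empty line follows every sentence.
    """
    lines = []
    for tokens in sentences:
        for token in tokens:
            lines.append(f"{token}\t_")
        lines.append("")  # blank line closes the sentence
    # Joining with "\n" leaves a blank line between sentences
    # and a single newline at the end of the file.
    Path(path).write_text("\n".join(lines), encoding="utf-8")


if __name__ == "__main__":
    sentences = [
        ["This", "is", "the", "first", "sentence", "."],
        ["This", "is", "the", "second", "one", "."],
    ]
    write_prediction_file(sentences, "to_predict.tsv")  # hypothetical file name
```

Tokenization is left to the caller here; splitting on whitespace alone would keep punctuation attached to words, which differs from the one-token-per-line layout shown above.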
## Evaluation
The model has been evaluated on three folds, each time with a stratified split of the dataset, with an 80/10/10 train/dev/test ratio. Please see the paper for further details. The tables below report the values obtained by averaging the Precision, Recall and F1-score of the three splits.
**Token-based** (**_relaxed_**) performance:
| | P | R | F1 |
|----------------------------|--------|---------|---------|
|Frames | 0.904 | 0.914 | **0.907** |
|Frames (weighted) | 0.909 | 0.919 | 0.913 |
|Frame Elements | 0.841 | 0.724 | **0.761** |
|Frame Elements (weighted) | 0.850 | 0.779 | 0.804 |
**Span-based** (**_strict_**) performance:
| | P | R | F1 |
|----------------------------|--------|---------|--------|
|Frames | 0.906 | 0.899 | **0.901** |
|Frames (weighted) | 0.909 | 0.903 | 0.905 |
|Frame Elements | 0.829 | 0.666 | **0.724** |
|Frame Elements (weighted) | 0.853 | 0.711 | 0.768 |
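To illustrate why the span-based (strict) figures are lower than the token-based (relaxed) ones, the sketch below scores a single-label BIO sequence in both ways. It is a generic illustration under simplifying assumptions (one label per token, token-level comparison ignoring the `B-`/`I-` prefix) and is not the evaluation script used for the tables above; see the paper for the exact procedure. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf(n_correct: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def token_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Relaxed scoring: every correctly labelled token counts."""
    def strip(tag: str) -> Optional[str]:
        return None if tag == "O" else tag[2:]
    g = [strip(t) for t in gold]
    p = [strip(t) for t in pred]
    correct = sum(1 for a, b in zip(g, p) if a is not None and a == b)
    return prf(correct, sum(x is not None for x in p), sum(x is not None for x in g))


def span_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Strict scoring: a span counts only if boundaries and label match exactly."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    return prf(len(g & p), len(p), len(g))


if __name__ == "__main__":
    gold = ["B-AGENT", "I-AGENT", "I-AGENT", "O"]
    pred = ["B-AGENT", "I-AGENT", "O", "O"]  # last token of the span is missed
    print("token-based:", token_scores(gold, pred))
    print("span-based: ", span_scores(gold, pred))
```

In this example the prediction misses the last token of a three-token span: token-based scoring still credits the two correctly labelled tokens, while span-based scoring counts the whole span as wrong.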
## Citation Information
If you use EventNet-ITA, please cite the following paper:
```
@inproceedings{rovera-2024-eventnet,
title = "{E}vent{N}et-{ITA}: {I}talian Frame Parsing for Events",
author = "Rovera, Marco",
editor = "Bizzoni, Yuri and
Degaetano-Ortlieb, Stefania and
Kazantseva, Anna and
Szpakowicz, Stan",
booktitle = "Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)",
year = "2024",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.latechclfl-1.9",
pages = "77--90",
}
```