	---
	license: agpl-3.0
	language:
	- it
	task_categories:
	- token-classification
	datasets:
	- mrovera/eventnet-ita
	tags:
	- Frame Parsing
	- Event Extraction
	---
	# EventNet-ITA

	This model is a full-text frame parser for events in Italian, trained on [EventNet-ITA](https://huggingface.co/datasets/mrovera/eventnet-ita).
	The model can be used for _full-text_ Frame Parsing and Event Extraction.
	Please refer to the [paper](https://aclanthology.org/2024.latechclfl-1.9) for a more detailed description.


	## Model Details

	### Model Description

	In its current version, EventNet-ITA can recognize and classify 205 semantic frames and their frame-specific frame elements. The unit of analysis is the sentence.


	### Direct Use

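	Given an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s). Labels follow the BIO scheme (`B-` opens a span, `I-` continues it, `O` marks tokens outside any span). A frame-evoking word is labeled with the frame name (e.g. `B-CONQUERING`), a frame element is labeled as `ELEMENT*FRAME` (e.g. `B-AGENT*DETONATE_EXPLOSIVE`), and multiple labels on the same token are separated by `|`: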
	```
	La B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
	cittadina I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
	, O
	posta B-BEING_LOCATED
	a B-RELATIVE_LOCATION*BEING_LOCATED
	est I-RELATIVE_LOCATION*BEING_LOCATED
	del I-RELATIVE_LOCATION*BEING_LOCATED
	corso I-RELATIVE_LOCATION*BEING_LOCATED
	d' I-RELATIVE_LOCATION*BEING_LOCATED
	acqua I-RELATIVE_LOCATION*BEING_LOCATED
	, O
	venne O
	conquistata B-CONQUERING
	, O
	ma O
	il B-EXPLOSIVE*DETONATE_EXPLOSIVE
	ponte I-EXPLOSIVE*DETONATE_EXPLOSIVE
	sul I-EXPLOSIVE*DETONATE_EXPLOSIVE
	fiume I-EXPLOSIVE*DETONATE_EXPLOSIVE
	era O
	già O
	stato O
	fatto B-DETONATE_EXPLOSIVE
	saltare I-DETONATE_EXPLOSIVE
	regolarmente O
	dai B-AGENT*DETONATE_EXPLOSIVE
	genieri I-AGENT*DETONATE_EXPLOSIVE
	francesi I-AGENT*DETONATE_EXPLOSIVE
	. O
	```

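The label format can also be read programmatically. The sketch below is a minimal example, assuming the conventions visible in the output above (`O` for no label, `|` between multiple labels, `B-`/`I-` prefixes, `ELEMENT*FRAME` for frame elements and a bare `FRAME` for the frame-evoking word). The `Span` class and the `parse_label`/`to_spans` helpers are hypothetical and are not part of MaChAmp or EventNet-ITA.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    frame: str            # frame name, e.g. "BEING_LOCATED"
    element: str | None   # frame element, or None for the frame-evoking word
    start: int            # index of the first token
    end: int              # index after the last token (exclusive)


def parse_label(label: str) -> list[tuple[str, str | None, str]]:
    """Split one label cell into (bio, element, frame) triples (assumed format)."""
    if label == "O":
        return []
    triples = []
    for part in label.split("|"):           # several labels per token
        bio, _, body = part.partition("-")  # "B" or "I", then the rest
        element, star, frame = body.partition("*")
        if star:                            # "ELEMENT*FRAME": a frame element
            triples.append((bio, element, frame))
        else:                               # plain "FRAME": the trigger itself
            triples.append((bio, None, element))
    return triples


def to_spans(labels: list[str]) -> list[Span]:
    """Group the per-token labels of one sentence into contiguous spans."""
    spans: list[Span] = []
    open_spans: dict[tuple[str | None, str], Span] = {}
    for i, label in enumerate(labels):
        seen = set()
        for bio, element, frame in parse_label(label):
            key = (element, frame)
            seen.add(key)
            if bio == "I" and key in open_spans:
                open_spans[key].end = i + 1   # extend the running span
            else:                             # "B" (or a stray "I") opens one
                span = Span(frame, element, i, i + 1)
                spans.append(span)
                open_spans[key] = span
        for key in list(open_spans):          # close spans that stopped here
            if key not in seen:
                del open_spans[key]
    return spans


if __name__ == "__main__":
    labels = [
        "B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING",  # La
        "I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING",  # cittadina
        "O",                                          # ,
        "B-BEING_LOCATED",                            # posta
        "B-RELATIVE_LOCATION*BEING_LOCATED",          # a
        "I-RELATIVE_LOCATION*BEING_LOCATED",          # est
    ]
    for span in to_spans(labels):
        print(span)
```

On the first six tokens of the example, this yields an `ENTITY` and a `THEME` span over "La cittadina", the `BEING_LOCATED` trigger "posta", and a `RELATIVE_LOCATION` span starting at "a". Keeping one open span per (element, frame) pair lets overlapping labels on the same tokens be tracked independently.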

	## Training Details

	The model was trained with [MaChAmp](https://github.com/machamp-nlp/machamp), a Python toolkit supporting a variety of NLP tasks, by fine-tuning [this pretrained Italian BERT model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased).
	Training hyperparameters:
	- Batch size: 64
	- Learning rate: 1.5e-3

	All other hyperparameters were left at their default values in the MaChAmp configuration for the multi-label sequence labeling (`multiseq`) task.



	### Training Data

	Please refer to the [dataset repo](https://huggingface.co/datasets/mrovera/eventnet-ita).


	### Model Re-training

	In order to re-train the model, download the [dataset](https://huggingface.co/datasets/mrovera/eventnet-ita) and follow the instructions for training a [multiseq task](https://github.com/machamp-nlp/machamp/blob/master/docs/multiseq.md) in MaChAmp.

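MaChAmp reads its training data through a JSON dataset configuration. A minimal sketch, assuming the EventNet-ITA files have the token in the first column and the label(s) in the second (as in the inference input format further down), could generate one as follows. The dataset name `EVENTNET_ITA`, the task name `frames`, the helper `build_dataset_config` and the file paths are hypothetical; check the keys against the current MaChAmp documentation.

```python
import json


def build_dataset_config(train_path: str, dev_path: str) -> dict:
    """Build a MaChAmp-style dataset config with one multiseq task.

    Assumptions (check against the dataset files and MaChAmp docs):
    column 0 holds the token and column 1 the label(s).
    """
    return {
        "EVENTNET_ITA": {  # hypothetical dataset name
            "train_data_path": train_path,
            "dev_data_path": dev_path,
            "word_idx": 0,  # column with the token
            "tasks": {
                "frames": {  # hypothetical task name
                    "task_type": "multiseq",  # several labels per token
                    "column_idx": 1,  # column with the label(s)
                }
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical paths: replace them with the downloaded dataset files.
    config = build_dataset_config("data/eventnet-ita/train.tsv", "data/eventnet-ita/dev.tsv")
    # Save this output as, e.g., configs/eventnet_ita.json
    print(json.dumps(config, indent=4))
```

With the output saved as, say, `configs/eventnet_ita.json`, the MaChAmp README (at the time of writing) starts training with `python3 train.py --dataset_configs configs/eventnet_ita.json`. The batch size and learning rate listed above belong in MaChAmp's separate hyperparameter configuration, not in this file.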

	### Inference

	The EventNet-ITA model can be used for Frame Parsing on new texts by following a few steps:
	1. Clone the GitHub repo: `git clone https://github.com/machamp-nlp/machamp.git`
	2. Download the EventNet-ITA model from this repo (450 MB) and move it into the `machamp` folder. Its exact location is up to you; by default, MaChAmp saves trained models in the `logs` folder.
	3. Save the data you want to parse in a two-column TSV file, one token per line, with the token in the first column and a placeholder (`_`) in the second. Separate sentences with a blank line (without placeholder), like this:
	```
	This _
	is _
	the _
	first _
	sentence _
	. _

	This _
	is _
	the _
	second _
	one _
	. _
	```
	4. Follow the instructions for predicting with a fine-tuned model in the "Prediction" section of the [MaChAmp](https://github.com/machamp-nlp/machamp) README.

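Step 3 can be scripted. The sketch below, assuming plain-text sentences as input, produces the two-column, tab-separated format shown above. The regex tokenizer is a naive stand-in (it is not the tokenizer used to build EventNet-ITA), and `tokenize`/`to_machamp_tsv` are hypothetical helper names.

```python
import re

# Naive tokenizer (an assumption, not EventNet-ITA's own): a word followed by
# an apostrophe ("d'", "dell'") is one token, other words are tokens, and
# each punctuation mark is its own token.
TOKEN_RE = re.compile(r"\w+['’]|\w+|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    return TOKEN_RE.findall(sentence)


def to_machamp_tsv(sentences: list[str]) -> str:
    """One token per line as "token<TAB>_", with a blank line between sentences."""
    blocks = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if tokens:  # skip empty input lines
            blocks.append("\n".join(f"{token}\t_" for token in tokens))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_machamp_tsv([
        "La cittadina venne conquistata.",
        "Il ponte sul fiume era già saltato.",
    ]), end="")
```

Write the returned string to a UTF-8 file and pass that file to MaChAmp as described in step 4. Since the model was trained on EventNet-ITA's own tokenization, tokens that split differently from the training data may be labeled less reliably.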
	## Evaluation

	The model was evaluated on three folds, each using a stratified split of the dataset into train/dev/test sets with an 80/10/10 ratio. Please see the paper for further details. The tables below report precision (P), recall (R) and F1-score averaged over the three splits.

	Token-based (_relaxed_) performance:

	| | P | R | F1 |
	|---|---|---|---|
	| Frames | 0.904 | 0.914 | 0.907 |
	| Frames (weighted) | 0.909 | 0.919 | 0.913 |
	| Frame Elements | 0.841 | 0.724 | 0.761 |
	| Frame Elements (weighted) | 0.850 | 0.779 | 0.804 |

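The card does not define the "(weighted)" rows. A common convention, assumed here, is that the plain rows are macro averages (every label counts equally) while the weighted rows weight each label's score by its support, i.e. its number of gold instances; the three-fold figures are then plain means over folds. A minimal sketch with made-up per-label numbers (not EventNet-ITA results), using the hypothetical helpers `macro_average`, `weighted_average` and `average_over_folds`:

```python
from statistics import mean


def macro_average(per_label: dict[str, tuple[float, int]]) -> float:
    """Unweighted mean of per-label scores; per_label maps label -> (score, support)."""
    return mean(score for score, _ in per_label.values())


def weighted_average(per_label: dict[str, tuple[float, int]]) -> float:
    """Mean of per-label scores weighted by support (number of gold instances)."""
    total = sum(support for _, support in per_label.values())
    return sum(score * support for score, support in per_label.values()) / total


def average_over_folds(fold_scores: list[float]) -> float:
    """Plain mean of one metric over the evaluation folds."""
    return mean(fold_scores)


if __name__ == "__main__":
    # Made-up F1 scores and supports for two labels in one fold.
    fold = {"CONQUERING": (1.0, 3), "DETONATE_EXPLOSIVE": (0.5, 1)}
    print(macro_average(fold), weighted_average(fold))  # prints 0.75 0.875
```

Under this reading, a weighted score above the macro score means the more frequent labels are, on average, predicted better than the rare ones.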

	Span-based (_strict_) performance:

	| | P | R | F1 |
	|---|---|---|---|
	| Frames | 0.906 | 0.899 | 0.901 |
	| Frames (weighted) | 0.909 | 0.903 | 0.905 |
	| Frame Elements | 0.829 | 0.666 | 0.724 |
	| Frame Elements (weighted) | 0.853 | 0.711 | 0.768 |

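The two tables differ in how a prediction is matched against the gold standard. Under one common reading, assumed here and not necessarily identical to the paper's scorer, the relaxed score credits each correctly labeled token, while the strict score only credits a span whose label and both boundaries match exactly. The hypothetical helpers below compute both for a single layer of BIO labels (one label per token; EventNet-ITA tokens can carry several, separated by `|`):

```python
def prf(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def bio_spans(tags: list[str]) -> set[tuple[str, int, int]]:
    """Turn BIO tags into (label, start, end) spans, end exclusive."""
    spans = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the open span goes on
        if label is not None:
            spans.add((label, start, i))
        if tag == "O":
            label = None
        else:  # "B-x", or an "I-x" that does not continue the open span
            label, start = tag[2:], i
    return spans


def token_labels(tags: list[str]) -> set[tuple[int, str]]:
    """(position, label) pairs, ignoring the B-/I- distinction."""
    return {(i, tag[2:]) for i, tag in enumerate(tags) if tag != "O"}


def strict_and_relaxed(gold: list[str], pred: list[str]):
    """Return ((P, R, F1) strict, (P, R, F1) relaxed) for one sentence."""
    g, p = bio_spans(gold), bio_spans(pred)
    strict = prf(len(g & p), len(p), len(g))
    gt, pt = token_labels(gold), token_labels(pred)
    relaxed = prf(len(gt & pt), len(pt), len(gt))
    return strict, relaxed


if __name__ == "__main__":
    # "dai genieri francesi": the prediction misses the last token.
    gold = ["B-AGENT*DETONATE_EXPLOSIVE", "I-AGENT*DETONATE_EXPLOSIVE", "I-AGENT*DETONATE_EXPLOSIVE"]
    pred = ["B-AGENT*DETONATE_EXPLOSIVE", "I-AGENT*DETONATE_EXPLOSIVE", "O"]
    strict, relaxed = strict_and_relaxed(gold, pred)
    print(strict, relaxed)
```

In this example the strict scores are all 0.0, because the predicted span is one token short, whereas the relaxed precision is 1.0 and the recall is 2/3 (two of the three gold tokens are found). This is consistent with the strict recall for Frame Elements being lower than the relaxed one in the tables above.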


	## Citation Information

	If you use EventNet-ITA, please cite the following paper:

	```
	@inproceedings{rovera-2024-eventnet,
	title = "{E}vent{N}et-{ITA}: {I}talian Frame Parsing for Events",
	author = "Rovera, Marco",
	editor = "Bizzoni, Yuri and
	Degaetano-Ortlieb, Stefania and
	Kazantseva, Anna and
	Szpakowicz, Stan",
	booktitle = "Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)",
	year = "2024",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2024.latechclfl-1.9",
	pages = "77--90",
	}
	```