E-MIMIC/inclusively-classification

Text Classification

	---
	license: cc-by-nc-sa-4.0
	---

	# Inclusively Classification Model

	This is an Italian text classification model, fine-tuned from the [Italian BERT model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased), that detects inclusive language in Italian sentences.

	It assigns each sentence to one of three classes:
	- `inclusive`: the sentence is inclusive (e.g. "Il personale docente e non docente", i.e. "teaching and non-teaching staff")
	- `not_inclusive`: the sentence is not inclusive (e.g. "I professori", i.e. "the professors", a generic masculine plural)
	- `not_pertinent`: the sentence is not pertinent to the task (e.g. "La scuola è chiusa", i.e. "the school is closed")
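
A minimal usage sketch with the `transformers` text-classification pipeline. It assumes the repository name shown at the top of this page (`E-MIMIC/inclusively-classification`) loads directly from the Hub; `best_label` and `classify` are hypothetical helper names, and the label strings returned depend on the model's configuration, which is not documented here.

```python
from typing import Dict, List

# Assumption: repository name as shown at the top of this model page.
MODEL_ID = "E-MIMIC/inclusively-classification"


def best_label(scores: List[Dict]) -> Dict:
    """Return the highest-scoring {label, score} entry for one sentence."""
    return max(scores, key=lambda entry: entry["score"])


def classify(sentences: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Classify a list of Italian sentences (downloads the model on first use)."""
    # Imported lazily so the helpers above can be used without transformers.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of all classes.
    clf = pipeline("text-classification", model=model_id, top_k=None)
    # Truncate to 128 tokens to mirror the max_length used for fine-tuning.
    outputs = clf(sentences, truncation=True, max_length=128)
    # Label strings come from the model config and may differ from the
    # class names listed above (assumption: not verified here).
    return [best_label(scores) for scores in outputs]
```

Calling `classify(["Il personale docente e non docente", "I professori"])` should return one `{label, score}` dictionary per sentence.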

	## Training data

	The model has been trained on a dataset containing:
	- 8580 training sentences
	- 1073 validation sentences
	- 1072 test sentences

	The dataset was manually annotated by experts in inclusive language. It is not yet publicly available.
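
As a quick check of the split sizes listed above, the following sketch computes the total number of sentences and the share of each split. The variable names are illustrative.

```python
# Split sizes as listed in this section.
splits = {"train": 8580, "validation": 1073, "test": 1072}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total sentences: {total}")  # total sentences: 10725
for name, share in shares.items():
    # Prints roughly 80.0% / 10.0% / 10.0%
    print(f"{name:<10} {share:.1%}")
```

The counts correspond to an approximately 80/10/10 split.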

	## Training procedure

	The model has been fine-tuned from the [Italian BERT model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) using the following hyperparameters:
	- `max_length`: 128
	- `batch_size`: 128
	- `learning_rate`: 5e-5
	- `warmup_steps`: 500
	- `epochs`: 10 (the best checkpoint is selected by validation accuracy)
	- `optimizer`: AdamW
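
To put these hyperparameters in perspective, the sketch below derives the number of optimizer steps and a learning-rate schedule. It rests on several assumptions not stated in this card: `batch_size` is the effective batch size per optimizer step (no gradient accumulation), the last partial batch is kept, all 10 epochs run, and the warmup is followed by a linear decay to zero. The function name `lr_at` is hypothetical.

```python
import math

TRAIN_SIZE = 8580      # training sentences (from "Training data")
BATCH_SIZE = 128
EPOCHS = 10
WARMUP_STEPS = 500
BASE_LR = 5e-5

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)   # 68
total_steps = steps_per_epoch * EPOCHS                 # 680


def lr_at(step: int) -> float:
    """Learning rate at a given step under linear warmup + linear decay (assumed)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - WARMUP_STEPS)


print(steps_per_epoch, total_steps)            # 68 680
print(f"{WARMUP_STEPS / total_steps:.0%}")     # 74%
```

Under these assumptions, the 500 warmup steps cover roughly three quarters of the 680 total steps, so the peak learning rate is reached only late in training.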

	## Evaluation results

	The model has been evaluated on the test set and obtained the following results:

	| Model | Accuracy | Inclusive F1 | Not inclusive F1 | Not pertinent F1 |
	|-------|----------|--------------|------------------|------------------|
	| TF-IDF + MLP | 0.68 | 0.63 | 0.69 | 0.66 |
	| TF-IDF + SVM | 0.61 | 0.53 | 0.60 | 0.78 |
	| TF-IDF + GB | 0.74 | 0.74 | 0.76 | 0.72 |
	| Multilingual model | 0.86 | 0.88 | 0.89 | 0.83 |
	| This model | 0.89 | 0.88 | 0.92 | 0.85 |

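	This model outperforms a multilingual model trained on the same data in accuracy (0.89 vs. 0.86), not-inclusive F1 (0.92 vs. 0.89), and not-pertinent F1 (0.85 vs. 0.83), and matches it on inclusive F1 (0.88).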
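
For reference, the metrics reported above (accuracy and per-class F1) can be computed as in the following sketch. The labels and predictions are toy values made up for illustration, not data from the actual test set, and the function names are hypothetical.

```python
from typing import Dict, List

LABELS = ["inclusive", "not_inclusive", "not_pertinent"]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of sentences whose predicted class matches the gold class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """One-vs-rest F1 for each class: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy example (hypothetical values, for illustration only).
y_true = ["inclusive", "inclusive", "not_inclusive",
          "not_inclusive", "not_pertinent", "not_pertinent"]
y_pred = ["inclusive", "not_inclusive", "not_inclusive",
          "not_inclusive", "not_pertinent", "inclusive"]

print(round(accuracy(y_true, y_pred), 2))   # 0.67
print(per_class_f1(y_true, y_pred))
```

On the toy data this yields an F1 of 0.5 for `inclusive`, 0.8 for `not_inclusive`, and about 0.67 for `not_pertinent`.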

	## Citation

	If you use this model, please cite the following papers:

	Main paper:

	```bibtex
	@article{10.1145/3729237,
	author = {Greco, Salvatore and La Quatra, Moreno and Cagliero, Luca and Cerquitelli, Tania},
	title = {Towards AI-Assisted Inclusive Language Writing in Italian Formal Communications},
	year = {2025},
	issue_date = {August 2025},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	volume = {16},
	number = {4},
	issn = {2157-6904},
	url = {https://doi.org/10.1145/3729237},
	doi = {10.1145/3729237},
	journal = {ACM Trans. Intell. Syst. Technol.},
	month = jun,
	articleno = {79},
	numpages = {24},
	keywords = {inclusive language, natural language processing, text classification, text generation}
	}
	```

	Demo paper:

	```bibtex
	@InProceedings{PKDD23_inclusively,
	author="La Quatra, Moreno
	and Greco, Salvatore
	and Cagliero, Luca
	and Cerquitelli, Tania",
	title="Inclusively: An AI-Based Assistant for Inclusive Writing",
	booktitle="Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track",
	year="2023",
	publisher="Springer Nature Switzerland",
	address="Cham",
	pages="361--365",
	isbn="978-3-031-43430-3",
	doi="10.1007/978-3-031-43430-3_31"
	}
	```