---
language:
- en
- fr
- it
- pt
tags:
- formal or informal classification
licenses:
- cc-by-nc-sa
license: cc-by-nc-sa-4.0
---

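XLM-RoBERTa-based formality classifier trained on X-FORMAL. It labels a text as formal (label 0) or informal (label 1).

## Evaluation

Classification reports for all languages combined and for each language separately. In the `accuracy` row, the accuracy value appears in the f1-score column.

### All languages

| class | precision | recall | f1-score | support |
|--------------|-----------|----------|----------|---------|
| 0 (formal) | 0.744912 | 0.927790 | 0.826354 | 108019 |
| 1 (informal) | 0.889088 | 0.645630 | 0.748048 | 96845 |
| accuracy | | | 0.794405 | 204864 |
| macro avg | 0.817000 | 0.786710 | 0.787201 | 204864 |
| weighted avg | 0.813068 | 0.794405 | 0.789337 | 204864 |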
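As a consistency check, the averaged rows of the table above can be recomputed from its per-class rows: F1 is the harmonic mean of precision and recall, "macro avg" is the unweighted mean over the two classes, and "weighted avg" weights each class by its support (which is why the weighted-average recall equals the accuracy). The sketch below uses the numbers copied from the "All languages" table; since those are rounded to six decimals, recomputed values should agree only to roughly 1e-5.

```python
# Per-class rows copied from the "All languages" table:
# (precision, recall, f1-score, support)
rows = {
    0: (0.744912, 0.927790, 0.826354, 108019),  # formal
    1: (0.889088, 0.645630, 0.748048, 96845),   # informal
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(row[3] for row in rows.values())

# macro avg: unweighted mean over classes for precision, recall, f1-score
macro = [sum(row[k] for row in rows.values()) / len(rows) for k in range(3)]

# weighted avg: mean over classes weighted by each class's support
weighted = [sum(row[k] * row[3] for row in rows.values()) / total_support for k in range(3)]

for label, (p, r, f, _) in rows.items():
    # recomputed F1 next to the reported F1
    print(label, round(f1(p, r), 6), f)
print("macro avg", [round(x, 6) for x in macro])
print("weighted avg", [round(x, 6) for x in weighted])
print("support", total_support)  # 204864
```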


### English (en)

| class | precision | recall | f1-score | support |
|--------------|-----------|----------|----------|---------|
| 0 (formal) | 0.800053 | 0.962981 | 0.873988 | 22151 |
| 1 (informal) | 0.945106 | 0.725899 | 0.821124 | 19449 |
| accuracy | | | 0.852139 | 41600 |
| macro avg | 0.872579 | 0.844440 | 0.847556 | 41600 |
| weighted avg | 0.867869 | 0.852139 | 0.849273 | 41600 |

### French (fr)

| class | precision | recall | f1-score | support |
|--------------|-----------|----------|----------|---------|
| 0 (formal) | 0.746709 | 0.925738 | 0.826641 | 21505 |
| 1 (informal) | 0.887305 | 0.650592 | 0.750731 | 19327 |
| accuracy | | | 0.795504 | 40832 |
| macro avg | 0.817007 | 0.788165 | 0.788686 | 40832 |
| weighted avg | 0.813257 | 0.795504 | 0.790711 | 40832 |

### Italian (it)

| class | precision | recall | f1-score | support |
|--------------|-----------|----------|----------|---------|
| 0 (formal) | 0.721282 | 0.914669 | 0.806545 | 21528 |
| 1 (informal) | 0.864887 | 0.607135 | 0.713445 | 19368 |
| accuracy | | | 0.769024 | 40896 |
| macro avg | 0.793084 | 0.760902 | 0.759995 | 40896 |
| weighted avg | 0.789292 | 0.769024 | 0.762454 | 40896 |

### Portuguese (pt)

| class | precision | recall | f1-score | support |
|--------------|-----------|----------|----------|---------|
| 0 (formal) | 0.717546 | 0.908167 | 0.801681 | 21637 |
| 1 (informal) | 0.853628 | 0.599700 | 0.704481 | 19323 |
| accuracy | | | 0.762646 | 40960 |
| macro avg | 0.785587 | 0.753933 | 0.753081 | 40960 |
| weighted avg | 0.781743 | 0.762646 | 0.755826 | 40960 |
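The card does not state how these reports were generated. As a sketch, per-class precision, recall, F1 and support, plus accuracy, can be computed overall and per language from parallel lists of language codes, gold labels and predictions. The helper names and toy data below are hypothetical; the row layout follows scikit-learn's `classification_report`, but this is plain Python rather than that function, and it uses 0.0 when a class is never predicted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def per_class_report(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> Dict:
    """Precision, recall, F1 and support per class, plus overall accuracy."""
    report: Dict = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)  # examples predicted as class c
        n_true = sum(1 for t in y_true if t == c)  # support: examples whose gold label is c
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1-score": f1, "support": n_true}
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    report["accuracy"] = correct / len(y_true)
    return report


def reports_by_language(langs: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Dict]:
    """One report over all examples plus one per language code (hypothetical helper)."""
    groups: Dict[str, List[List[int]]] = defaultdict(lambda: [[], []])
    for lang, t, p in zip(langs, y_true, y_pred):
        groups[lang][0].append(t)
        groups[lang][1].append(p)
    reports = {"all": per_class_report(y_true, y_pred)}
    for lang in sorted(groups):
        lang_gold, lang_pred = groups[lang]
        reports[lang] = per_class_report(lang_gold, lang_pred)
    return reports


# Toy data (hypothetical): 0 = formal, 1 = informal
langs = ["en", "en", "fr", "fr"]
gold = [0, 1, 0, 1]
pred = [0, 0, 0, 1]
reports = reports_by_language(langs, gold, pred)
print(reports["all"]["accuracy"])  # 0.75
```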

## How to use
```python
from pprint import pprint

import torch
from transformers import XLMRobertaTokenizerFast, XLMRobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = XLMRobertaTokenizerFast.from_pretrained('s-nlp/xlmr_formality_classifier')
model = XLMRobertaForSequenceClassification.from_pretrained('s-nlp/xlmr_formality_classifier')

id2formality = {0: "formal", 1: "informal"}
texts = [
    "I like you. I love you",
    "Hey, what's up?",
    "Siema, co porabiasz?",
    "I feel deep regret and sadness about the situation in international politics.",
]

# prepare the input: truncate to the model's maximum length and pad every text to it
encoding = tokenizer(
    texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# inference; gradients are not needed
with torch.no_grad():
    output = model(**encoding)

# softmax over the two logits gives per-class probabilities for each text
formality_scores = [
    {id2formality[idx]: score for idx, score in enumerate(text_scores.tolist())}
    for text_scores in output.logits.softmax(dim=1)
]
pprint(formality_scores)
```

```
[{'formal': 0.993225634098053, 'informal': 0.006774314679205418},
 {'formal': 0.8807966113090515, 'informal': 0.1192033663392067},
 {'formal': 0.936184287071228, 'informal': 0.06381577253341675},
 {'formal': 0.9986615180969238, 'informal': 0.0013385231141000986}]
```
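The scores in the usage example are softmax probabilities over the two classes, so a hard label can be read off by comparing the informal probability with a threshold. The evaluation tables show high precision but lower recall for class 1 (informal), so a threshold below 0.5 is one possible way to trade some precision for recall; any such value is an assumption and should be tuned on held-out data. The helper below is a hypothetical sketch (its names are not part of `transformers`), written in plain Python so it can take `output.logits.tolist()` from the usage example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Same label mapping as in the usage example.
ID2FORMALITY = {0: "formal", 1: "informal"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_texts(
    batch_logits: Sequence[Sequence[float]], informal_threshold: float = 0.5
) -> List[Tuple[str, Dict[str, float]]]:
    """Hypothetical helper: return (label, scores) for each row of logits.

    A text is labelled "informal" when its informal probability is at least
    `informal_threshold`, otherwise "formal".
    """
    results = []
    for logits in batch_logits:
        scores = {ID2FORMALITY[i]: p for i, p in enumerate(softmax(logits))}
        label = "informal" if scores["informal"] >= informal_threshold else "formal"
        results.append((label, scores))
    return results


# Toy logits; with the model loaded, pass output.logits.tolist() instead.
example = [[2.0, 0.0], [0.0, 2.0], [0.3, 0.0]]
print([label for label, _ in label_texts(example)])
# ['formal', 'informal', 'formal']
print([label for label, _ in label_texts(example, informal_threshold=0.4)])
# ['formal', 'informal', 'informal']
```

At the default threshold of 0.5 this is the usual pick-the-more-likely-class rule.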

## Citation

```
@inproceedings{dementieva-etal-2023-detecting,
    title = "Detecting Text Formality: A Study of Text Classification Approaches",
    author = "Dementieva, Daryna and
      Babakov, Nikolay and
      Panchenko, Alexander",
    editor = "Mitkov, Ruslan and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    month = sep,
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.31",
    pages = "274--284",
    abstract = "Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Before, two large-scale datasets were introduced for multiple languages featuring formality annotation{---}GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models. At the same time, the detection of text formality on its own may also be a useful application. This work proposes the first to our knowledge systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods and delivers the best-performing models for public usage. We conducted three types of experiments {--} monolingual, multilingual, and cross-lingual. The study shows the overcome of Char BiLSTM model over Transformer-based ones for the monolingual and multilingual formality classification task, while Transformer-based classifiers are more stable to cross-lingual knowledge transfer.",
}
```


## Licensing Information

This model is distributed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png