	---
	license: cc-by-4.0
	library_name: transformers
	language:
	- de
	pipeline_tag: text-classification
	base_model: FacebookAI/xlm-roberta-large
	---

	# Model Card for XLM-R-Large-Sensationalism-Classifier

	A fine-tuned [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large) model for the task of classifying sentences as sensationalistic or not. The taxonomy for sensationalistic claims follows Ashraf et al. 2024, and the model was trained on their annotated Twitter data.


	## Model Details


	## Bias, Risks, and Limitations

	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	[More Information Needed]


	## How to Get Started with the Model

	```python
	import torch
	from transformers import pipeline

	# Example German tweets to classify.
	texts = [
	    'Afghanistan - Warum die Taliban Frauenrechte immer mehr einschränken\nhttps://t.co/rhwOdNoJUx',
	    '#Münster #G7 oder "Ab jetzt außen rumfahren". https://t.co/Goj5vtrnst',
	    'Interessantes Trio.\nDie eine hat eine Wahl vergeigt, die andere kungelt mit Putin und die Dritte hat die Hilfe nach der Flutkatastrophe nicht auf die Reihe bekommen. \nMehr Frauen an die Macht!',
	    'Wie kann man sich #AnneWill betrachten ohne das übertragende Gerät zu zerschmettern. Eben 20 sec. dem #FDP Watschengesicht beim Quaken zugehört. Du lieber Himmel, wie weltfremd geht´s denn noch.',
	]

	checkpoint = "Sami92/XLM-R-Large-Sensationalism-Classifier"
	# Pad each batch and truncate inputs to at most 512 tokens.
	tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}
	# Use the GPU when one is available; otherwise fall back to the CPU.
	device = "cuda" if torch.cuda.is_available() else "cpu"

	sensational_classifier = pipeline(
	    "text-classification",
	    model=checkpoint,
	    tokenizer=checkpoint,
	    device=device,
	    **tokenizer_kwargs,
	)
	# Returns one dict per input text with a predicted "label" and its "score".
	sensational_classifier(texts)
	```
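The pipeline returns one dictionary per input text, each with a `label` and a `score` key. A minimal sketch of pairing those predictions with their source texts might look like the following. The helper name `attach_predictions` and the label strings `"LABEL_0"` and `"LABEL_1"` are assumptions for illustration only; the label names this checkpoint actually emits depend on its `id2label` configuration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def attach_predictions(
    texts: List[str], predictions: List[Prediction]
) -> List[Dict[str, object]]:
    """Pair each input text with its predicted label and score."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]


# Hypothetical predictions in the list-of-dicts shape the pipeline returns.
example_texts = ["first tweet", "second tweet"]
example_predictions = [
    {"label": "LABEL_0", "score": 0.97},
    {"label": "LABEL_1", "score": 0.84},
]
rows = attach_predictions(example_texts, example_predictions)
```

In real use, `example_predictions` would be replaced by the return value of `sensational_classifier(texts)`.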

	## Training Details

	### Training Data

	The model was trained on the annotated Twitter data from Ashraf et al. 2024 (the DeFaktS dataset).




	### Training Hyperparameters
	- Epochs: 10
	- Batch size: 16
	- Learning rate: 2e-5
	- Weight decay: 0.01
	- fp16: True
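
The hyperparameters above can be collected into a small configuration object. This is a sketch rather than the author's training script: the class `FineTuneConfig` and the helper `to_training_kwargs` are hypothetical names, the field names mirror keyword arguments of `transformers.TrainingArguments`, and treating the batch size of 16 as a per-device value is an assumption the model card does not confirm.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the model card, keyed by TrainingArguments names."""

    num_train_epochs: int = 10
    per_device_train_batch_size: int = 16  # assumption: 16 is per device
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    fp16: bool = True


def to_training_kwargs(config: FineTuneConfig, cuda_available: bool) -> Dict[str, Any]:
    """Return keyword arguments, turning fp16 off when no CUDA device exists."""
    kwargs = asdict(config)
    kwargs["fp16"] = config.fp16 and cuda_available
    return kwargs


# Example: build the arguments for a machine without a GPU.
training_kwargs = to_training_kwargs(FineTuneConfig(), cuda_available=False)
```

The resulting dictionary could then be unpacked into `TrainingArguments` together with an output directory of your choice. Mixed-precision training with `fp16` generally needs a GPU, which is why the helper disables it otherwise.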

	## Evaluation


	### Testing Data

	Evaluation was performed on the test split (30%) from Ashraf et al. 2024.


	### Results
	|                 | Precision | Recall | F1-Score | Support |
	|-----------------|-----------|--------|----------|---------|
	| Non-Sensational | 0.89      | 0.92   | 0.91     | 1800    |
	| Sensational     | 0.75      | 0.67   | 0.71     | 617     |
	| Accuracy        |           |        | 0.86     | 2417    |
	| Macro Avg       | 0.82      | 0.80   | 0.81     | 2417    |
	| Weighted Avg    | 0.86      | 0.86   | 0.86     | 2417    |
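
To make the summary rows easier to read, the following sketch recomputes the macro and weighted F1 averages and the accuracy from the per-class rows of the table. It works from the rounded two-decimal values shown above, so it reproduces the reported figures only up to rounding; the variable names are illustrative.

```python
# Per-class values copied from the results table (already rounded).
f1 = {"Non-Sensational": 0.91, "Sensational": 0.71}
recall = {"Non-Sensational": 0.92, "Sensational": 0.67}
support = {"Non-Sensational": 1800, "Sensational": 617}

total = sum(support.values())

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: mean over classes, weighted by class support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

# Accuracy: correctly classified examples (recall * support) over all examples.
accuracy = sum(recall[c] * support[c] for c in recall) / total

print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
print(f"accuracy    = {accuracy:.2f}")
```

The gap between the macro F1 and the weighted F1 reflects the class imbalance: the weaker Sensational class makes up only 617 of the 2417 test examples, so it pulls the macro average down more than the weighted one.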



	BibTeX:

	```bibtex

	@inproceedings{ashraf_defakts_2024,
	address = {Torino, Italia},
	title = {{DeFaktS}: {A} {German} {Dataset} for {Fine}-{Grained} {Disinformation} {Detection} through {Social} {Media} {Framing}},
	shorttitle = {{DeFaktS}},
	url = {https://aclanthology.org/2024.lrec-main.409},
	booktitle = {Proceedings of the 2024 {Joint} {International} {Conference} on {Computational} {Linguistics}, {Language} {Resources} and {Evaluation} ({LREC}-{COLING} 2024)},
	publisher = {ELRA and ICCL},
	author = {Ashraf, Shaina and Bezzaoui, Isabel and Andone, Ionut and Markowetz, Alexander and Fegert, Jonas and Flek, Lucie},
	editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
	year = {2024},
	}
	```