---
base_model: INSAIT-Institute/BgGPT-7B-Instruct-v0.2
library_name: peft
license: apache-2.0
language:
- bg
tags:
- propaganda
---

# Model Card for identrics/BG_propaganda_detector

## Model Details

- **Developed by:** [`Identrics`](https://identrics.ai/)
- **Language:** Bulgarian
- **License:** apache-2.0
- **Finetuned from model:** [`INSAIT-Institute/BgGPT-7B-Instruct-v0.2`](https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.2)
- **Context window:** 8192 tokens

## Model Description

This model is a fine-tuned version of BgGPT-7B-Instruct-v0.2 for propaganda detection. It is effectively a binary classifier, determining whether propaganda is present in the input string.

The model was created by [`Identrics`](https://identrics.ai/) within the scope of the Wasper project.

## Uses

The model is intended to be used as a binary classifier that identifies whether propaganda is present in a string containing a comment from a social media site.

### Example

First install the direct dependencies:

```
pip install transformers torch accelerate
```

Then the model can be downloaded and used for inference:

```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained("identrics/BG_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/BG_propaganda_detector")

# Tokenize a Bulgarian social media comment and run it through the model
tokens = tokenizer("Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?", return_tensors="pt")
output = model(**tokens)

print(output.logits)
```

A short sketch of mapping these logits to a class label is included at the end of this card.

## Training Details

The training dataset for the model is a balanced set of 734 Bulgarian examples containing both propaganda and non-propaganda content. The examples were collected from a variety of traditional media and social media sources, ensuring a diverse range of content. Additionally, the training dataset was enriched with AI-generated samples. The total distribution of the training data is shown in the table below:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/71vN4yLV9vyA5Cqc_WRRD.png)

The model was then tested on a smaller evaluation dataset, achieving an F1 score of 0.836. The evaluation dataset is distributed as follows:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66741cdd8123010b8f63f965/DunBsCJMZSFezNVB0Vo3a.png)

### Framework versions

- PEFT 0.11.1
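
### Interpreting the output

As a follow-up to the inference example above, here is a minimal sketch of turning the raw logits into a propaganda / non-propaganda decision. The mapping of class index 0 to "non-propaganda" and index 1 to "propaganda" is an assumption made for illustration; check the model's `config.json` (`id2label`) for the authoritative mapping.

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("identrics/BG_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/BG_propaganda_detector")

text = "Газа евтин, американското ядрено гориво евтино, пълно с фотоволтаици а пък тока с 30% нагоре. Защо ?"
tokens = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**tokens).logits

# Softmax turns the two logits into class probabilities
probs = torch.softmax(logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()

# Assumed label mapping for illustration: 0 = non-propaganda, 1 = propaganda.
# Verify against model.config.id2label before relying on it.
label = {0: "non-propaganda", 1: "propaganda"}.get(predicted_class, str(predicted_class))
print(label, probs.tolist())
```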
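
For context on the F1 score reported above, the sketch below shows one way such an evaluation could be run over a labeled set of comments. The `eval_texts` and `eval_labels` lists and the label convention (1 = propaganda) are hypothetical placeholders; the actual evaluation dataset and scoring setup are not published in this card.

```py
import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("identrics/BG_propaganda_detector", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("identrics/BG_propaganda_detector")
model.eval()

# Hypothetical evaluation data: Bulgarian comments with binary labels (1 = propaganda)
eval_texts = [
    "Пример за коментар без пропаганда.",
    "Пример за коментар с пропагандно съдържание.",
]
eval_labels = [0, 1]

predictions = []
with torch.no_grad():
    for text in eval_texts:
        tokens = tokenizer(text, return_tensors="pt", truncation=True)
        logits = model(**tokens).logits
        predictions.append(logits.argmax(dim=-1).item())

# Binary F1 over the positive (propaganda) class; one plausible reading of the reported score
print(f1_score(eval_labels, predictions))
```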