---
license: mit
language:
- en
metrics:
- accuracy
---

# Model Card: POLLCHECK/Pollcheck-llama3-news-classifier

## Model Details

**Model Name:** POLLCHECK/Pollcheck-llama3-news-classifier

**Model Description:** This is a fine-tuned Llama 3 model that classifies news as "biased" or "unbiased". In this task, 'biased' covers disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, and satire, whereas 'unbiased' denotes real news that does not spread misinformation, disinformation, or propaganda. The model can be used to identify potential bias in text, which is useful for media analysis, content moderation, and research on bias in written communication.

**Base Model:** "meta-llama/Meta-Llama-3-8B-Instruct"

**Fine-tuned Dataset:** The model was fine-tuned on a custom dataset annotated for bias detection, in particular news articles related to politics.
Details of the dataset and the fine-tuning process are available upon request.

**Labels:**
- 0 or `biased` (fake news)
- 1 or `unbiased` (real news)
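
The label scheme above can be written down as a small lookup table. This is only a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `to_label_id` are hypothetical helpers for illustration and are not shipped with the model.

```python
# Assumed mapping, copied from the Labels list in this card.
ID2LABEL = {0: "biased", 1: "unbiased"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_label_id(label_text: str) -> int:
    """Normalize a label string (case and surrounding spaces) and return its id."""
    key = label_text.strip().lower()
    if key not in LABEL2ID:
        raise ValueError(f"Unknown label: {label_text!r}")
    return LABEL2ID[key]


print(to_label_id(" Biased "))
print(ID2LABEL[1])
```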

## Intended Use

This model is intended for identifying bias in text. Users input a piece of text and receive a prediction indicating whether the text is biased or unbiased.

### Class-wise Performance Metrics

| Class | Precision | Recall | F1   |
|-------|-----------|--------|------|
|       | 0.75      | 0.59   | 0.53 |
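
For readers who want to compute class-wise precision, recall, and F1 on their own labeled data, the sketch below shows the standard definitions. The function name `classwise_metrics` and the toy labels are hypothetical; they are not the evaluation data behind the table and do not reproduce its numbers.

```python
from typing import Dict, List


def classwise_metrics(y_true: List[int], y_pred: List[int], label: int) -> Dict[str, float]:
    """Precision, recall, and F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels (0 = biased, 1 = unbiased).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
for label in (0, 1):
    print(label, classwise_metrics(y_true, y_pred, label))
```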

## How to Use

To use this model for inference, follow the steps below:

### Inference Code

```python
import torch
from trl import setup_chat_format
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer
model_name = "POLLCHECK/Pollcheck-llama3-news-classifier"  # Change this to the path of your fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    low_cpu_mem_usage=True,
    #torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)


# Set up the chat format (chat template and special tokens) for the model and tokenizer
model, tokenizer = setup_chat_format(model, tokenizer)

# Build the classification prompt

instruction = """You are a news classifier AI assistant. You are given the headline and body of a news article.
Your task is to read the headline and the article and classify the article as biased or unbiased. Also provide a confidence score for your label. In this particular task, the term 'biased' represents disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, satire,
whereas 'unbiased' represents real news without the spread of misinformation, disinformation, and propaganda."""
headline = "<Headline of the news article>"
article = "<Article text body>"
messages = [
    {"role": "user", "content": f"""{instruction}\nHeadline: {headline}\nArticle: {article}
Return your answer in the following format.
1. Labels: [biased/unbiased]
2. Confidence: """}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the inputs to the device the model was placed on instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
terminators = [tokenizer.eos_token_id,tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(**inputs, max_new_tokens=30, eos_token_id=terminators, do_sample=True, temperature=0.7, top_p=0.9,)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

```
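
Because `outputs[0]` contains the prompt tokens as well as the generated ones, the decoded `response` repeats the format instructions before the model's answer. The sketch below is one possible way to pull the label and confidence out of that string. The helper `parse_response` and its regular expressions are hypothetical, and it assumes the model follows the requested `Labels:` / `Confidence:` format. Treating values above 1 as percentages is also an assumption.

```python
import re
from typing import Optional, Tuple

# Skip the "[biased/unbiased]" template in the prompt by rejecting a label followed by "/".
LABEL_RE = re.compile(r"labels?\s*:\s*\[?\s*(unbiased|biased)\b(?!\s*/)", re.IGNORECASE)
CONF_RE = re.compile(r"confidence\s*:\s*\[?\s*(\d+(?:\.\d+)?)\s*(%?)", re.IGNORECASE)


def parse_response(text: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (label, confidence) from the last matching lines, or None where absent."""
    labels = LABEL_RE.findall(text)
    confs = CONF_RE.findall(text)
    label = labels[-1].lower() if labels else None
    confidence = None
    if confs:
        value, percent = confs[-1]
        confidence = float(value)
        # Assumption: "90%" or "90" means 0.90.
        if percent or confidence > 1.0:
            confidence /= 100.0
    return label, confidence


# Hypothetical decoded text: echoed prompt followed by the model's answer.
sample = (
    "user\nReturn your answer in the following format.\n"
    "1. Labels: [biased/unbiased]\n2. Confidence: \n"
    "assistant\n1. Labels: biased\n2. Confidence: 0.85"
)
print(parse_response(sample))
```

The confidence value is text generated by the model, so it is best read as a self-reported score rather than a calibrated probability.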

<!-- ### Example Output

For the provided sample texts, the model might output:

```
Text: Religious Extremists Threaten Our Way of Life.
Predicted label: biased (Biased Probability: 0.95, Unbiased Probability: 0.05)

Text: Public Health Officials are working.
Predicted label: unbiased (Biased Probability: 0.10, Unbiased Probability: 0.90)

Text: The new healthcare policy aims to provide affordable healthcare to all citizens, with a focus on preventive care.
Predicted label: unbiased (Biased Probability: 0.20, Unbiased Probability: 0.80)

Text: Environmental activists argue that the government's refusal to sign the climate agreement is a clear indication of its disregard for the environment.
Predicted label: biased (Biased Probability: 0.70, Unbiased Probability: 0.30)
``` -->

<!-- Check inference with these paths:

- Sample Data: [News_Bias_Samples.csv](https://huggingface.co/POLLCHECK/BERT-classifier/blob/main/News_Bias_Samples.csv)
- Inference Script: [inference-bert.py](https://huggingface.co/POLLCHECK/BERT-classifier/blob/main/inference-bert.py) -->


## Limitations and Bias

- **Dataset Bias:** The model's performance is highly dependent on the quality and diversity of the fine-tuning dataset. Biases present in the dataset will affect the model's predictions.
- **Context:** The model may not perform well on texts outside the distribution of the training data or on texts that require a nuanced understanding of context.

## Ethical Considerations

- **Fairness:** Ensure that the model is used in a fair and unbiased manner. Regularly evaluate the model's performance and address any biases that may arise.
- **Transparency:** Be transparent about the model's limitations and the potential for false positives and false negatives.
- **Accountability:** Users are responsible for the decisions made based on the model's predictions and should consider multiple sources of information when making important decisions.

## References

@misc{pollcheck2024,
author = {Emrul Hasan and Shaina Raza and Veronica Chatrath},
title = {Your Model Name},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/your-username/your-model-name}}
}

## Contact Information

For questions, comments, or suggestions, please contact Emrul Hasan at [email protected].