---
license: mit
language:
- en
metrics:
- accuracy
---

# Model Card: POLLCHECK/Paligemma-bias-classifier

## Model Details

**Model Name:** POLLCHECK/Paligemma-bias-classifier

**Model Description:** This is a PaliGemma model fine-tuned to classify news as "biased" or "unbiased". In this task, 'biased' covers disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, and satire, whereas 'unbiased' denotes real news that does not spread misinformation, disinformation, or propaganda. The model can be used to identify potential bias in text and images, which is useful for media analysis, content moderation, and research on bias in written communication.

**Base Model:** "google/paligemma-3b-pt-224"

**Fine-tuned Dataset:** The model was fine-tuned on a custom dataset annotated for bias detection, in particular news articles related to politics. Details of the dataset and the fine-tuning process are available upon request.

**Labels:**
- 0 or `biased`
- 1 or `unbiased`
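
Because the model generates its answer as text, downstream code usually needs to turn that text into one of the label ids listed above. The following is a minimal sketch, assuming the decoded output contains the word "biased" or "unbiased"; the helper name `parse_label` and the `LABEL_TO_ID` dictionary are illustrative and not part of the released model.

```python
from typing import Optional

# Mapping taken from the Labels list in this card.
LABEL_TO_ID = {"biased": 0, "unbiased": 1}


def parse_label(generated_text: str) -> Optional[int]:
    """Map decoded model output to a label id, or None if no label is found.

    Hypothetical helper: it assumes the output contains one of the two label words.
    """
    text = generated_text.strip().lower()
    # Check "unbiased" first, because "biased" is a substring of it.
    if "unbiased" in text:
        return LABEL_TO_ID["unbiased"]
    if "biased" in text:
        return LABEL_TO_ID["biased"]
    return None
```

Returning `None` for unrecognized output keeps malformed generations visible instead of silently assigning them to a class.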

## Intended Use

This model is intended for identifying bias in news articles. Users can input a news article (image and text) and receive a prediction indicating whether the article is biased or unbiased.

<!-- ### Class-wise Performance Metrics

| Class | Prec | Recall | F1 |
|-------|------|--------|-----|
| 0 (unbiased) | 0.74 | 0.67 | 0.70 |
| 1 (biased) | 0.69 | 0.76 | 0.73 |
| macro avg | 0.72 | 0.72 | 0.72 |
-->

## How to Use

To use this model for inference, follow the steps below:

### Inference Code

```python
import pandas as pd
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

device = "cuda:0"
dtype = torch.bfloat16
# Load the fine-tuned model and its processor
model_id = "POLLCHECK/Paligemma-bias-classifier"  # Hugging Face model ID
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map=device,
    revision="bfloat16",
).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Model inputs and outputs
img_dir = "Enter your image directory"
text_path = "Enter the path to a CSV file with image file names and the corresponding news articles"
df = pd.read_csv(text_path)

for idx, row in df.iterrows():
  img = row['image_filename']  # image file name from the CSV file

  headline = row['Headline']  # headline column from the CSV file
  article = row['article']  # article body text
  img_path = f"{img_dir}/{img}"  # full path of the image
  image=Image.open(img_path).convert('RGB')
  prompt=f"""You are a news classifer AI assitant. You are given with a news article that contains headline, body text and image.
              Your task is to analyze the headline, body text and image, and classify the news as biased or unbiased. 
              
              In this particular task, the term 'biased' represents disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, satire\
              whereas 'unbiased' represents real news without the spread of misinformation, disinformation, and propaganda.\
              
              headlines: {headline}
              news body text: {article}. 
              What would be the label for this news article considering features from both texts and image?
              """
  
  model_inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
  input_len = model_inputs["input_ids"].shape[-1]
  generation = model.generate(**model_inputs, max_new_tokens=20, do_sample=False)
  generation = generation[0][input_len:]
  label = processor.decode(generation, skip_special_tokens=True)
  print(label)

```

Example files are provided in the `sample` folder of the repository for you to try.
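
Before running the inference loop, it can help to confirm that the CSV file has the columns the loop reads (`image_filename`, `Headline`, `article`) and that every referenced image exists. The sketch below is one way to do that, under the assumption that the CSV is UTF-8 encoded; the function name `find_input_problems` is hypothetical, and it uses the standard-library `csv` module rather than pandas so the check has no extra dependencies.

```python
import csv
import os

# Column names read by the inference loop in this card.
REQUIRED_COLUMNS = {"image_filename", "Headline", "article"}


def find_input_problems(csv_path: str, img_dir: str) -> list:
    """Return a list of problem descriptions; an empty list means the inputs look usable."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        # Rows are numbered from 1, not counting the header.
        for row_no, row in enumerate(reader, start=1):
            path = os.path.join(img_dir, row["image_filename"])
            if not os.path.isfile(path):
                problems.append(f"row {row_no}: image not found: {path}")
    return problems
```

Calling `find_input_problems(text_path, img_dir)` and printing the result before loading the model avoids discovering a missing file partway through a long run.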


## Limitations and Bias

- **Dataset Bias:** The model's performance is highly dependent on the quality and diversity of the fine-tuning dataset. Biases present in the dataset will affect the model's predictions.
- **Context:** The model may not perform well on texts outside the distribution of the training data or on texts that require a nuanced understanding of context.

## Ethical Considerations

- **Fairness:** Ensure that the model is used in a fair and unbiased manner. Regularly evaluate the model's performance and address any biases that may arise.
- **Transparency:** Be transparent about the model's limitations and the potential for false positives and false negatives.
- **Accountability:** Users are responsible for the decisions made based on the model's predictions and should consider multiple sources of information when making important decisions.
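
One way to act on the fairness and transparency points above is to count false positives and false negatives per class on a held-out set with known labels. The following is a minimal sketch, assuming gold and predicted labels are available as two equal-length lists of the strings `biased` and `unbiased`; the function name `per_class_metrics` is illustrative and not part of the released model.

```python
from collections import Counter


def per_class_metrics(gold, predicted, labels=("biased", "unbiased")):
    """Compute error counts, precision, recall and F1 for each label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count how often each (gold, predicted) pair occurs.
    counts = Counter(zip(gold, predicted))
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        # False positives: predicted as this label, but the gold label differs.
        fp = sum(n for (g, p), n in counts.items() if p == label and g != label)
        # False negatives: gold is this label, but something else was predicted.
        fn = sum(n for (g, p), n in counts.items() if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Reporting the raw `fp` and `fn` counts alongside the ratios makes it easier to see which kind of error dominates for each class.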

<!-- ## References
@misc{pollcheck2024,
author = {Emrul Hasan, Shaina Raza,Franklin, Veronica Chatrath},
title = {POLLCHECK/paligemma},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/emrulphy/POLLCHECK/Pollcheck-llama3-news-classifier}}
} -->
## Contact Information

For questions, comments, or suggestions, please contact Emrul Hasan at [email protected].