emrulphy committed on
Commit
a81a8d8
1 Parent(s): 80eabcb

Create README.md

Files changed (1)
  1. README.md +134 -0
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ ---
+
+ # Model Card: POLLCHECK/paligemma
+
+ ## Model Details
+
+ **Model Name:** POLLCHECK/paligemma
+
+ **Model Description:** This is a fine-tuned PaliGemma model that classifies news articles as "biased" or "unbiased". In this task, 'biased' covers disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, and satire, whereas 'unbiased' denotes real news that does not spread misinformation, disinformation, or propaganda. The model can be used to identify potential bias in text and images, which is useful for media analysis, content moderation, and research on bias in written communication.
+
+ **Base Model:** `google/paligemma-3b-pt-224`
+
+ **Fine-tuned Dataset:** The model was fine-tuned on a custom dataset annotated for bias detection, in particular on news articles related to politics.
+ Details of the dataset and the fine-tuning process are available upon request.
+
+ **Labels:**
+ - 0 or `biased`
+ - 1 or `unbiased`
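
As a minimal sketch of how the label scheme above might be handled in code, the mapping below converts between the numeric ids and the label names and normalizes free-form generated text to one of the two labels. The names `ID2LABEL`, `LABEL2ID`, and `parse_label` are illustrative assumptions and are not part of the released model.

```python
import re
from typing import Optional

# Label scheme from the model card: 0 -> biased, 1 -> unbiased.
ID2LABEL = {0: "biased", 1: "unbiased"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def parse_label(generated_text: str) -> Optional[int]:
    """Return the label id named in the text, or None if none or both are named."""
    # Match whole words so that "unbiased" is not mistaken for "biased".
    found = set(re.findall(r"\b(unbiased|biased)\b", generated_text.lower()))
    if len(found) != 1:
        return None
    return LABEL2ID[found.pop()]
```

Returning `None` for ambiguous text keeps unclear generations out of the two classes, so they can be inspected separately.
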
+
+ ## Intended Use
+
+ This model is intended for identifying bias in news articles. Users can input a news article (image and text) and receive a prediction indicating whether the article is biased or unbiased.
+
+ <!-- ### Class-wise Performance Metrics
+
+ | Class | Prec | Recall | F1 |
+ |-------|------|--------|-----|
+ | 0.75 | 0.59 | 0.53| -->
+
+ ## How to Use
+
+ To use this model for inference, follow the steps below:
+
+ ### Inference Code
+
+ ```python
+ import torch
+ from trl import setup_chat_format
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the fine-tuned model and tokenizer
+ model_name = "POLLCHECK/Llama3-instruct-classifier"  # Change this to the path of your fine-tuned model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     return_dict=True,
+     low_cpu_mem_usage=True,
+     # torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Put the model and tokenizer into chat format
+ model, tokenizer = setup_chat_format(model, tokenizer)
+
+ # Placeholder inputs (assumed); replace with the article to classify
+ headline = "..."
+ article = "..."
+
+ instruction = f"""You are a news classifier AI assistant. You are given a news article that contains a headline, body text and an image.
+ Your task is to analyze the headline, body text and image, and classify the news as biased or unbiased.
+
+ In this particular task, the term 'biased' represents disinformation, propaganda, loaded language, negative associations, generalization, harm, hatred, satire, whereas 'unbiased' represents real news without the spread of misinformation, disinformation, and propaganda.
+
+ headlines: {headline}
+ news body text: {article}.
+ What would be the label for this news article considering features from both texts and image?
+ """
+
+ # Single user turn carrying the instruction (assumed message structure)
+ messages = [{"role": "user", "content": instruction}]
+
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Move the inputs to the device the model was placed on
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
+
+ outputs = model.generate(**inputs, max_new_tokens=30, eos_token_id=terminators, do_sample=True, temperature=0.7, top_p=0.9)
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
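
The snippet above handles text only. For a PaliGemma checkpoint, `transformers` provides `AutoProcessor` and `PaliGemmaForConditionalGeneration`, which take an image together with the text prompt. The following is a minimal sketch under assumptions: the repository id `POLLCHECK/paligemma` is taken from the card title and is assumed to ship a processor, the prompt wording and the helper names `build_prompt` and `classify` are hypothetical, `device_map="auto"` assumes `accelerate` is installed, and greedy decoding is an illustrative choice rather than the authors' confirmed setup.

```python
def build_prompt(headline: str, article: str) -> str:
    """Assemble the text prompt (hypothetical wording modeled on this card's instruction)."""
    return (
        "Classify the news article as biased or unbiased, "
        "considering both the text and the image.\n"
        f"headline: {headline}\n"
        f"news body text: {article}"
    )


def classify(headline: str, article: str, image_path: str,
             model_id: str = "POLLCHECK/paligemma") -> str:
    """Run one image + text example through a PaliGemma checkpoint and return the raw answer."""
    # Heavy dependencies are imported lazily so build_prompt can be used on its own.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(headline, article),
        images=image,
        return_tensors="pt",
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=10, do_sample=False)

    # Drop the prompt tokens and decode only the generated continuation.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
```

A call such as `classify("Some headline", "Some body text", "article.jpg")` would return the generated text, which still needs to be mapped to `biased` or `unbiased`.
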
+
+ <!-- ### Example Output
+
+ For the provided sample texts, the model might output:
+
+ ```
+ Text: Religious Extremists Threaten Our Way of Life.
+ Predicted label: biased (Biased Probability: 0.95, Unbiased Probability: 0.05)
+
+ Text: Public Health Officials are working.
+ Predicted label: unbiased (Biased Probability: 0.10, Unbiased Probability: 0.90)
+
+ Text: The new healthcare policy aims to provide affordable healthcare to all citizens, with a focus on preventive care.
+ Predicted label: unbiased (Biased Probability: 0.20, Unbiased Probability: 0.80)
+
+ Text: Environmental activists argue that the government's refusal to sign the climate agreement is a clear indication of its disregard for the environment.
+ Predicted label: biased (Biased Probability: 0.70, Unbiased Probability: 0.30)
+ ``` -->
+
+ <!-- Check inference with these paths:
+
+ - Sample Data: [News_Bias_Samples.csv](https://huggingface.co/POLLCHECK/BERT-classifier/blob/main/News_Bias_Samples.csv)
+ - Inference Script: [inference-bert.py](https://huggingface.co/POLLCHECK/BERT-classifier/blob/main/inference-bert.py) -->
+
+ ## Limitations and Bias
+
+ - **Dataset Bias:** The model's performance is highly dependent on the quality and diversity of the fine-tuning dataset. Biases present in the dataset will affect the model's predictions.
+ - **Context:** The model may not perform well on texts outside the distribution of the training data or on texts that require a nuanced understanding of context.
+
+ ## Ethical Considerations
+
+ - **Fairness:** Ensure that the model is used in a fair and unbiased manner. Regularly evaluate the model's performance and address any biases that may arise.
+ - **Transparency:** Be transparent about the model's limitations and the potential for false positives and false negatives.
+ - **Accountability:** Users are responsible for the decisions made based on the model's predictions and should consider multiple sources of information when making important decisions.
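
One way to act on the fairness and transparency points above is to track false positives and false negatives on a held-out labeled sample. The sketch below computes precision, recall, and F1 for one class from gold and predicted label ids. The function name `binary_metrics` and the toy labels are illustrative assumptions and are not results from this model.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int], pred: Sequence[int], positive: int = 0) -> Dict[str, float]:
    """Precision, recall and F1 for the `positive` class (0 = biased in this card)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": float(fp),
        "false_negatives": float(fn),
    }


# Toy labels for illustration only (0 = biased, 1 = unbiased).
gold = [0, 0, 1, 1, 0]
pred = [0, 1, 1, 0, 0]
metrics = binary_metrics(gold, pred)
```

Reporting the raw false-positive and false-negative counts alongside the ratios makes it easier to see which kind of error dominates on a given sample.
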
+
+ <!-- ## References
+ @misc{pollcheck2024,
+ author = {Emrul Hasan and Shaina Raza and Veronica Chatrath},
+ title = {POLLCHECK/Pollcheck-llama3-news-classifier},
+ year = {2024},
+ publisher = {Hugging Face},
+ journal = {Hugging Face Model Hub},
+ howpublished = {\url{https://huggingface.co/emrulphy/POLLCHECK/Pollcheck-llama3-news-classifier}}
+ } -->
+
+ ## Contact Information
+
+ For questions, comments, or suggestions, please contact Emrul Hasan at [email protected].