horychtom committed on
Commit 4dc28e0 · verified · 1 Parent(s): 198f56b

Update README.md

Files changed (1)
  1. README.md +27 -133
README.md CHANGED
@@ -5,148 +5,42 @@ datasets:
  language:
  - en
  base_model:
- - FacebookAI/roberta-base
  pipeline_tag: text-classification
  ---
- Here’s a template for a `README.md` file that you can reuse for each of your models on Hugging Face. It is designed to provide a comprehensive overview of the model, its usage, links to relevant papers, datasets, and results:

- ---
-
- # Model Name
-
- **Model Name:** `Your Model Name`
- **Model Type:** Token-level / Sentence-level / Paragraph-level Classifier
- **Organization:** [Your Lab's Name or Organization](https://huggingface.co/your_org)
- **Model Version:** `v1.0.0`
- **Framework:** `PyTorch` or `TensorFlow`
- **License:** `MIT / Apache 2.0 / Other`
-
- ---
-
- ## Model Overview
-
- This model is a [token-level/sentence-level/paragraph-level] classifier that was trained for [specific task, e.g., sentiment analysis, named entity recognition, etc.]. The model is based on [model architecture, e.g., BERT, RoBERTa, etc.] and has been fine-tuned on [mention the dataset] for [number of epochs or other training details].
-
- It achieves state-of-the-art performance on [mention dataset or task] and is specifically designed for [specific domain or industry, if applicable].
-
- ---
-
- ## Training details
-
- - **Base Model:** [mention architecture, e.g., BERT-base, RoBERTa-large, etc.]
- - **Number of Parameters:** [number of parameters]
- - **Max Sequence Length:** [max input length, if relevant]
-
- ### Training Data
-
- The model was fine-tuned on the [name of dataset] dataset. This dataset consists of [short description of dataset, e.g., number of instances, labels, any important data characteristics].
-
- You can find the dataset [here](dataset_url).
-
- ---
-
- ## Evaluation Results
-
- The model was evaluated on [name of dataset] and achieved the following results:
-
- - **Accuracy:** [accuracy score]
- - **F1-Score:** [F1 score]
- - **Precision:** [precision score]
- - **Recall:** [recall score]
-
- For detailed evaluation results, see the corresponding paper or evaluation logs.
-
- ---
-
- ## Usage
-
- To use this model in your code, install the required libraries:
-
- ```bash
- pip install transformers
- ```
-
- Then, load the model as follows:
-
- ```python
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
-
- tokenizer = AutoTokenizer.from_pretrained("your_org/your_model")
- model = AutoModelForSequenceClassification.from_pretrained("your_org/your_model")
-
- # Example input
- input_text = "Your example sentence goes here."
- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model(**inputs)
-
- # Accessing the predicted class
- predicted_class = outputs.logits.argmax(dim=-1)
- print(f"Predicted class: {predicted_class}")
- ```
-
- ---
-
- ## Example Code
-
- Here’s an example for batch classification:
-
- ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
- tokenizer = AutoTokenizer.from_pretrained("your_org/your_model")
- model = AutoModelForSequenceClassification.from_pretrained("your_org/your_model")
-
- # Example sentences
- sentences = ["Sentence 1", "Sentence 2", "Sentence 3"]
- inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
-
- with torch.no_grad():
-     outputs = model(**inputs)
-
- predicted_classes = outputs.logits.argmax(dim=-1)
- print(f"Predicted classes: {predicted_classes}")
- ```
-
- ---
-
- ## Related Papers
-
- This model is described in the following paper(s):
-
- - **Title:** [Paper Title](paper_url)
- **Authors:** [Author Names]
- **Conference/Journal:** [Conference/Journal Name]
- **Year:** [Year]
-
- Please cite this paper if you use the model.
-
- ---
-
-
- ## Limitations
-
- - The model is limited to [token-level/sentence-level/paragraph-level] classification tasks.
- - Performance may degrade on out-of-domain data.
- - [Other known limitations, e.g., bias in data, challenges with specific languages.]

  ---

  ## Citation

  If you use this model, please cite the following paper(s):

  ```bibtex
- @article{your_citation,
- title={Your Title},
- author={Your Name and Co-authors},
- journal={Journal Name},
- year={Year},
- publisher={Publisher},
- url={paper_url}
  }
- ```
-
- ---
-
- Feel free to adapt this template to match the specific needs of each model. Let me know if you'd like to adjust any sections further!

  language:
  - en
  base_model:
+ - mediabiasgroup/magpie-pt-xlm
  pipeline_tag: text-classification
  ---

+ This is a model pre-trained on weak labels for media-bias detection.
 
  ---

  ## Citation

+ The code for the training is available at: https://github.com/Media-Bias-Group/Neural-Media-Bias-Detection-Using-Distant-Supervision-With-BABE
+ The paper is available at: https://aclanthology.org/2021.findings-emnlp.101
+
  If you use this model, please cite the following paper(s):

  ```bibtex
+ @inproceedings{spinde-etal-2021-neural-media,
+ title = "Neural Media Bias Detection Using Distant Supervision With {BABE} - Bias Annotations By Experts",
+ author = "Spinde, Timo and
+ Plank, Manuel and
+ Krieger, Jan-David and
+ Ruas, Terry and
+ Gipp, Bela and
+ Aizawa, Akiko",
+ editor = "Moens, Marie-Francine and
+ Huang, Xuanjing and
+ Specia, Lucia and
+ Yih, Scott Wen-tau",
+ booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
+ month = nov,
+ year = "2021",
+ address = "Punta Cana, Dominican Republic",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2021.findings-emnlp.101",
+ doi = "10.18653/v1/2021.findings-emnlp.101",
+ pages = "1166--1177",
+ abstract = "Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.",
  }
+ ```