mediabiasgroup/da-roberta-pt

@@ -5,148 +5,42 @@ datasets:
 language:
 - en
 base_model:
-- FacebookAI/roberta-base
 pipeline_tag: text-classification
 ---
-Here’s a template for a `README.md` file that you can reuse for each of your models on Hugging Face. It is designed to provide a comprehensive overview of the model, its usage, links to relevant papers, datasets, and results:
----
-# Model Name
-**Model Name:** `Your Model Name`
-**Model Type:** Token-level / Sentence-level / Paragraph-level Classifier
-**Organization:** [Your Lab's Name or Organization](https://huggingface.co/your_org)
-**Model Version:** `v1.0.0`
-**Framework:** `PyTorch` or `TensorFlow`
-**License:** `MIT / Apache 2.0 / Other`
----
-## Model Overview
-This model is a [token-level/sentence-level/paragraph-level] classifier that was trained for [specific task, e.g., sentiment analysis, named entity recognition, etc.]. The model is based on [model architecture, e.g., BERT, RoBERTa, etc.] and has been fine-tuned on [mention the dataset] for [number of epochs or other training details].
-It achieves state-of-the-art performance on [mention dataset or task] and is specifically designed for [specific domain or industry, if applicable].
----
-## Training details
-- **Base Model:** [mention architecture, e.g., BERT-base, RoBERTa-large, etc.]
-- **Number of Parameters:** [number of parameters]
-- **Max Sequence Length:** [max input length, if relevant]
-### Training Data
-The model was fine-tuned on the [name of dataset] dataset. This dataset consists of [short description of dataset, e.g., number of instances, labels, any important data characteristics].
-You can find the dataset [here](dataset_url).
----
-## Evaluation Results
-The model was evaluated on [name of dataset] and achieved the following results:
-- **Accuracy:** [accuracy score]
-- **F1-Score:** [F1 score]
-- **Precision:** [precision score]
-- **Recall:** [recall score]
-For detailed evaluation results, see the corresponding paper or evaluation logs.
----
-## Usage
-To use this model in your code, install the required libraries:
-```bash
-pip install transformers
-```
-Then, load the model as follows:
-```python
-from transformers import AutoModelForSequenceClassification, AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("your_org/your_model")
-model = AutoModelForSequenceClassification.from_pretrained("your_org/your_model")
-# Example input
-input_text = "Your example sentence goes here."
-inputs = tokenizer(input_text, return_tensors="pt")
-outputs = model(**inputs)
-# Accessing the predicted class
-predicted_class = outputs.logits.argmax(dim=-1)
-print(f"Predicted class: {predicted_class}")
-```
----
-## Example Code
-Here’s an example for batch classification:
-```python
-import torch
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-tokenizer = AutoTokenizer.from_pretrained("your_org/your_model")
-model = AutoModelForSequenceClassification.from_pretrained("your_org/your_model")
-# Example sentences
-sentences = ["Sentence 1", "Sentence 2", "Sentence 3"]
-inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
-with torch.no_grad():
-    outputs = model(**inputs)
-predicted_classes = outputs.logits.argmax(dim=-1)
-print(f"Predicted classes: {predicted_classes}")
-```
----
-## Related Papers
-This model is described in the following paper(s):
-- **Title:** [Paper Title](paper_url)
-  **Authors:** [Author Names]
-  **Conference/Journal:** [Conference/Journal Name]
-  **Year:** [Year]
-Please cite this paper if you use the model.
----
-## Limitations
-- The model is limited to [token-level/sentence-level/paragraph-level] classification tasks.
-- Performance may degrade on out-of-domain data.
-- [Other known limitations, e.g., bias in data, challenges with specific languages.]
 ---
 ## Citation
 If you use this model, please cite the following paper(s):
 ```bibtex
-@article{your_citation,
-  title={Your Title},
-  author={Your Name and Co-authors},
-  journal={Journal Name},
-  year={Year},
-  publisher={Publisher},
-  url={paper_url}
 }
-```
----
-Feel free to adapt this template to match the specific needs of each model. Let me know if you'd like to adjust any sections further!

 language:
 - en
 base_model:
+- mediabiasgroup/magpie-pt-xlm
 pipeline_tag: text-classification
 ---
+This is a model pre-trained on weak labels for media-bias detection.
 ---
 ## Citation
+The code for the training is available at: https://github.com/Media-Bias-Group/Neural-Media-Bias-Detection-Using-Distant-Supervision-With-BABE
+The paper is available at: https://aclanthology.org/2021.findings-emnlp.101
 If you use this model, please cite the following paper(s):
 ```bibtex
+@inproceedings{spinde-etal-2021-neural-media,
+    title = "Neural Media Bias Detection Using Distant Supervision With {BABE} - Bias Annotations By Experts",
+    author = "Spinde, Timo  and
+      Plank, Manuel  and
+      Krieger, Jan-David  and
+      Ruas, Terry  and
+      Gipp, Bela  and
+      Aizawa, Akiko",
+    editor = "Moens, Marie-Francine  and
+      Huang, Xuanjing  and
+      Specia, Lucia  and
+      Yih, Scott Wen-tau",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
+    month = nov,
+    year = "2021",
+    address = "Punta Cana, Dominican Republic",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2021.findings-emnlp.101",
+    doi = "10.18653/v1/2021.findings-emnlp.101",
+    pages = "1166--1177",
+    abstract = "Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.",
 }
+```
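
The updated card no longer includes a usage snippet, so here is a minimal sketch of how the checkpoint might be loaded for inference. It assumes that the repository id shown at the top of this page (`mediabiasgroup/da-roberta-pt`) hosts a checkpoint with a sequence-classification head that `AutoModelForSequenceClassification` can load; the card itself does not confirm this. The helper names `pick_labels` and `classify` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence

# Assumption: the repository id from the page header hosts a checkpoint with a
# sequence-classification head; the model card does not state this explicitly.
MODEL_ID = "mediabiasgroup/da-roberta-pt"


def pick_labels(logits: Sequence[Sequence[float]], id2label: Dict[int, str]) -> List[str]:
    """Map each row of raw logits to the name of its highest-scoring class."""
    labels = []
    for row in logits:
        best = max(range(len(row)), key=lambda i: row[i])
        # Fall back to the numeric index when no label name is configured.
        labels.append(id2label.get(best, str(best)))
    return labels


def classify(sentences: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Download the checkpoint and classify a batch of sentences."""
    # Imported lazily so that pick_labels stays usable without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Pad and truncate so that sentences of different lengths form one batch.
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return pick_labels(logits.tolist(), model.config.id2label)
```

Calling `classify(["Some news sentence."])` would download the weights and return one label name per sentence. Because the model is described as pre-trained on weak labels, its predictions are probably best treated as a starting point for fine-tuning rather than as final bias judgments.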