przvl/persuasive_essays_distilbert_cased

@@ -5,34 +5,43 @@ tags:
 - generated_from_trainer
 metrics:
 - accuracy
 model-index:
 - name: persuasive_essays_distilbert_cased
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # persuasive_essays_distilbert_cased
-This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.4249
 - Accuracy: 0.8101
 - Macro F1: 0.7662
 - Claim F1: 0.665
-## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -61,4 +70,4 @@ The following hyperparameters were used during training:
 - Transformers 4.37.2
 - Pytorch 2.2.0
 - Datasets 2.17.0
-- Tokenizers 0.15.2

 - generated_from_trainer
 metrics:
 - accuracy
+- f1
 model-index:
 - name: persuasive_essays_distilbert_cased
   results: []
+language:
+- en
 ---
 # persuasive_essays_distilbert_cased
+## Model description
+This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on the [emnlp2017-claim-identification/persuasive_essays](https://github.com/UKPLab/emnlp2017-claim-identification) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.4249
 - Accuracy: 0.8101
 - Macro F1: 0.7662
 - Claim F1: 0.665
 ## Intended uses & limitations
+Text classification of full sentences as claims or non-claims. The model performs better on in-domain data (persuasive essays); cross-domain classification is severely limited.
 ## Training and evaluation data
+Based on the persuasive essays corpus of [Stab and Gurevych (2017)](https://aclanthology.org/J17-3005.pdf), as preprocessed by [Daxenberger et al. (2017)](https://github.com/UKPLab/emnlp2017-claim-identification).
+Original dataset:
+  - docs: 402
+  - tokens: 147,271
+  - total instances: 7,116 (65 duplicates)
+    - #claims: 2,108 (29.62%)
+Trimmed dataset used for training:
+  - total instances: **7,051** (65 duplicates removed)
+    - #claims: **2,093** (29.68%)
+  - train/test split: 80/20, stratified
 ## Training procedure
 - Transformers 4.37.2
 - Pytorch 2.2.0
 - Datasets 2.17.0
+- Tokenizers 0.15.2
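The deduplication and stratified 80/20 split described in the "Training and evaluation data" section above can be sketched with the Python standard library. This is an illustration, not the author's preprocessing script. The `(sentence, label)` record format, the exact-duplicate criterion, the seed, and the helper names (`deduplicate`, `claim_share`, `stratified_split`) are all assumptions. The toy corpus reproduces only the reported counts (7,116 instances, 65 duplicates, 2,108 claims), not the real sentences.

```python
import random

# Hypothetical record format (not from the card): (sentence, label),
# where label 1 = claim and 0 = non-claim.
Example = tuple[str, int]


def deduplicate(examples: list[Example]) -> list[Example]:
    """Drop exact (sentence, label) duplicates, keeping the first occurrence."""
    seen: set[Example] = set()
    unique: list[Example] = []
    for ex in examples:
        if ex not in seen:
            seen.add(ex)
            unique.append(ex)
    return unique


def claim_share(examples: list[Example]) -> float:
    """Percentage of examples labelled as claims."""
    return 100 * sum(label for _, label in examples) / len(examples)


def stratified_split(
    examples: list[Example], test_fraction: float = 0.2, seed: int = 42
) -> tuple[list[Example], list[Example]]:
    """Split each label group separately so both splits keep the label ratio."""
    rng = random.Random(seed)
    by_label: dict[int, list[Example]] = {}
    for ex in examples:
        by_label.setdefault(ex[1], []).append(ex)
    train: list[Example] = []
    test: list[Example] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)  # random order within each label group
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


if __name__ == "__main__":
    # Toy corpus with the card's counts: 7,051 unique sentences (2,093 claims)
    # plus 65 exact duplicates (15 claims, 50 non-claims).
    unique = [(f"claim {i}", 1) for i in range(2093)]
    unique += [(f"non-claim {i}", 0) for i in range(4958)]
    corpus = unique + unique[:15] + unique[2093:2143]

    print(len(corpus), f"{claim_share(corpus):.2f}%")    # 7116 29.62%
    deduped = deduplicate(corpus)
    print(len(deduped), f"{claim_share(deduped):.2f}%")  # 7051 29.68%
    train, test = stratified_split(deduped)
    print(len(train), len(test))                         # 5640 1411
```

Splitting each label group separately keeps the claim ratio near 29.68% in both train and test. With scikit-learn available, `train_test_split(..., test_size=0.2, stratify=labels)` is the usual equivalent.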