przvl committed on
Commit b41fdb8 · verified · 1 Parent(s): 3173a5f

Update README.md

updated model after retraining on stratified train/test split

Files changed (1)
  1. README.md +20 -11
README.md CHANGED
@@ -5,34 +5,43 @@ tags:
  - generated_from_trainer
  metrics:
  - accuracy
+ - f1
  model-index:
  - name: persuasive_essays_distilbert_cased
  results: []
+ language:
+ - en
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # persuasive_essays_distilbert_cased

- This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on the None dataset.
+ ## Model description
+
+ This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on the [emnlp2017-claim-identification/persuasive_essays](https://github.com/UKPLab/emnlp2017-claim-identification) dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.4249
  - Accuracy: 0.8101
  - Macro F1: 0.7662
  - Claim F1: 0.665

- ## Model description
-
- More information needed
-
  ## Intended uses & limitations

- More information needed
+ Sentence-level claim classification; the model expects full sentences as input. It performs best on in-domain text. Cross-domain classification is severely limited.

  ## Training and evaluation data

- More information needed
+ Based on the persuasive essays corpus of [Stab and Gurevych (2017)](https://aclanthology.org/J17-3005.pdf), preprocessed by [Daxenberger et al. (2017)](https://github.com/UKPLab/emnlp2017-claim-identification).
+
+ Original dataset
+ - docs: 402
+ - tokens: 147,271
+ - total instances: 7,116 (65 duplicates)
+ - #claims: 2,108 (29.62%)
+
+ Trimmed dataset used for training
+ - total instances: **7,051** (65 duplicates removed)
+ - #claims: **2,093** (29.68%)
+ - train/test split: 80/20, stratified

  ## Training procedure

@@ -61,4 +70,4 @@ The following hyperparameters were used during training:
  - Transformers 4.37.2
  - Pytorch 2.2.0
  - Datasets 2.17.0
- - Tokenizers 0.15.2
+ - Tokenizers 0.15.2
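
The updated card describes removing 65 duplicate instances and then making an 80/20 stratified train/test split. The commit does not include the preprocessing code, so the following is only a minimal sketch of that idea in plain Python. It assumes that a "duplicate" means an exact repeated (sentence, label) pair and uses toy data in place of the corpus. The helper names `dedupe` and `stratified_split` are hypothetical and are not taken from the author's pipeline.

```python
import random
from collections import defaultdict


def dedupe(rows):
    """Drop exact duplicate (sentence, label) pairs, keeping first occurrences.

    Assumption: duplicates are exact matches; the card does not say how
    the 65 duplicates were identified.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split rows so each label keeps roughly the same share in train and test."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take the same fraction from every label group.
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data standing in for the corpus: 30 claims, 70 non-claims, 5 repeats.
rows = [(f"claim sentence {i}", 1) for i in range(30)]
rows += [(f"other sentence {i}", 0) for i in range(70)]
rows += rows[:5]

unique_rows = dedupe(rows)
train_rows, test_rows = stratified_split(unique_rows)
```

Splitting within each label group keeps the claim share the same on both sides of the split, which matters when claims make up less than a third of the instances, as they do here. Libraries such as scikit-learn offer the same behaviour through the `stratify` argument of `train_test_split`.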