deliciouscat committed on
Commit c5c9564 · verified · 1 Parent(s): e2ca893

Update README.md

Files changed (1)
  1. README.md +17 -8
README.md CHANGED
@@ -1,10 +1,16 @@
  # Encoder-Decoder model with DeBERTa decoder
 
  ## pre-trained models
 
- Encoder: `microsoft/deberta-v3-small`
 
- Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1`; 6 transformer layers, 8 attention heads
 
  ## Data used
 
@@ -12,20 +18,23 @@ Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1`; 6 transformer layers, 8 at
 
  ## Training hparams
 
- optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
- batch size: 12 (maximal on Colab pro A100 env)
 
  ## How to use
 
  ```
  from transformers import AutoTokenizer, EncoderDecoderModel
 
- model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
- tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
  ```
 
  ## Future work!
 
- train more scientific data
 
- fine-tune on keyword extraction task
 
+ ---
+ datasets:
+ - HuggingFaceFW/fineweb
+ language:
+ - en
+ ---
  # Encoder-Decoder model with DeBERTa decoder
 
  ## pre-trained models
 
+ - Encoder: `microsoft/deberta-v3-small`
 
+ - Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1` (6 transformer layers, 8 attention heads)
 
  ## Data used
 
 
  ## Training hparams
 
+ - optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
+
+ - batch size: 12 (the largest that fit in the Colab Pro A100 environment)
+
+ - training objective: BART-style denoising
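The README does not say which BART noising function was used. As a minimal sketch, assuming BART's text-infilling noise (span lengths drawn from Poisson(λ=3), about 30% of tokens masked, each masked span collapsed to a single mask token) and the DeBERTa `[MASK]` token, the input corruption step could look like the following. `text_infill` and `sample_poisson` are hypothetical helpers for illustration, not part of this repository or of `transformers`.

```python
import math
import random
from typing import List, Optional


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (stdlib only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infill(
    tokens: List[str],
    mask_token: str = "[MASK]",   # assumed DeBERTa-v3 mask token
    mask_ratio: float = 0.3,      # BART paper's setting; assumed, not stated in this README
    poisson_lam: float = 3.0,     # BART paper's span-length parameter; assumed
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical BART-style text infilling: mask random spans, one mask token per run."""
    rng = random.Random(seed)
    budget = round(len(tokens) * mask_ratio)  # number of tokens to mask
    masked = [False] * len(tokens)
    n_masked, attempts = 0, 0
    while n_masked < budget and attempts < 10 * len(tokens):
        attempts += 1
        length = sample_poisson(poisson_lam, rng)
        if length == 0:
            continue  # simplification: BART would insert a mask here without deleting tokens
        length = min(length, budget - n_masked)  # never exceed the masking budget
        start = rng.randrange(len(tokens) - length + 1)
        if any(masked[start:start + length]):
            continue  # overlaps an existing span: resample
        for i in range(start, start + length):
            masked[i] = True
        n_masked += length

    # Emit one mask token per contiguous masked run (adjacent spans merge into one run).
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if not masked[i]:
            out.append(tok)
        elif i == 0 or not masked[i - 1]:
            out.append(mask_token)
    return out


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    # Corrupted encoder input; the decoder target would be `sentence` itself.
    print(text_infill(sentence, seed=0))
```

Under this objective the encoder reads the corrupted sequence and the decoder is trained to reproduce the original one, so the model also has to infer how many tokens each mask replaced.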
 
  ## How to use
 
  ```
  from transformers import AutoTokenizer, EncoderDecoderModel
 
+ model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
+ tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
  ```
 
  ## Future work!
 
+ - train on more scientific data
 
+ - fine-tune on keyword extraction task