code block syntax highlighting #16
by drHt - opened

README.md CHANGED

@@ -75,7 +75,7 @@ so those that finish in *_0.fasta and *_1.fasta will be the best ones per batch.
 **Given that generation runs so fast, we recommend generating hundreds or thousands and then only picking the best 5% or less.
 With the script below, that would mean picking only those that finish in '_0.fasta'. Good perplexity values for this model should be below 1.75-1.5.**
 
-```
+```python
 import torch
 from transformers import GPT2LMHeadModel, AutoTokenizer
 import os
@@ -179,7 +179,7 @@ We recommend using at least 200 sequences to obtain the best results. But we've
 that many, still give it a go.
 
 
-```
+```python
 import random
 from transformers import AutoTokenizer
 
@@ -350,7 +350,7 @@ To do that, you can take the trainer file that we provide in this repository (5.
 The command below shows an example at a specific learning rate,
 but you could try with other hyperparameters to obtain the best training and evaluation losses.
 
-```
+```bash
 python 5.run_clm-post.py --tokenizer_name AI4PD/ZymCTRL
 --do_train --do_eval --output_dir output --eval_strategy steps --eval_steps 10
 --logging_steps 5 --save_steps 500 --num_train_epochs 28 --per_device_train_batch_size 1
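
For context on what the retagged blocks contain: the first hunk's snippet is the start of the README's generation script, which the surrounding text says should be used to generate many sequences and keep only the lowest-perplexity ones (those written as `*_0.fasta`). The sketch below illustrates that generate-then-rank idea only; it is not the repository's script, and the EC-number prompt, sampling parameters, cutoff, and file naming are assumptions.

```python
# Minimal sketch only: generate with AI4PD/ZymCTRL, score each sequence by its
# perplexity, and keep the lowest-perplexity fraction. The EC-number prompt,
# sampling parameters, 5% cutoff, and file naming are illustrative assumptions,
# not the repository's exact script.
import math
import os

import torch
from transformers import GPT2LMHeadModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("AI4PD/ZymCTRL")
model = GPT2LMHeadModel.from_pretrained("AI4PD/ZymCTRL").to(device)
model.eval()

ec_label = "1.1.1.1"   # assumed EC-number prompt
batch_size = 20        # repeat over many batches to reach hundreds or thousands
keep_fraction = 0.05   # "picking the best 5% or less"

pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

inputs = tokenizer(ec_label, return_tensors="pt").to(device)
with torch.no_grad():
    generated = model.generate(
        inputs.input_ids,
        do_sample=True,
        top_k=9,
        repetition_penalty=1.2,
        max_length=1024,
        num_return_sequences=batch_size,
        pad_token_id=pad_id,
    )


def perplexity(token_ids: torch.Tensor) -> float:
    """exp of the mean token cross-entropy, with padding masked out of the loss."""
    ids = token_ids.unsqueeze(0)
    labels = ids.clone()
    labels[labels == pad_id] = -100  # ignored by the loss
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return math.exp(loss.item())


# Sort by perplexity so index 0 is the best sequence of the batch,
# matching the "*_0.fasta are the best ones per batch" convention.
scored = sorted(
    (perplexity(seq), tokenizer.decode(seq, skip_special_tokens=True)) for seq in generated
)

os.makedirs("generated", exist_ok=True)
n_keep = max(1, int(len(scored) * keep_fraction))
for rank, (ppl, text) in enumerate(scored[:n_keep]):
    sequence = text.replace(ec_label, "").strip()  # drop the prompt, keep the protein sequence
    with open(os.path.join("generated", f"{ec_label}_{rank}.fasta"), "w") as handle:
        handle.write(f">{ec_label}_{rank} perplexity={ppl:.3f}\n{sequence}\n")
```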