code block syntax highlighting #16
by drHt - opened

README.md CHANGED

@@ -75,7 +75,7 @@ so those that finish in *_0.fasta and *_1.fasta will be the best ones per batch.
 **Given that generation runs so fast, we recommend generating hundreds or thousands and then only picking the best 5% or less.
 With the script below, that would mean picking only those that finish in '_0.fasta'. Good perplexity values for this model should be below 1.75-1.5.**
 
-```
+```python
 import torch
 from transformers import GPT2LMHeadModel, AutoTokenizer
 import os
@@ -179,7 +179,7 @@ We recommend using at least 200 sequences to obtain the best results. But we've
 that many, still give it a go.
 
 
-```
+```python
 import random
 from transformers import AutoTokenizer
 
@@ -350,7 +350,7 @@ To do that, you can take the trainer file that we provide in this repository (5.
 The command below shows an example at a specific learning rate,
 but you could try with other hyperparameters to obtain the best training and evaluation losses.
 
-```
+```bash
 python 5.run_clm-post.py --tokenizer_name AI4PD/ZymCTRL
 --do_train --do_eval --output_dir output --eval_strategy steps --eval_steps 10
 --logging_steps 5 --save_steps 500 --num_train_epochs 28 --per_device_train_batch_size 1
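
For context on what the retagged blocks contain: the first hunk's snippet is the start of the README's generation script, which the surrounding text says should be used to generate many sequences and keep only the lowest-perplexity ones (those written as `*_0.fasta`). The sketch below illustrates that generate-then-rank idea only; it is not the repository's script, and the EC-number prompt, sampling parameters, cutoff, and file naming are assumptions.

```python
# Minimal sketch only: generate with AI4PD/ZymCTRL, score each sequence by its
# perplexity, and keep the lowest-perplexity fraction. The EC-number prompt,
# sampling parameters, 5% cutoff, and file naming are illustrative assumptions,
# not the repository's exact script.
import math
import os

import torch
from transformers import GPT2LMHeadModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("AI4PD/ZymCTRL")
model = GPT2LMHeadModel.from_pretrained("AI4PD/ZymCTRL").to(device)
model.eval()

ec_label = "1.1.1.1"   # assumed EC-number prompt
batch_size = 20        # repeat over many batches to reach hundreds or thousands
keep_fraction = 0.05   # "picking the best 5% or less"

pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

inputs = tokenizer(ec_label, return_tensors="pt").to(device)
with torch.no_grad():
    generated = model.generate(
        inputs.input_ids,
        do_sample=True,
        top_k=9,
        repetition_penalty=1.2,
        max_length=1024,
        num_return_sequences=batch_size,
        pad_token_id=pad_id,
    )


def perplexity(token_ids: torch.Tensor) -> float:
    """exp of the mean token cross-entropy, with padding masked out of the loss."""
    ids = token_ids.unsqueeze(0)
    labels = ids.clone()
    labels[labels == pad_id] = -100  # ignored by the loss
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return math.exp(loss.item())


# Sort by perplexity so index 0 is the best sequence of the batch,
# matching the "*_0.fasta are the best ones per batch" convention.
scored = sorted(
    (perplexity(seq), tokenizer.decode(seq, skip_special_tokens=True)) for seq in generated
)

os.makedirs("generated", exist_ok=True)
n_keep = max(1, int(len(scored) * keep_fraction))
for rank, (ppl, text) in enumerate(scored[:n_keep]):
    sequence = text.replace(ec_label, "").strip()  # drop the prompt, keep the protein sequence
    with open(os.path.join("generated", f"{ec_label}_{rank}.fasta"), "w") as handle:
        handle.write(f">{ec_label}_{rank} perplexity={ppl:.3f}\n{sequence}\n")
```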