lilloukas committed
Commit
458318d
1 Parent(s): 02008e1

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -41,7 +41,7 @@ We use state-of-the-art [Language Model Evaluation Harness](https://github.com/E


## Reproducing Evaluation Results
- Install LM Evaluation Harness
+ Install LM Evaluation Harness:
```
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
@@ -49,22 +49,22 @@ pip install -e .
```
Each task was evaluated on a single A100 80GB GPU.

- ARC
+ ARC:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/arc_challenge_25shot.json --device cuda --num_fewshot 25
```

- HellaSwag
+ HellaSwag:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/hellaswag_10shot.json --device cuda --num_fewshot 10
```

- MMLU
+ MMLU:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/mmlu_5shot.json --device cuda --num_fewshot 5
```

- TruthfulQA
+ TruthfulQA:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/truthfulqa_0shot.json --device cuda
```
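
Each command in the section above writes its metrics to a JSON file under `results/Platypus-30B/`. As a minimal sketch (not part of this commit, and assuming the harness's usual report layout with a top-level `results` dict keyed by task name), the per-task scores and the mean accuracy over the `hendrycksTest-*` subtasks could be collected like this:
```
import json
from glob import glob
from statistics import mean

# Directory used by the --output_path arguments above.
RESULTS_DIR = "results/Platypus-30B"

for path in sorted(glob(f"{RESULTS_DIR}/*.json")):
    with open(path) as f:
        report = json.load(f)
    # Assumption: each report stores one entry per task under "results",
    # e.g. {"arc_challenge": {"acc": ..., "acc_norm": ...}}.
    tasks = report.get("results", {})
    mmlu_accs = [m["acc"] for t, m in tasks.items() if t.startswith("hendrycksTest-")]
    for task, metrics in tasks.items():
        if not task.startswith("hendrycksTest-"):
            floats = {k: round(v, 4) for k, v in metrics.items() if isinstance(v, float)}
            print(path, task, floats)
    if mmlu_accs:
        # MMLU is conventionally reported as the unweighted mean over its subtasks.
        print(path, "hendrycksTest (mean acc)", round(mean(mmlu_accs), 4))
```
The generic print simply dumps whatever float metrics each task file contains, so it should also cover tasks that report metrics other than `acc` (e.g. TruthfulQA's multiple-choice scores).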