Some MMLU-PRO test
#3
by
rascazzione
- opened
I've tested this model on MMLU-PRO:
Without KV cache (a few hours of testing):
Athene-V2-Chat-AWQ CoT all 01-04_12-09 0.7126828456854486
Athene-V2-Chat-AWQ CoT computer_science 01-05_12-47 0.7658536566686496
With FP8 E4M3 KV cache (2 runs, about 10 minutes each on 4x RTX 4090):
Athene-V2-Chat-AWQ CoT computer_science 01-11_11-42 0.7585365835157645
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-00 0.743902437209994
Accuracy degrades a bit, but throughput is about 10x higher than without the KV cache.
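For anyone wanting to reproduce this: the post doesn't name the inference engine, but if it was vLLM (an assumption on my part, given the AWQ + tensor-parallel setup), the FP8 KV cache can be switched on with standard serve flags. The model path below is a placeholder, not the actual repo id:

```shell
# Hypothetical invocation -- the engine used for these runs is not stated.
# Replace $ATHENE_AWQ with the actual repo/path of Athene-V2-Chat-AWQ.
vllm serve "$ATHENE_AWQ" \
  --quantization awq \
  --tensor-parallel-size 4 \
  --kv-cache-dtype fp8_e4m3   # or fp8_e5m2 for the wider-range variant
```

Leaving out --kv-cache-dtype (or setting it to auto) keeps the cache in the model's native dtype, which matches the slower no-KV-quantization runs above.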
With FP8 E5M2 KV cache (2 runs, about 10 minutes each on 4x RTX 4090):
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-12 0.7560975591314694
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-22 0.7463414615942892
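As background on the two formats compared above: E4M3 spends its bits on the mantissa (finer precision, range up to 448) while E5M2 spends them on the exponent (coarser steps, range up to 57344), which plausibly explains E4M3 scoring marginally higher here. A minimal sketch of the format limits, following the OCP FP8 conventions (not tied to any particular engine):

```python
def fp8_max_finite(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of an FP8 format (OCP FP8 conventions)."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # E5M2 is IEEE-style: the top exponent code is reserved for inf/NaN.
        exp = (2 ** exp_bits - 2) - bias
        frac = 2 - 2 ** (-man_bits)      # all-ones mantissa
    else:
        # E4M3 reserves only the all-ones exponent + all-ones mantissa
        # pattern for NaN, so finite values reach into the top exponent code.
        exp = (2 ** exp_bits - 1) - bias
        frac = 2 - 2 ** (1 - man_bits)   # one ULP below all-ones
    return frac * 2 ** exp

# E4M3: max 448, step near 1.0 is 2^-3 = 0.125 (finer precision)
print(fp8_max_finite(4, 3, ieee_like=False))  # 448.0
# E5M2: max 57344, step near 1.0 is 2^-2 = 0.25 (wider range)
print(fp8_max_finite(5, 2, ieee_like=True))   # 57344.0
```

KV-cache activations rarely need E5M2's huge dynamic range, so the extra mantissa bit of E4M3 tends to be the better trade, consistent with the scores above.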