Some MMLU-PRO test
#3
by
rascazzione
- opened
I've tested this model on MMLU-PRO:
Without KV cache (a few hours of testing):
Athene-V2-Chat-AWQ CoT all 01-04_12-09 0.7126828456854486
Athene-V2-Chat-AWQ CoT computer_science 01-05_12-47 0.7658536566686496
With FP8 E4M3 KV cache (2 runs, about 10 minutes each on 4x RTX 4090):
Athene-V2-Chat-AWQ CoT computer_science 01-11_11-42 0.7585365835157645
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-00 0.743902437209994
Accuracy degrades a bit, but throughput is about 10x higher than without the KV cache.
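For anyone wanting to reproduce this: the post doesn't name the inference engine, but if it was vLLM (an assumption on my part, given the AWQ + tensor-parallel setup), the FP8 KV cache can be switched on with standard serve flags. The model path below is a placeholder, not the actual repo id:

```shell
# Hypothetical invocation -- the engine used for these runs is not stated.
# Replace $ATHENE_AWQ with the actual repo/path of Athene-V2-Chat-AWQ.
vllm serve "$ATHENE_AWQ" \
  --quantization awq \
  --tensor-parallel-size 4 \
  --kv-cache-dtype fp8_e4m3   # or fp8_e5m2 for the wider-range variant
```

Leaving out --kv-cache-dtype (or setting it to auto) keeps the cache in the model's native dtype, which matches the slower no-KV-quantization runs above.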
With FP8 E5M2 KV cache (2 runs, about 10 minutes each on 4x RTX 4090):
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-12 0.7560975591314694
Athene-V2-Chat-AWQ CoT computer_science 01-11_12-22 0.7463414615942892
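As background on the two formats compared above: E4M3 spends its bits on the mantissa (finer precision, range up to 448) while E5M2 spends them on the exponent (coarser steps, range up to 57344), which plausibly explains E4M3 scoring marginally higher here. A minimal sketch of the format limits, following the OCP FP8 conventions (not tied to any particular engine):

```python
def fp8_max_finite(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of an FP8 format (OCP FP8 conventions)."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # E5M2 is IEEE-style: the top exponent code is reserved for inf/NaN.
        exp = (2 ** exp_bits - 2) - bias
        frac = 2 - 2 ** (-man_bits)      # all-ones mantissa
    else:
        # E4M3 reserves only the all-ones exponent + all-ones mantissa
        # pattern for NaN, so finite values reach into the top exponent code.
        exp = (2 ** exp_bits - 1) - bias
        frac = 2 - 2 ** (1 - man_bits)   # one ULP below all-ones
    return frac * 2 ** exp

# E4M3: max 448, step near 1.0 is 2^-3 = 0.125 (finer precision)
print(fp8_max_finite(4, 3, ieee_like=False))  # 448.0
# E5M2: max 57344, step near 1.0 is 2^-2 = 0.25 (wider range)
print(fp8_max_finite(5, 2, ieee_like=True))   # 57344.0
```

KV-cache activations rarely need E5M2's huge dynamic range, so the extra mantissa bit of E4M3 tends to be the better trade, consistent with the scores above.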