UNAversal - Uniform Neural Alignment (MoE)
This is just a beta, a first release so people can start working on Frankensteins and the like. It achieves high GSM/Math and TQA scores, so ideally you can merge it with other Mixtrals and see what comes out of it. Based on mistralai/Mixtral-8x7B-Instruct-v0.1.
UNA Details
For this model we went with the most obvious approach: placing UNA on the router_logit. It does work, but we saw much better performance on SFT by doing so. So this model DOES have a UNA-SFT phase. It's highly experimental and merely used LLaMA-Factory datasets, for example alpaca.
As with the others:
- Can be finetuned further; try a learning rate of 2e-5 or 1e-4 (since it's MoE)
- Can be merged; here you will have to improvise, and please report findings on a discussion thread.
REMINDER: please cite. It does help the research and the lab itself, seriously.
NEED YOUR HELP!!
I need a multi-turn training loop for Mixtral that can properly squeeze the juice out of 8xH100s. Please feel free to reach @fblgit on either Discord or Twitter. Thanks!
Evals
Here are some, but we also submitted it to the HF eval queue.
GSM8k 5-Shot
|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml |get-answer| 5|exact_match|0.6603|± | 0.013|
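The `Stderr` column in these tables can be turned into a rough confidence interval. The sketch below does that for the GSM8k row, assuming a normal approximation (z = 1.96 for roughly 95%); that approximation is an assumption added here, not something the evaluation output states, and `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(value, stderr, z=1.96):
    """Return the (low, high) interval value +/- z * stderr.

    Assumption: a normal approximation, so z = 1.96 gives roughly 95% coverage.
    """
    margin = z * stderr
    return value - margin, value + margin


# GSM8k 5-shot exact_match and its standard error, copied from the table above.
low, high = confidence_interval(0.6603, 0.013)
print(f"gsm8k exact_match, approx. 95% interval: {low:.4f} to {high:.4f}")
```

The same helper applies to any row in the evaluation tables, since they all report a value together with its standard error.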
ARC 25-Shot
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml |none | 25|acc |0.6621|± |0.0138|
| | |none | 25|acc_norm|0.6962|± |0.0134|
TruthfulQA 0-Shot (MC2)
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml |none | 0|acc |0.7122|± |0.0141|
0-Shot Evals
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|arc_challenge |Yaml |none | 0|acc |0.6101|± |0.0143|
| | |none | 0|acc_norm |0.6425|± |0.0140|
|arc_easy |Yaml |none | 0|acc |0.8615|± |0.0071|
| | |none | 0|acc_norm |0.8375|± |0.0076|
|boolq |Yaml |none | 0|acc |0.8624|± |0.0060|
|lambada_openai|Yaml |none | 0|perplexity|2.8318|± |0.0507|
| | |none | 0|acc |0.7650|± |0.0059|
|mathqa |Yaml |none | 0|acc |0.4472|± |0.0091|
| | |none | 0|acc_norm |0.4436|± |0.0091|
|piqa |Yaml |none | 0|acc |0.8292|± |0.0088|
| | |none | 0|acc_norm |0.8422|± |0.0085|
|pubmedqa |Yaml |none | 0|acc |0.7920|± |0.0182|
|sciq |Yaml |none | 0|acc |0.9630|± |0.0060|
| | |none | 0|acc_norm |0.9370|± |0.0077|
BBH at 0-Shot
vllm (pretrained=fblgit/UNAversal-8x7B-v1beta,tensor_parallel_size=2,data_parallel_size=4,gpu_memory_utilization=0.8,dtype=float16), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto
| Tasks |Version| Filter |n-shot| Metric |Value | |Stderr|
|----------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh |N/A |get-answer| 0|exact_match|0.6752|± |0.1772|
| - bbh_cot_fewshot_boolean_expressions |Yaml |get-answer| 0|exact_match|0.8840|± |0.0203|
| - bbh_cot_fewshot_causal_judgement |Yaml |get-answer| 0|exact_match|0.6417|± |0.0352|
| - bbh_cot_fewshot_date_understanding |Yaml |get-answer| 0|exact_match|0.7600|± |0.0271|
| - bbh_cot_fewshot_disambiguation_qa |Yaml |get-answer| 0|exact_match|0.7160|± |0.0286|
| - bbh_cot_fewshot_dyck_languages |Yaml |get-answer| 0|exact_match|0.1800|± |0.0243|
| - bbh_cot_fewshot_formal_fallacies |Yaml |get-answer| 0|exact_match|0.6520|± |0.0302|
| - bbh_cot_fewshot_geometric_shapes |Yaml |get-answer| 0|exact_match|0.3880|± |0.0309|
| - bbh_cot_fewshot_hyperbaton |Yaml |get-answer| 0|exact_match|0.9600|± |0.0124|
| - bbh_cot_fewshot_logical_deduction_five_objects |Yaml |get-answer| 0|exact_match|0.5360|± |0.0316|
| - bbh_cot_fewshot_logical_deduction_seven_objects |Yaml |get-answer| 0|exact_match|0.5040|± |0.0317|
| - bbh_cot_fewshot_logical_deduction_three_objects |Yaml |get-answer| 0|exact_match|0.8600|± |0.0220|
| - bbh_cot_fewshot_movie_recommendation |Yaml |get-answer| 0|exact_match|0.7840|± |0.0261|
| - bbh_cot_fewshot_multistep_arithmetic_two |Yaml |get-answer| 0|exact_match|0.6600|± |0.0300|
| - bbh_cot_fewshot_navigate |Yaml |get-answer| 0|exact_match|0.8160|± |0.0246|
| - bbh_cot_fewshot_object_counting |Yaml |get-answer| 0|exact_match|0.8360|± |0.0235|
| - bbh_cot_fewshot_penguins_in_a_table |Yaml |get-answer| 0|exact_match|0.7329|± |0.0367|
| - bbh_cot_fewshot_reasoning_about_colored_objects |Yaml |get-answer| 0|exact_match|0.8120|± |0.0248|
| - bbh_cot_fewshot_ruin_names |Yaml |get-answer| 0|exact_match|0.4440|± |0.0315|
| - bbh_cot_fewshot_salient_translation_error_detection |Yaml |get-answer| 0|exact_match|0.5200|± |0.0317|
| - bbh_cot_fewshot_snarks |Yaml |get-answer| 0|exact_match|0.7135|± |0.0340|
| - bbh_cot_fewshot_sports_understanding |Yaml |get-answer| 0|exact_match|0.9400|± |0.0151|
| - bbh_cot_fewshot_temporal_sequences |Yaml |get-answer| 0|exact_match|0.7560|± |0.0272|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects |Yaml |get-answer| 0|exact_match|0.5680|± |0.0314|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects|Yaml |get-answer| 0|exact_match|0.6280|± |0.0306|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects|Yaml |get-answer| 0|exact_match|0.6280|± |0.0306|
| - bbh_cot_fewshot_web_of_lies |Yaml |get-answer| 0|exact_match|0.9560|± |0.0130|
| - bbh_cot_fewshot_word_sorting |Yaml |get-answer| 0|exact_match|0.3800|± |0.0308|
|Groups|Version| Filter |n-shot| Metric |Value | |Stderr|
|------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh |N/A |get-answer| 0|exact_match|0.6752|± |0.1772|
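For anyone trying to reproduce the BBH run, the settings printed in its header line map onto an lm-evaluation-harness command line. The sketch below only assembles and prints such a command; it does not run anything. The task name `bbh_cot_fewshot` is an assumption inferred from the subtask prefixes in the table, and the exact invocation used for this model card is not stated.

```python
import shlex

# Model arguments copied from the header line of the BBH run.
model_args = {
    "pretrained": "fblgit/UNAversal-8x7B-v1beta",
    "tensor_parallel_size": 2,
    "data_parallel_size": 4,
    "gpu_memory_utilization": 0.8,
    "dtype": "float16",
}

# ASSUMPTION: the task/group name is inferred from the table, not stated in the card.
TASKS = "bbh_cot_fewshot"


def build_command(model_args, tasks, num_fewshot=0, batch_size="auto"):
    """Assemble an lm_eval argument list for the vllm backend."""
    joined = ",".join(f"{key}={value}" for key, value in model_args.items())
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", joined,
        "--tasks", tasks,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", batch_size,
    ]


command = build_command(model_args, TASKS)
print(shlex.join(command))
```

With `tensor_parallel_size=2` and `data_parallel_size=4`, this configuration expects 8 GPUs, so adjust both values to the hardware available.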
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 73.78 |
| AI2 Reasoning Challenge (25-Shot) | 69.80 |
| HellaSwag (10-Shot) | 86.90 |
| MMLU (5-Shot) | 70.39 |
| TruthfulQA (0-shot) | 71.97 |
| Winogrande (5-shot) | 82.00 |
| GSM8k (5-shot) | 61.64 |
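The `Avg.` row is consistent with a plain unweighted mean of the six benchmark scores. A quick check, using only the values from the table above:

```python
# Leaderboard scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 69.80,
    "HellaSwag (10-Shot)": 86.90,
    "MMLU (5-Shot)": 70.39,
    "TruthfulQA (0-shot)": 71.97,
    "Winogrande (5-shot)": 82.00,
    "GSM8k (5-shot)": 61.64,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg.: {average:.2f}")
```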