leaderboard-pr-bot's picture
Adding Evaluation Results
f5f9bcc verified
|
raw
history blame
8.28 kB
---
language:
- en
license: mit
base_model:
- meta-llama/Llama-3.2-3B-Instruct
model-index:
- name: Gladiator-Mini-Exp-1222-3B-Instruct
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 61.63
name: strict accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 20.57
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 13.44
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 1.79
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 1.6
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 22.41
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MultivexAI/Gladiator-Mini-Exp-1222-3B-Instruct
name: Open LLM Leaderboard
---
* **Model size: 3.21B parameters**
# Gladiator-Mini-exp-1222-Instruct
**Gladiator-Mini-exp-1222** is a 3-billion parameter language model focused on **complex analytical tasks**. This experimental model builds upon the foundation of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct), and aims to explore what’s possible with smaller, more resource-efficient AI models. We believe small models represent the future of open source language models, making AI more accessible and adaptable for a wider range of users and applications.
**What's New in This Version?**
We've continued to refine the Gladiator-Mini series, and this version focuses on strengthening the model's analytical and problem-solving capabilities. We've also improved its ability to operate effectively with or without specific system prompts, increasing its flexibility and adaptability. The model has been trained on a larger and more varied dataset, aimed at improving overall performance.
The previous iteration, Gladiator-Mini-exp-1211, had a tendency to underperform compared to the non-fine-tuned base Llama model and required specific prompts to function effectively, making it less versatile. This version is an upgrade.
**How it Performs:**
Gladiator-Mini-exp-1222 demonstrates progress in various areas. It can approach multi-step analytical problems and is able to complete complex calculations, when needed. It also shows improved capabilities in applying logic and reason to produce an accurate answer. The model can also follow complex instructions effectively, even with minimal or no guidance, showing that its reasoning capabilities are more reliable.
**Current Performance Examples:**
To illustrate the model's current capabilities, here are some specific examples of its performance:
* **Multi-Step Calculations:** When given a mathematical problem involving a combination of multiplication, division, and addition, the model accurately identifies the steps to solve the problem and arrives at the correct answer.
* **Logical Analysis:** When given complex problems involving interwoven statements and rules, the model now uses a more structured methodology and is able to complete the required logical deductions needed to come to a conclusion, even if that conclusion is not completely correct.
* **Instruction Following:** The model can follow complex instructions to produce structured text outputs and is capable of adhering to specific requirements, such as length constraints or specific wording.
These examples represent a small selection of the types of tasks the model can handle.
**Example System Prompts (Optional):**
* **For Complex Tasks:** "You are an advanced AI with strong analytical skills. Approach the problem step-by-step and show your work.”
* **For Problem Solving:** "You are an expert problem solver. Explain your process clearly and concisely."
**What Are We Still Working On?**
Gladiator-Mini-exp-1222 remains under development. It is not perfect, and there are still areas that require further work. One notable area for improvement is creative text generation; the model is not designed for these types of tasks. It is important to recognize that, as an experimental model, its capabilities should not be overestimated. The experimental date for this model is 12/22/2024.
**Performance:**
This model has shown some encouraging results in internal testing, particularly in analytical tasks, however its performance may vary depending on the specific problem it is given. We welcome community feedback and are continually looking for ways to improve the model’s performance and reliability.
**Our Goal:**
We want to create a strong problem solver in a compact model. We believe that smaller, more efficient models are the future of AI, and this experimental version represents an important step towards that goal. We’re working towards a model that can perform at a high level without requiring large amounts of computing resources.
**How You Can Help:**
We encourage you to experiment with Gladiator-Mini-exp-1222 and let us know what you find. Your feedback is essential to future development.
**Limitations:**
* System prompts are not strictly needed, but may still be helpful.
* Its reasoning capabilities are continuously being improved.
* Do not expect excellent creative text generation.
* Like any AI model, it could produce bias or make mistakes.
**Disclaimer:**
Gladiator-Mini-exp-1222 is an experimental model and it’s best to use with caution. Please always double-check the outputs and avoid relying on them blindly.
Base model: [https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
Thanks to Meta for the fantastic Llama-3.2-3B model!
**Finetuning Dataset:**
* The model was fine-tuned on a privately collected dataset. Further details on training data are withheld.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/MultivexAI__Gladiator-Mini-Exp-1222-3B-Instruct-details)
| Metric |Value|
|-------------------|----:|
|Avg. |20.24|
|IFEval (0-Shot) |61.63|
|BBH (3-Shot) |20.57|
|MATH Lvl 5 (4-Shot)|13.44|
|GPQA (0-shot) | 1.79|
|MuSR (0-shot) | 1.60|
|MMLU-PRO (5-shot) |22.41|