Gladiator-Mini-Exp-1221-3B-Instruct - V2: Enhanced Performance
This is V2, an improved iteration of our Gladiator-Mini-Exp-1221-3B-Instruct model, fine-tuned from Llama-3.2-3B-Instruct with a slightly expanded dataset and increased training epochs.
Major Improvements in V2:
- Superior Performance: V2 demonstrates enhanced performance across multiple benchmarks compared to V1.
- Mathematics Boost: Notably, V2 surpasses both the 1211 model and V1 in mathematical reasoning, as evidenced by the MATH benchmark results below.
- Model Size: 3.21 billion parameters
Benchmark Highlights:
| Benchmark | 1211 | V1 | V2 |
|---|---|---|---|
| MATH | 13.44% | 13.07% | 13.75% |
| IFEval | 60.79% | 62.15% | |
| BBH | 20.40% | 20.65% | |
In summary, V2 offers a noticeable performance upgrade over V1, particularly in mathematical tasks. Explore the model and experience the improvements!
Gladiator-Mini-exp-1221-Instruct
Gladiator-Mini-exp-1221 is a 3-billion parameter language model focused on complex reasoning. Built upon the foundation of meta-llama/Llama-3.2-3B-Instruct, this experimental model is designed to explore what's achievable with smaller models in analytical thinking. It's all about pushing boundaries and learning what's possible in resource-efficient AI. We believe small models represent the future of open source language models, making AI more accessible and adaptable for a wider range of users and applications.
What's Different This Time?
We've been working hard to refine the Gladiator-Mini series, and this version builds on the previous one with more robust reasoning abilities. We've also reduced its dependence on specific system prompts, meaning it can tackle problems more independently. In addition, the model has been trained on a larger and more diverse dataset, so it performs better across a range of tasks.
It's worth noting that the previous iteration, Gladiator-Mini-exp-1211, tended to underperform the non-fine-tuned base Llama model and didn't function well without a system prompt, which limited its versatility. This version addresses both issues.
Performance Highlights:
We've seen some encouraging results with Gladiator-Mini-exp-1221. Here are a few examples that show what it can do:
- Math: When given a calculus problem involving definite integrals, the model not only calculated the correct answer but also explained each step clearly, showing its understanding of the math.
- Logic: Its approach to logical reasoning puzzles has improved. When tested on the "three siblings" logic puzzle, which involves interconnected truths, lies, and a rotating truth-teller, the model did not completely solve it, but it demonstrated a significantly improved ability to apply logic compared to the base (non-fine-tuned) model.
- Instruction Following: We tested the model's ability to follow tricky directions with this task: "Generate a text with exactly 7 words. The first and last words must be synonyms." It responded with "Happy people are very joyful people today," hitting the seven-word count exactly, though the synonym pair ("happy"/"joyful") fell mid-sentence rather than at the first and last positions.
These examples are just a small snapshot of what the model can achieve. We're continuing to test its abilities and are hopeful that it will continue to improve.
Where Does it Shine?
- Math Prowess: Gladiator-Mini-exp-1221 has made progress in math problem-solving. It's handling multi-step algebra, solving definite integrals in calculus, and even explaining its thinking along the way.
- Logical Thinking: The model now has a more structured approach when facing complex logical problems and can apply its logic more often to find the right answers.
- Following Directions: It's noticeably better at following instructions, even those with specific constraints (like creating a seven-word sentence with synonyms). It also does this without needing as much hand-holding.
- Independent Problem Solving: It can figure things out on its own more often, without needing a system prompt. Providing one is still a good idea if you want the best possible outcome.
Example System Prompts (Just In Case):
- For General Reasoning: "You're an advanced AI with strong reasoning. Explain your thinking step-by-step."
- For Math Problems: "You're a math expert. Break down the problem and explain your process."
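If you're calling the model through its chat format, here's a minimal sketch of passing one of these system prompts with Hugging Face transformers. The repo id below is a placeholder (substitute the model's actual Hugging Face path), and the dtype, device, and generation settings are assumptions to adjust for your setup rather than tuned recommendations.

```python
# Minimal sketch: load the model and apply one of the example system prompts
# via the chat template. The repo id below is a placeholder, not the real path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Gladiator-Mini-exp-1221"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: pick a dtype your hardware supports
    device_map="auto",            # requires `accelerate`; or place the model manually
)

messages = [
    {"role": "system", "content": "You're a math expert. Break down the problem and explain your process."},
    {"role": "user", "content": "Evaluate the definite integral of 3x^2 from x = 0 to x = 2."},
]

# Build the prompt with the model's chat template and generate a response.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.2)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```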
What Are We Still Working On?
Gladiator-Mini-exp-1221 is still an experiment in progress. It's not perfect, and there are areas we're actively working on. One place where it's still rough around the edges is creative text generation: it excels at analytical tasks but may struggle if you ask it to write a poem or a short story, for example. It's also important not to overestimate the model's capabilities.
This experimental build is dated 12/21/2024.
Performance:
The model has shown some promising results internally, especially with mathematical tasks. However, performance can vary depending on the specific task. We welcome feedback on where it excels and where it could be improved.
Our Goal:
We want to create a strong reasoning engine in a compact model. We believe that smaller, more efficient models are the future of AI, and this experimental version is an important step towards that goal. We're working to develop a model that can handle complex reasoning as well as the larger models while staying far more resource-efficient.
How You Can Help:
We'd love for you to experiment with Gladiator-Mini-exp-1221 and let us know what you find. We need the feedback to improve!
Limitations:
- System prompts aren't strictly necessary, but can still help.
- Its reasoning capabilities are still being refined.
- Don't expect excellent creative text generation.
- Like any AI model, it can produce biased or incorrect outputs.
Disclaimer:
Gladiator-Mini-exp-1221 is an experimental model and should be used with caution. Always double-check its outputs and don't trust them blindly.
Base model: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
Thanks to Meta for the fantastic Llama-3.2-3B model!
Finetuning Dataset:
- The model was fine-tuned on a privately collected dataset. Further details on training data are withheld.