---
license: apache-2.0
---
<style>
table {
border-collapse: collapse;
width: 100%;
margin-bottom: 20px;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: center;
}
.best {
font-weight: bold;
text-decoration: underline;
}
</style>
<div style="text-align: center; margin: 20px auto; padding: 20px; border: 3px solid #ddd; border-radius: 10px;">
<h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
<a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">OuteAI.com</a>
<a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">Join our Discord</a>
<a href="https://x.com/OuteAI" target="_blank">@OuteAI</a>
</div>
## Introduction
We're excited to introduce our latest model, the Lite Oute 2 Mamba2Attn 250M. <br>
This is our third-generation model, featuring the new Mamba2 architecture with attention layers. <br>
For more technical details covering the training process, architecture, and performance, <a href="https://outeai.com/blog/lite-oute-2-mamba2attn" target="_blank">read the full blog post here</a>.
## Model Variants
- [Lite-Oute-2-Mamba2Attn-250M-Instruct](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Instruct)
- [Lite-Oute-2-Mamba2Attn-250M-Base](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base)
## Training Details
The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:
- **50% dclm-baseline-1.0**
- **50% fineweb-edu**
Base model training was conducted on single NVIDIA 4090 and NVIDIA H100 GPUs, with the following key parameters:
- **Max learning rate:** 4e-4
- **Min learning rate:** 1e-4
- **Block size:** 4096
- **Token batches:** ~100k tokens
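As a rough sanity check on these figures, the sketch below estimates how many optimizer steps the pre-training run would need. It assumes one optimizer step per ~100k-token batch and that every sequence fills the full 4096-token block; neither detail is stated here, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope estimate derived from the figures listed above.
# Assumptions (not stated in this card): one optimizer step per token batch,
# and every sequence fills the whole block.
TOTAL_TOKENS = 30_000_000_000   # 30 billion pre-training tokens
TOKENS_PER_BATCH = 100_000      # "~100k tokens" per batch
BLOCK_SIZE = 4096               # tokens per sequence


def estimate_steps(total_tokens: int, tokens_per_batch: int) -> int:
    """Approximate number of optimizer steps needed to see every token once."""
    return total_tokens // tokens_per_batch


def sequences_per_batch(tokens_per_batch: int, block_size: int) -> int:
    """Approximate number of full-length sequences that fit in one batch."""
    return tokens_per_batch // block_size


steps = estimate_steps(TOTAL_TOKENS, TOKENS_PER_BATCH)
seqs = sequences_per_batch(TOKENS_PER_BATCH, BLOCK_SIZE)
print(f"~{steps:,} steps, ~{seqs} sequences per batch")
```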
For instruction tuning, we first trained the model with supervised fine-tuning (SFT) and then refined it with Direct Preference Optimization (DPO).
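For readers unfamiliar with DPO, here is a minimal sketch of the standard per-example DPO objective in plain Python. It is illustrative only: the training code, the `beta` value, and the preference data used for this model are not described here, and the function name and the default `beta=0.1` are assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical value, not taken from this model's training
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of summed log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference model does
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)), written as a numerically stable softplus
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# The loss shrinks as the policy favors the chosen answer more than the reference does
print(dpo_loss(-5.0, -15.0, -10.0, -10.0) < dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2.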
## Benchmark Results
<table>
<tr>
<th>Benchmark</th>
<th>Lite-Oute-2-Mamba2Attn-250M-Instruct</th>
<th>Lite-Oute-1-300M-Instruct</th>
<th>Lite-Mistral-150M-v2-Instruct</th>
</tr>
<tr>
<td>ARC-C (0-shot)</td>
<td class="best">26.71</td>
<td>26.02</td>
<td>-</td>
</tr>
<tr>
<td>ARC-E (0-shot)</td>
<td class="best">53.70</td>
<td>49.79</td>
<td>39.52</td>
</tr>
<tr>
<td>HellaSwag (0-shot)</td>
<td class="best">38.19</td>
<td>34.50</td>
<td>31.01</td>
</tr>
<tr>
<td>MMLU (0-shot)</td>
<td>25.13</td>
<td>24.00</td>
<td class="best">25.28</td>
</tr>
<tr>
<td>OpenBookQA (0-shot)</td>
<td class="best">32.20</td>
<td class="best">32.20</td>
<td>28.40</td>
</tr>
<tr>
<td>PIQA (0-shot)</td>
<td class="best">66.59</td>
<td>65.40</td>
<td>60.23</td>
</tr>
<tr>
<td>WinoGrande (0-shot)</td>
<td>53.28</td>
<td class="best">53.75</td>
<td>51.78</td>
</tr>
<tr>
<td>ARC-C (5-shot)</td>
<td class="best">27.82</td>
<td>26.37</td>
<td>-</td>
</tr>
<tr>
<td>ARC-E (5-shot)</td>
<td class="best">57.28</td>
<td>51.43</td>
<td>46.30</td>
</tr>
<tr>
<td>HellaSwag (5-shot)</td>
<td class="best">37.71</td>
<td>34.93</td>
<td>30.73</td>
</tr>
<tr>
<td>MMLU (5-shot)</td>
<td>25.81</td>
<td class="best">25.87</td>
<td>-</td>
</tr>
<tr>
<td>OpenBookQA (5-shot)</td>
<td class="best">31.80</td>
<td>31.40</td>
<td>28.00</td>
</tr>
<tr>
<td>PIQA (5-shot)</td>
<td class="best">66.38</td>
<td>65.07</td>
<td>62.68</td>
</tr>
<tr>
<td>WinoGrande (5-shot)</td>
<td>51.14</td>
<td class="best">52.01</td>
<td>51.30</td>
</tr>
</table>
## Interfacing with the Instruct Model
> [!IMPORTANT]
> To ensure optimal performance, please use the following template when interacting with the model:
```
<s>system\n
{System message}</s>
<s>user\n
{Your question or message}</s>
<s>assistant\n
{The model's response}</s>
```
> [!IMPORTANT]
> When prompting the model for a response, your input should be structured as follows:
```
<s>system\n
{System message}</s>
<s>user\n
{Your question or message}</s>
<s>assistant\n
```
This format ensures the best performance, as the model was fine-tuned using this specific template.
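If you are not using a tokenizer that applies the template for you, the prompt can be assembled by hand. The sketch below is one possible reading of the template above: it assumes `\n` stands for a single newline after the role name and that each turn ends with `</s>` followed by a newline. The helper name `build_prompt` is hypothetical, and the chat template shipped with the tokenizer remains the authoritative reference.

```python
from typing import Optional


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Assemble a single-turn prompt that ends with the open assistant turn."""
    turns = []
    if system_message is not None:
        turns.append(("system", system_message))
    turns.append(("user", user_message))

    # Assumed layout: "<s>{role}\n{content}</s>\n" for every completed turn
    prompt = "".join(f"<s>{role}\n{content}</s>\n" for role, content in turns)

    # Leave the assistant turn open so the model writes the response
    return prompt + "<s>assistant\n"


print(build_prompt("Who wrote the play Romeo and Juliet?", "You are an AI assistant."))
```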
### Recommended Settings
- **Temperature:** 0.1 - 0.4
- **Repetition Penalty:** 1.1 - 1.14
If you're using a Hugging Face tokenizer, make sure to use `tokenizer.apply_chat_template` or adjust your interface to match the template above.
## Let's look at some real-world examples
```
<s>user
What are three key benefits of electric vehicles compared to traditional gasoline-powered cars?</s>
<s>assistant
Electric vehicles have several advantages over gasoline-powered cars. Here are three key benefits:
1. Efficient: Electric vehicles use less fuel and emit fewer emissions than gasoline-powered cars, which can save consumers money on their monthly bills.
2. Fuel economy: Electric vehicles produce more power per mile driven, which means they consume less fuel overall. This helps reduce the need for refueling stations and reduces carbon emissions from transportation.
3. Flexibility: Electric vehicles can be adapted to different driving conditions, such as snowy or rainy days, and can accommodate different types of terrain. This makes them easier to operate in urban areas with limited parking spaces.
Overall, electric vehicles offer a range of benefits that make them an attractive choice for many drivers. They provide a cost-effective alternative to gasoline-powered cars, reducing the need for frequent refueling stations, and making it easier to drive long distances without having to worry about running out of gas.</s>
```
```
<s>user
Who wrote the play "Romeo and Juliet"?</s>
<s>assistant
The play "Romeo and Juliet" by William Shakespeare is a classic example of a tragic love story. It tells the story of Romeo and Juliet, a young couple who fall in love with each other but ultimately lose their lives due to their respective flaws. The play was written during the Renaissance period and it has been performed around the world since then.</s>
```
## Usage with Hugging Face Transformers
Model weights were converted from the original Mamba2 implementation to be Hugging Face compatible. <br>
Due to the lack of official support for Mamba2 attention layers in Hugging Face Transformers, custom modeling files are included. <br>
The implementation of Mamba2 with attention in the modeling files comes from Pull Request #32027 in the Hugging Face Transformers repository: [https://github.com/huggingface/transformers/pull/32027](https://github.com/huggingface/transformers/pull/32027)
To speed up inference, we recommend installing mamba-ssm and flash attention 2.
mamba-ssm:
```bash
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```
flash attention 2:
```bash
pip install flash-attn --no-build-isolation
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Repository name as listed under "Model Variants"
model_id = "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # To allow custom modeling files
    trust_remote_code=True,
    # If you have installed flash attention 2
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)


def generate_response(message: str, temperature: float = 0.1, repetition_penalty: float = 1.12) -> str:
    # Apply the chat template and convert to PyTorch tensors
    messages = [
        {"role": "system", "content": "You are an AI assistant. You will be given a task. You must generate a detailed answer."},
        {"role": "user", "content": message},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(device)

    # Generate the response (max_length counts prompt tokens as well)
    output = model.generate(
        input_ids,
        max_length=256,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
        do_sample=True,
    )

    # Decode the generated output
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text


message = "Who wrote the play Romeo and Juliet?"
response = generate_response(message)
print(response)
```
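For causal language models, `model.generate` typically returns the prompt tokens followed by the newly generated ones, so the text decoded in the example above also contains the system and user messages. If only the reply is wanted, the prompt tokens can be sliced off before decoding. The helper below is a hypothetical sketch, not part of the model's code; it works on any sliceable sequence, including a 1-D tensor such as `output[0]`.

```python
def new_tokens_only(output_ids, prompt_length: int):
    """Return only the tokens generated after the prompt.

    `output_ids` is one generated sequence (for example `output[0]`),
    and `prompt_length` is the number of prompt tokens
    (for example `input_ids.shape[-1]`).
    """
    return output_ids[prompt_length:]


# Stand-in token IDs for illustration: 3 prompt tokens, then 2 generated tokens
example_output = [101, 102, 103, 7, 8]
print(new_tokens_only(example_output, 3))

# Inside generate_response, this could be used as (assumed usage):
# reply_ids = new_tokens_only(output[0], input_ids.shape[-1])
# reply = tokenizer.decode(reply_ids, skip_special_tokens=True)
```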
## Fine-Tuning for Specific Tasks
For optimal task-specific performance, it is recommended to fine-tune the model on datasets relevant to the intended use case. <br>
Fine-tuning can significantly improve the model's accuracy, relevance, and depth of knowledge in specialized domains or for particular types of tasks. <br>
This process allows the model to adapt its pre-trained knowledge to the nuances and requirements of specific applications, potentially mitigating some of the weaknesses observed in general use.
## Conclusion
The Lite-Oute-2-Mamba2Attn-250M-Instruct model shows significant improvements over previous versions, particularly in zero-shot and few-shot learning tasks. <br>
Despite its smaller size, it outperforms older models in most benchmarks, demonstrating better common sense reasoning, language understanding, and general knowledge. <br>
While it can provide coherent responses to various queries, it has limitations due to its small size (250 million parameters). <br>
Users should be aware that it may produce incorrect outputs and should not be relied upon for critical decisions without verification. <br>
It's best suited for lightweight applications or as a basis for further development.
## Disclaimer
By using this model, you acknowledge that you understand and assume the risks associated with its use.
You are solely responsible for ensuring compliance with all applicable laws and regulations.
We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages.
We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.