---
library_name: transformers
language:
- en
base_model:
- RozGrov/NemoDori-v0.2-12B-MN-BT
datasets:
- Inv/c2-logs-cleaned-deslopped
tags:
- unsloth
- trl
- sft
- merge
- mergekit
- lazymergekit
- RozGrov/NemoDori-v0.2-12B-MN-BT
---
# NemoDori-v0.2-Frankend.2-v1-16.6B
_Experimental!_
<br>
An upscaled version of [**NemoDori-v0.2-12B-MN-BT**](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT), now at **16.6B**.
This is also my first _successful_(?) fine-tune, trained for 70 steps on **500 random rows** from the
[Inv/c2-logs-cleaned-deslopped](https://huggingface.co/datasets/Inv/c2-logs-cleaned-deslopped) dataset.
I picked that dataset mostly just for testing. My thinking was: if I can fill in the duplicated layers by training them, maybe the model gets better.
NemoDori v0.2 is my best merge so far, but it's still 12B, and after merging all kinds of models there isn't much room left to improve at that size.
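
Given the `unsloth` / `trl` / `sft` tags above, the training run roughly amounts to the sketch below. Only the 500-row sample and the 70 steps come from this card; the starting checkpoint, LoRA settings, sequence length, dataset column name, and hyperparameters are all assumptions:
```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Assumed starting checkpoint: the raw upscaled merge, loaded in 4-bit to keep VRAM down
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RozGrov/NemoDori-v0.2-Frankend.2-pre",  # assumption
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

# 500 random rows from the dataset, as described above
dataset = (
    load_dataset("Inv/c2-logs-cleaned-deslopped", split="train")
    .shuffle(seed=42)
    .select(range(500))
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=70,  # the 70 steps mentioned above
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```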
<br>
Again, I'm just playing around with this LLM stuff for a while. More versions of this may come out later.
From my short testing, this model has become a little stricter than its parent (v0.2). I haven't noticed anything major yet.
<br>
You can use ST with this preset: [here](https://huggingface.co/RozGrov/NemoDori-v0.2-Frankend.2-v1-16.6B/resolve/main/NemoDori-v0.2-Frankend.2-v1-16.6B%20-%20ST%20Preset.json).
Unfortunately, you can't go wild with this model (at least in my short tests): sometimes it makes little sense, and sometimes... you will get a Reddit link (I'm not kidding).
I didn't have time to test it much further, because it's pricier to run without quantization.
<br>
I trust @mradermacher to make the quantized versions of this model. (Thank you so much for making those GGUFs of my models ^_^) Once they're up, loading one could look like the sketch below.
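
A minimal sketch of loading a GGUF with `llama-cpp-python`; the repo id and filename below are hypothetical placeholders, so check the actual quant repo before copying:
```python
from llama_cpp import Llama

# Hypothetical repo/filename -- substitute the real GGUF repo once it exists
llm = Llama.from_pretrained(
    repo_id="mradermacher/NemoDori-v0.2-Frankend.2-v1-16.6B-GGUF",  # assumption
    filename="*Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU if they fit
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a large language model?"}]
)
print(out["choices"][0]["message"]["content"])
```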
And... yeah... your feedback is always welcome. Let me know how this model works for you; that would be really appreciated.
<br>
Take care everyone.
## Merge Method
This model was created from the following model using the `passthrough` merge method:
* [RozGrov/NemoDori-v0.2-12B-MN-BT](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT)
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [0, 8]
  # layers 8-24, with the q/k attention projections slightly scaled down
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: q_proj
              value: 0.919
            - filter: k_proj
              value: 0.919
            - value: 1.0
  # duplicated copy of layers 16-24; o_proj/down_proj are zeroed, so this
  # copy initially adds (almost) nothing to the residual stream
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [16, 24]
        parameters:
          scale:
            - filter: q_proj
              value: 0.7
            - filter: k_proj
              value: 0.7
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [16, 32]
        parameters:
          scale:
            - filter: q_proj
              value: 0.919
            - filter: k_proj
              value: 0.919
            - value: 1.0
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [32, 40]
merge_method: passthrough
dtype: bfloat16
```
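
To reproduce the merge, the config above can be fed straight to [mergekit](https://github.com/arcee-ai/mergekit). A minimal sketch in the same notebook style as the usage example below (the output directory name is arbitrary):
```python
# Save the YAML above as config.yaml, then run mergekit on it
!pip install -qU mergekit
!mergekit-yaml config.yaml ./NemoDori-v0.2-Frankend.2-v1-16.6B --cuda
```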
## 💻 Usage
```python
# Install dependencies (notebook-style; drop the "!" in a shell)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "RozGrov/NemoDori-v0.2-Frankend.2-v1-16.6B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```