`Qwen2.5 Bakeneko 32B Instruct (rinna/qwen2.5-bakeneko-32b-instruct)`

Overview

This model is an instruction-tuned variant of rinna/qwen2.5-bakeneko-32b, fine-tuned using Chat Vector and Simple Preference Optimization (SimPO). It adheres to the Qwen2.5 chat format and is designed to deliver superior performance on Japanese language tasks.

Model Type	Model Name
Japanese Continual Pre-Training Model	Qwen2.5 Bakeneko 32B [HF]
Instruction-Tuning Model	Qwen2.5 Bakeneko 32B Instruct [HF][AWQ][GGUF][GPTQ int8][GPTQ int4]
DeepSeek R1 Distill Qwen2.5 Merged Reasoning Model	DeepSeek R1 Distill Qwen2.5 Bakeneko 32B [HF][AWQ][GGUF][GPTQ int8][GPTQ int4]
QwQ Merged Reasoning Model	QwQ Bakeneko 32B [HF][AWQ][GGUF][GPTQ int8][GPTQ int4]
QwQ Bakeneko Merged Instruction-Tuning Model	Qwen2.5 Bakeneko 32B Instruct V2 [HF][AWQ][GGUF][GPTQ int8][GPTQ int4]

Model architecture

A 64-layer, 5120-hidden-size transformer-based language model. For a comprehensive understanding of the architecture, please refer to the Qwen2.5 Technical Report.

Training

This model was developed through a multi-stage training process:

Model merging. The base model was augmented with instruction-following capabilities through a Chat Vector addition process. The Chat Vector was derived by subtracting the parameter vectors of Qwen/Qwen2.5-32B from those of Qwen/Qwen2.5-32B-Instruct, and was then added to the base model as follows.
```
  rinna/qwen2.5-bakeneko-32b + 1.0 * (Qwen/Qwen2.5-32B-Instruct - Qwen/Qwen2.5-32B)
```
During this process, the embedding layer was omitted when performing the subtraction and addition of parameter vectors.
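The merge above can be illustrated with a minimal sketch. It is not rinna's actual merging script: the function name `apply_chat_vector`, the toy parameter names, and the rule of skipping any parameter whose name contains `embed_tokens` are assumptions made for illustration. Plain Python lists stand in for weight tensors so the example runs without any model download.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def apply_chat_vector(
    target: Params,
    base: Params,
    instruct: Params,
    scale: float = 1.0,
    skip_substrings: Tuple[str, ...] = ("embed_tokens",),  # assumed naming
) -> Params:
    """Return target + scale * (instruct - base), leaving skipped parameters untouched."""
    merged: Params = {}
    for name, weights in target.items():
        if any(s in name for s in skip_substrings):
            # Embedding layer: copied as-is, no chat vector applied.
            merged[name] = list(weights)
            continue
        merged[name] = [
            t + scale * (i - b)
            for t, b, i in zip(weights, base[name], instruct[name])
        ]
    return merged


# Hypothetical two-parameter "models" used only to show the arithmetic.
bakeneko = {"model.embed_tokens.weight": [0.5, 0.5], "model.layers.0.mlp.up_proj.weight": [1.0, 2.0]}
qwen_base = {"model.embed_tokens.weight": [0.0, 0.0], "model.layers.0.mlp.up_proj.weight": [0.5, 1.0]}
qwen_instruct = {"model.embed_tokens.weight": [1.0, 1.0], "model.layers.0.mlp.up_proj.weight": [0.75, 1.5]}

merged = apply_chat_vector(bakeneko, qwen_base, qwen_instruct, scale=1.0)
print(merged)
```

With real checkpoints the same per-parameter arithmetic would be applied to tensors rather than lists; which parameter names count as "the embedding layer" depends on the checkpoint and is not specified here.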

SimPO was applied using a subset of the following dataset to further refine the performance of the merged model.
- rinna's internal dataset
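
For readers unfamiliar with SimPO, the sketch below computes its loss for a single preference pair, following the objective in the SimPO paper: the length-normalized log-likelihood of the chosen response should exceed that of the rejected response by a target margin. The function name and the `beta` and `gamma` values are illustrative assumptions; the hyperparameters actually used for this model are not stated here.

```python
import math


def simpo_loss(
    logp_chosen: float,    # summed token log-probs of the chosen response
    len_chosen: int,       # number of tokens in the chosen response
    logp_rejected: float,  # summed token log-probs of the rejected response
    len_rejected: int,     # number of tokens in the rejected response
    beta: float = 2.0,     # reward scale (illustrative value)
    gamma: float = 1.0,    # target reward margin (illustrative value)
) -> float:
    """SimPO loss for one pair: -log(sigmoid(r_chosen - r_rejected - gamma))."""
    # Length-normalized implicit rewards; no reference model is involved.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    z = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(z)) evaluated stably as softplus(-z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The loss shrinks as the chosen response becomes more likely than the rejected one.
print(simpo_loss(-5.0, 5, -10.0, 5))
print(simpo_loss(-10.0, 5, -10.0, 5))
```

In training, this quantity would be averaged over a batch of preference pairs and minimized with respect to the policy parameters.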

Contributors

Release date

February 13, 2025

Benchmarking

Model	Japanese LM Evaluation Harness	Japanese MT-Bench (first turn)	Japanese MT-Bench (multi turn)
Qwen/Qwen2.5-32B	79.46	-	-
rinna/qwen2.5-bakeneko-32b	79.18	-	-
Qwen/Qwen2.5-32B-Instruct	78.29	8.13	7.54
rinna/qwen2.5-bakeneko-32b-instruct	79.62	8.17	7.66
rinna/qwen2.5-bakeneko-32b-instruct-v2	77.92	8.86	8.53
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	73.51	7.39	6.88
rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b	77.43	8.58	8.19
Qwen/QwQ-32B	76.12	8.58	8.25
rinna/qwq-bakeneko-32b	78.31	8.81	8.52

For detailed benchmarking results, please refer to rinna's LM benchmark page (Sheet 20250213).

How to use the model

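```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rinna/qwen2.5-bakeneko-32b-instruct"

# Load the tokenizer and the model in bfloat16, spreading weights across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Conversation in the Qwen2.5 chat format: a system prompt followed by a user turn.
messages = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "ゲーム・小説・アニメに登場するアイテムボックスの特徴と、その原理を詳細に推測してください。"},
]
# Render the messages into a prompt string that ends with the assistant header.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# The template already contains the special tokens, so none are added here.
input_ids = tokenizer.encode(
    prompt,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

# Sample a response of up to 1024 new tokens.
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens, dropping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```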
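
To make the chat format more concrete, the following sketch builds by hand the ChatML-style prompt layout that Qwen2.5 uses. It is an approximation for illustration only: `build_chatml_prompt` is a hypothetical helper, it ignores template features such as default system prompts and tool calls, and `tokenizer.apply_chat_template` remains the reliable way to produce the real prompt.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages as <|im_start|>role\\ncontent<|im_end|> blocks (simplified)."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
demo_prompt = build_chatml_prompt(demo_messages)
print(demo_prompt)
```

Comparing this string with the output of `apply_chat_template` for the same messages is a quick way to check that a custom serving stack formats prompts the way the model expects.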

Tokenization

This model inherits the original Qwen/Qwen2.5-32B-Instruct tokenizer.

How to cite

@misc{rinna-qwen2.5-bakeneko-32b-instruct,
    title = {rinna/qwen2.5-bakeneko-32b-instruct},
    author = {Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    url = {https://qwenlm.github.io/blog/qwen2.5/},
    author = {Qwen Team},
    month = {September},
    year = {2024}
}

@article{qwen2,
    title = {Qwen2 Technical Report}, 
    author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
    journal = {arXiv preprint arXiv:2407.10671},
    year = {2024}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

@article{meng2024simpo,
    title = {SimPO: Simple Preference Optimization with a Reference-Free Reward},
    author = {Meng, Yu and Xia, Mengzhou and Chen, Danqi},
    journal = {arXiv preprint arXiv:2405.14734},
    year = {2024}
}

License

The Apache License, Version 2.0
