---
base_model:
- Qwen/Qwen2.5-7B-Instruct-1M
- Sakalti/SJT-7B-1M
- Triangle104/Q2.5-Instruct-1M_Harmony
- bunnycore/Qwen2.5-7B-RRP-1M
- huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
library_name: transformers
tags:
- mergekit
- merge
license: mit
---
# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
**ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M** is a custom merged language model based on **Qwen2.5-7B** with enhanced reasoning, roleplaying, and long-context capabilities. This model supports up to **1 million token** context lengths, making it ideal for ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.
Quants are available in GGUF format, provided by [mradermacher](https://huggingface.co/mradermacher) (see the llama.cpp example after this list).
1. [GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-GGUF)
2. [imatrix GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-i1-GGUF)
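For local inference, any of these quants can be run with llama.cpp. A minimal invocation might look like this (the GGUF filename is illustrative; substitute whichever quantization level you downloaded):

```bash
# Filename is an example; use the GGUF file you actually downloaded.
./llama-cli -m Qwen2.5-7B-CelestialHarmony-1M.Q4_K_M.gguf \
  -p "Tell me a short story about an ancient celestial warrior." \
  -n 512
```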
---
## πŸ”§ **Model Details**
- **Base Model**: `Qwen/Qwen2.5-7B-Instruct-1M`
- **Models Used in Merge**:
- `Qwen/Qwen2.5-7B-Instruct-1M`
- `bunnycore/Qwen2.5-7B-RRP-1M`
- `Triangle104/Q2.5-Instruct-1M_Harmony`
- `Sakalti/SJT-7B-1M`
- `huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated`
- **Merge Method**: `model_stock` (layer-wise weight averaging guided by task-vector geometry)
---
## πŸ“– **Overview**
**Qwen2.5-7B-CelestialHarmony-1M** enhances the **Qwen2.5-7B series** with a fine-tuned balance of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well-suited for:
- **Roleplaying** πŸ§β€β™‚οΈ: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing** 🧠: Capable of structured logical thinking, especially when prompted with `<think>` tags (see the prompt sketch after this list).
- **Ultra-Long Context Handling** πŸ“œ: Efficient processing of sequences up to **1,010,000 tokens** via Dual Chunk Attention with sparse-attention optimizations.
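For example, a prompt that elicits `<think>`-style reasoning might look like the following (an illustrative pattern, not a hard requirement of the model):

```python
# Illustrative prompt pattern for eliciting structured reasoning.
messages = [
    {"role": "system", "content": "Reason step by step inside <think>...</think> tags, then give your final answer."},
    {"role": "user", "content": "If a caravan leaves at 9:15 and the journey takes 2h 50m, when does it arrive?"},
]
```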
---
## βš™οΈ **Technical Specifications**
| Specification | Value |
|--------------|---------|
| **Model Type** | Causal Language Model |
| **Parameters** | 7.61B |
| **Non-Embedding Parameters** | 6.53B |
| **Layers** | 28 |
| **Attention Heads (GQA)** | 28 (Q), 4 (KV) |
| **Max Context Length** | 1,010,000 tokens |
| **Max Generation Length** | 8,192 tokens |
| **Merge Method** | Model Stock |
---
## πŸ”¬ **Merging Details**
This model was merged using the **Model Stock** method, which averages the weights of several fine-tuned models and interpolates the result toward the base model, deriving the interpolation ratio from the geometry of the fine-tunes' task vectors. The result is a balanced, well-generalizing merge.
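For intuition, here is a heavily simplified sketch of what Model Stock does per weight tensor. The function name is illustrative, and the average pairwise cosine similarity is used as a stand-in for the paper's angle estimate; the actual merge was produced by mergekit:

```python
import torch

def model_stock_merge(base: torch.Tensor, finetuned: list[torch.Tensor]) -> torch.Tensor:
    """Simplified Model Stock merge for a single weight tensor (assumes k >= 2)."""
    k = len(finetuned)
    # Task vectors: how each fine-tune moved away from the base weights.
    deltas = [(w - base).flatten() for w in finetuned]

    # Proxy for the paper's expected angle: average pairwise cosine similarity.
    cos = torch.stack([
        torch.nn.functional.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k) for j in range(i + 1, k)
    ]).mean().clamp(min=0.0)

    # Interpolation ratio from the Model Stock paper: t = k*cos / (1 + (k-1)*cos).
    t = (k * cos) / (1 + (k - 1) * cos)

    # Pull the average of the fine-tunes toward the base by (1 - t).
    w_avg = torch.stack(finetuned).mean(dim=0)
    return t * w_avg + (1 - t) * base
```

mergekit applies this kind of interpolation tensor by tensor across all five checkpoints listed in the configuration below.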
### **Merge YAML Configuration**
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: Sakalti/SJT-7B-1M
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```
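To reproduce the merge, you can save the YAML above to a file and run it through mergekit's CLI (the output path below is a placeholder; `--cuda` is optional and only useful if a GPU is available):

```bash
pip install mergekit
mergekit-yaml celestial-harmony.yaml ./Qwen2.5-7B-CelestialHarmony-1M --cuda
```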
---
## πŸš€ **Quickstart**
### **Install Required Packages**
Ensure you have the latest `transformers` library installed:
```bash
pip install transformers torch accelerate
```
### **Load and Use the Model**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

# Load the model weights and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native dtype (bfloat16)
    device_map="auto"     # spread layers across available GPUs/CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]

# Render the chat messages with Qwen's chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
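For interactive use, you can stream tokens to stdout as they are generated with `transformers`' built-in `TextStreamer`, reusing `model`, `tokenizer`, and `model_inputs` from the snippet above:

```python
from transformers import TextStreamer

# Print tokens as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```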
---
## ⚑ **Optimized Deployment with vLLM**
For full 1M-token context inference, use Qwen's fork of **vLLM**, which adds Dual Chunk Attention support:
```bash
git clone -b dev/dual-chunk-attn [email protected]:QwenLM/vllm.git
cd vllm
pip install -e . -v
```
Run the model:
```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
--tensor-parallel-size 4 \
--max-model-len 1010000 \
--enable-chunked-prefill --max-num-batched-tokens 131072 \
--enforce-eager \
--max-num-seqs 1
```
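Once the server is running, it exposes an OpenAI-compatible API (on port 8000 by default), which you can query with any OpenAI client or plain `curl`:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M",
    "messages": [
      {"role": "system", "content": "You are a wise celestial storyteller."},
      {"role": "user", "content": "Tell me a short story about an ancient celestial warrior."}
    ],
    "max_tokens": 512
  }'
```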
---
## 🎯 **Model Capabilities**
βœ… **Roleplay & Storytelling** – Designed for engaging interactions.
βœ… **Long-Context Awareness** – Handles texts up to **1M tokens**.
βœ… **Logical Thinking & Reasoning** – Supports `<think>` tags to encourage structured reasoning.
βœ… **Optimized Merge Strategy** – Uses `Model Stock` for superior generalization.
---
## πŸ“œ **Acknowledgments**
This model is built on top of **Qwen2.5-7B**, with contributions from **bunnycore**, **Triangle104**, **Sakalti**, and **huihui-ai**, leveraging the **Model Stock** merging methodology.
For further details, see:
- πŸ“„ [Qwen2.5-1M Technical Report](https://arxiv.org/abs/2501.15383)
- πŸ“– [MergeKit Documentation](https://github.com/arcee-ai/mergekit)
- πŸš€ [vLLM for Long-Context Inference](https://github.com/QwenLM/vllm)
---