---
base_model:
- Qwen/Qwen2.5-7B-Instruct-1M
- Sakalti/SJT-7B-1M
- Triangle104/Q2.5-Instruct-1M_Harmony
- bunnycore/Qwen2.5-7B-RRP-1M
- huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
library_name: transformers
tags:
- mergekit
- merge
license: mit
---

# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M

**ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M** is a custom merged language model based on **Qwen2.5-7B** with enhanced reasoning, roleplaying, and long-context capabilities. The model supports context lengths of up to **1 million tokens**, making it well suited for ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.

Quants are available in GGUF format, provided by [mradermacher](https://huggingface.co/mradermacher):

1. [GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-GGUF)
2. [imatrix GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-i1-GGUF)

---

## 🔧 **Model Details**

- **Base Model**: `Qwen/Qwen2.5-7B-Instruct-1M`
- **Models Used in Merge**:
  - `Qwen/Qwen2.5-7B-Instruct-1M`
  - `bunnycore/Qwen2.5-7B-RRP-1M`
  - `Triangle104/Q2.5-Instruct-1M_Harmony`
  - `Sakalti/SJT-7B-1M`
  - `huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated`
- **Merge Method**: `model_stock` (layer-wise weight averaging)

---

## 📖 **Overview**

**Qwen2.5-7B-CelestialHarmony-1M** enhances the **Qwen2.5-7B series** with a fine-tuned balance of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well suited for:

- **Roleplaying** 🧝‍♂️: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing** 🧠: Structured logical thinking, especially when prompted with `<think>` tags.
- **Ultra-Long Context Handling** 📜: Efficient processing of sequences up to **1,010,000 tokens** using optimized sparse attention.

---

## ⚙️ **Technical Specifications**

| Specification | Value |
|--------------|---------|
| **Model Type** | Causal Language Model |
| **Parameters** | 7.61B |
| **Non-Embedding Parameters** | 6.53B |
| **Layers** | 28 |
| **Attention Heads (GQA)** | 28 (Q), 4 (KV) |
| **Max Context Length** | 1,010,000 tokens |
| **Max Generation Length** | 8,192 tokens |
| **Merge Method** | Model Stock |

---

## 🔬 **Merging Details**

This model was merged using the **Model Stock** method, which averages the weights of multiple fine-tuned models around a shared base to produce a balanced and performant merge.

### **Merge YAML Configuration**

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: Sakalti/SJT-7B-1M
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```

---

## 🚀 **Quickstart**

### **Install Required Packages**

Ensure you have the latest `transformers` library installed:

```bash
pip install transformers torch accelerate
```

### **Load and Use the Model**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
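
# Compose the conversation as role/content messages; apply_chat_template below
# renders them into Qwen's chat prompt format before tokenization.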
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```

---

## ⚡ **Optimized Deployment with vLLM**

For long-context inference, use Qwen's **vLLM** fork (the `dev/dual-chunk-attn` branch adds dual chunk attention for 1M-token contexts):

```bash
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
```

Run the model:

```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
  --tensor-parallel-size 4 \
  --max-model-len 1010000 \
  --enable-chunked-prefill --max-num-batched-tokens 131072 \
  --enforce-eager \
  --max-num-seqs 1
```

---

## 🎯 **Model Capabilities**

✅ **Roleplay & Storytelling** – Designed for engaging interactions.

✅ **Long-Context Awareness** – Handles texts up to **1M tokens**.

✅ **Logical Thinking & Reasoning** – Supports the `<think>` tag to enhance thought structuring.

✅ **Optimized Merge Strategy** – Uses `Model Stock` for superior generalization.

---

## 📜 **Acknowledgments**

This model is built on top of **Qwen2.5-7B**, with contributions from **bunnycore, Triangle104, and Sakalti**, leveraging the **Model Stock** merging methodology.

For further details, see:

- 📄 [Qwen2.5-1M Technical Report](https://arxiv.org/abs/2501.15383)
- 📖 [MergeKit Documentation](https://github.com/mlfoundations/mergekit)
- 🚀 [vLLM for Long-Context Inference](https://github.com/QwenLM/vllm)

---
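To reproduce this merge, the YAML configuration above can be passed to MergeKit's `mergekit-yaml` entry point. Below is a minimal sketch; the `config.yaml` filename and the output directory are illustrative placeholders, not files shipped with this repository.

```bash
# Minimal sketch: rebuild the merge from the YAML configuration shown earlier.
# "config.yaml" and the output directory are placeholder names.
pip install mergekit
mergekit-yaml config.yaml ./Qwen2.5-7B-CelestialHarmony-1M
```

Note that each source model is downloaded from the Hugging Face Hub during the merge, so expect disk usage on the order of several 7B checkpoints.

---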