Daemontatox / QwQ_Zero1 (Unsloth Finetune)
A focused attempt to solve the QwQ-style “reasoning freeze” by finetuning QwQ for fluid, multi-step reasoning — without compromising on depth or capability.
This model is made for creators who want both intelligence and speed from a single model. Powered by Unsloth and TRL, trained with intention, and built for rapid logical flow.
Model Details
- Base model: unsloth/qwq-32b-unsloth
- Developer: Daemontatox
- License: apache-2.0
- Architecture: Qwen2.5
- Precision: float16
- Libraries: Unsloth, TRL, Transformers
- Finetuning Goal: Resolve stalling during reasoning while maintaining chain-of-thought fidelity
Training Details
- Epochs: 5
- Batch Size: 16
- Sequence Length: 2048
- Optimizations: LoRA, gradient checkpointing, Flash Attention 2 (a minimal finetuning sketch follows this list)
- Hardware Used: 3x L40S
- Datasets: Custom curated prompts focused on logical progression, question answering, and instruction flow (details withheld)
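For reference, a minimal Unsloth + TRL sketch in this spirit is shown below. It is illustrative only: the dataset name, text field, and LoRA hyperparameters are assumptions rather than the exact recipe used for this model, and argument names vary slightly across TRL versions.

```python
# Illustrative Unsloth + TRL LoRA finetune; not the exact recipe behind QwQ_Zero1.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model with Unsloth (4-bit loading here is an assumption for memory headroom).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwq-32b-unsloth",  # base model listed above
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are placeholder values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # gradient checkpointing, as listed above
)

# "your_reasoning_dataset" is a hypothetical stand-in for the withheld custom data.
dataset = load_dataset("your_reasoning_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumed column holding the formatted prompts
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=16,
        num_train_epochs=5,
        fp16=True,
        logging_steps=10,
        output_dir="qwq_zero1_lora",
    ),
)
trainer.train()
```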
Usage
This model is best suited for:
- Fast, chain-of-thought reasoning in multi-turn dialogue or step-by-step prompts
- Agentic workflows where latency and logical progression are key
- Creative ideation and brainstorming with minimal hallucination under clear prompts
- Instruction following, task decomposition, and in-context problem solving
Example usage:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Daemontatox/QWQ_Zero1"

# Load the tokenizer and the model in float16, spreading layers across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Explicit step-by-step prompts play to the model's chain-of-thought strengths.
prompt = "Explain why the moon affects the tides in a step-by-step way."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
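If the repository ships a chat template (as QwQ-derived checkpoints typically do), wrapping the prompt with `apply_chat_template` is usually a more reliable way to trigger the step-by-step style. The snippet below assumes such a template is present and reuses the `tokenizer` and `model` loaded above.

```python
# Assumes a chat template is bundled with the checkpoint; falls back poorly if it is not.
messages = [
    {"role": "user", "content": "Explain why the moon affects the tides in a step-by-step way."}
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(chat_inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```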
Intended Use Cases
- Autonomous AI agents with complex task breakdowns
- Education platforms requiring detailed, structured explanations
- LLM pipelines using RAG or retrieval to walk through logical steps (a prompt-assembly sketch follows this list)
- Dev tools that scaffold reasoning for planning, coding, or content generation
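As a rough illustration of the RAG use case, the sketch below prepends retrieved passages to a step-by-step prompt. The `retrieve` function is a placeholder stub, not part of this repository; `tokenizer` and `model` are assumed to be loaded as in the usage example above.

```python
# Minimal RAG-style prompt assembly; the retriever here is a hypothetical stub.
def retrieve(query: str) -> list[str]:
    # Stand-in for a real retriever (vector store, BM25, etc.).
    return ["Tides are caused primarily by the Moon's gravitational pull on Earth's oceans."]

question = "Why are there usually two high tides per day?"
context = "\n".join(f"- {passage}" for passage in retrieve(question))

prompt = (
    "Use the context below to answer the question, reasoning step by step.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)

# `tokenizer` and `model` as loaded in the usage example above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```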
Limitations
- Not instruction-tuned – prompt structure significantly affects output quality
- High memory requirements – the 32B weights alone are roughly 64 GB in float16, so a single 48 GB GPU needs quantization or offloading (see the sketch after this list)
- Reasoning optimization ≠ factual accuracy – verify claims from output
- No safety alignment – model may reflect biases from pretraining data
- Occasional verbosity – especially in open-ended generative contexts
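One way to fit the model on a single 48 GB (or smaller) card is a 4-bit load via bitsandbytes, sketched below. The quantization settings are illustrative defaults; quantized quality has not been evaluated for this model.

```python
# Illustrative 4-bit load with bitsandbytes; not an officially tested configuration for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Daemontatox/QWQ_Zero1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```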
Evaluation
| Metric | Value |
|---|---|
| MMLU (Zero-shot) | TBD |
| ARC Challenge | TBD |
| Long-form CoT Tasks | TBD |
| Human Eval | TBD |
Evaluation in progress — coming soon.
Ethical Considerations
This model has not been aligned for safety, fairness, or bias mitigation. It may:
- Reflect harmful stereotypes or cultural bias present in pretraining data
- Produce hallucinated or false information confidently
- Mislead users if used without human oversight
Do not use this model in:
- Medical, legal, financial, or psychological contexts
- Any real-time, high-stakes, or human-critical system
Future Work
- Instruction tuning for robust multi-task generalization
- Safety alignment via RLHF or constitutional guidance
- Integration into open-agent frameworks and interactive environments
- Adding evaluation dashboards + quantized deployment versions
Citation
If you use this model, please credit:
Daemontatox – QwQ_Zero1 32B
Built with Unsloth, TRL, and deep love for fast reasoning.
Contact
For questions, feedback, or collaboration inquiries, reach out via:
- Hugging Face: @Daemontatox
- GitHub: github.com/Daemontatox