
Daemontatox / QwQ_Zero1 (Unsloth Finetune)

A focused attempt to solve the QwQ-style “reasoning freeze” by finetuning QwQ for fluid, multi-step reasoning — without compromising on depth or capability.

This model is made for creators who want both intelligence and speed from a single model. Powered by Unsloth and TRL, trained with intention, and built for rapid logical flow.


Model Details

  • Base model: unsloth/qwq-32b-unsloth
  • Developer: Daemontatox
  • License: apache-2.0
  • Architecture: Qwen2.5
  • Precision: float16
  • Libraries: Unsloth, TRL, Transformers
  • Finetuning Goal: Resolve stalling during reasoning while maintaining chain-of-thought fidelity

Training Details

  • Epochs: 5
  • Batch Size: 16
  • Sequence Length: 2048
  • Optimizations: LoRA, gradient checkpointing, Flash Attention 2 (a rough training sketch follows this list)
  • Hardware Used: 3x L40S
  • Datasets: Custom curated prompts focused on logical progression, question answering, and instruction flow (details withheld)
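
For reference, here is a minimal sketch of what the Unsloth + TRL LoRA recipe above might look like. Only the epochs, batch size, and sequence length come from this card; the dataset path, LoRA rank, and the remaining hyperparameters are illustrative placeholders, and some SFTTrainer keyword names vary across TRL versions.

from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

max_seq_length = 2048  # reported sequence length

# Load the base model; 4-bit loading is an assumption, not confirmed by the card
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwq-32b-unsloth",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters and enable Unsloth's gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # assumed LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Placeholder for the (withheld) curated reasoning dataset
dataset = load_dataset("json", data_files="reasoning_prompts.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",              # assumes a single "text" column
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=16,     # reported batch size
        num_train_epochs=5,                 # reported epochs
        fp16=True,
        output_dir="qwq_zero1_lora",
    ),
)
trainer.train()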

Usage

This model is best suited for:

  • Fast, chain-of-thought reasoning in multi-turn dialogue or step-by-step prompts
  • Agentic workflows where latency and logical progression are key
  • Creative ideation and brainstorming with minimal hallucination under clear prompts
  • Instruction following, task decomposition, and in-context problem solving

Example usage:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Daemontatox/QWQ_Zero1"

# Load the tokenizer and the model in float16, letting accelerate spread layers across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# A step-by-step reasoning prompt
prompt = "Explain why the moon affects the tides in a step-by-step way."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
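
Since the Qwen2.5 base ships a chat template, wrapping the prompt as a chat turn usually gives cleaner step-by-step output than a raw string. The sketch below reuses the model and tokenizer from above; the sampling settings are illustrative rather than recommended defaults.

# Format the prompt with the tokenizer's chat template before generating
messages = [
    {"role": "user", "content": "Explain why the moon affects the tides in a step-by-step way."}
]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(chat_inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))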

Intended Use Cases

  • Autonomous AI agents with complex task breakdowns
  • Education platforms requiring detailed, structured explanations
  • LLM pipelines using RAG or retrieval to walk through logical steps (a minimal sketch follows this list)
  • Dev tools that scaffold reasoning for planning, coding, or content generation
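
As a rough illustration of the retrieval-augmented pattern mentioned above, the sketch below prepends retrieved passages to the question before generation. The retrieve() helper is a hypothetical placeholder for whatever search or vector-store backend you use, and the model and tokenizer are the ones loaded in the Usage example.

# Hypothetical retriever; replace with your vector store or search backend
def retrieve(query: str) -> list[str]:
    return [
        "The Moon's gravity pulls on Earth's oceans.",
        "Tidal bulges form on the near and far sides of Earth.",
    ]

question = "Why are there two high tides per day?"
context = "\n".join(f"- {passage}" for passage in retrieve(question))
prompt = (
    "Use the context below to answer the question step by step.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=300)[0], skip_special_tokens=True))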

Limitations

  • Not instruction-tuned – prompt structure significantly affects output quality
  • High memory requirements – the 32B model requires a ≥48GB GPU or an offloading/quantization setup (one option is sketched after this list)
  • Reasoning optimization ≠ factual accuracy – verify claims from output
  • No safety alignment – model may reflect biases from pretraining data
  • Occasional verbosity – especially in open-ended generative contexts
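
If a ≥48GB GPU is not available, one common workaround (a general technique, not a recommendation specific to this model) is 4-bit quantization with automatic offload via bitsandbytes; expect some quality loss versus full precision.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Daemontatox/QWQ_Zero1"

# 4-bit NF4 quantization; requires the bitsandbytes package
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU if the GPU is too small
)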

Evaluation

Metric                 Value
MMLU (Zero-shot)       TBD
ARC Challenge          TBD
Long-form CoT Tasks    TBD
Human Eval             TBD

Evaluation in progress — coming soon.


Ethical Considerations

This model has not been aligned for safety, fairness, or bias mitigation. It may:

  • Reflect harmful stereotypes or cultural bias present in pretraining data
  • Produce hallucinated or false information confidently
  • Mislead users if used without human oversight

Do not use this model in:

  • Medical, legal, financial, or psychological contexts
  • Any real-time, high-stakes, or human-critical system

Future Work

  • Instruction tuning for robust multi-task generalization
  • Safety alignment via RLHF or constitutional guidance
  • Integration into open-agent frameworks and interactive environments
  • Adding evaluation dashboards + quantized deployment versions

Citation

If you use this model, please credit:

Daemontatox – QwQ_Zero1 32B
Built with Unsloth, TRL, and deep love for fast reasoning.



Contact

For questions, feedback, or collaboration inquiries, reach out via:

