
HuatuoGPT-o1-7B-exl2

Original model: HuatuoGPT-o1-7B made by FreedomIntelligence
Based on: Qwen2.5-7B-Instruct by Qwen

Quants

4bpw h6 (main)
4.5bpw h6
5bpw h6
6bpw h6
8bpw h8
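
As a rough guide to which quant fits a given GPU, the weights-only footprint is approximately parameters × bits-per-weight / 8. A quick sketch (the ~7.6B figure is Qwen2.5-7B's published parameter count; the estimate excludes KV cache and runtime overhead):

# Rough weights-only VRAM estimate per quant; KV cache and overhead add more.
PARAMS = 7.6e9  # approximate parameter count of Qwen2.5-7B

for bpw in (4.0, 4.5, 5.0, 6.0, 8.0):
    gib = PARAMS * bpw / 8 / 1024**3
    print(f"{bpw}bpw -> ~{gib:.1f} GiB")  # e.g. 4.0bpw -> ~3.5 GiB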

Quantization notes

Made with Exllamav2 0.2.7 using the default calibration dataset.
Exl2 quants require an Nvidia RTX GPU on Windows, or an Nvidia RTX or AMD ROCm GPU on Linux.
The model has to fit entirely in VRAM, as RAM offloading isn't natively supported (see the size estimate above).
The quants can be used with apps such as TabbyAPI, Text-Generation-WebUI, LoLLMs, and others, or loaded directly with the exllamav2 library as sketched below.
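
A minimal direct-inference sketch with the exllamav2 library, assuming an exllamav2 0.2.x install and that one of the quant branches has been downloaded to a local directory (the path below is a placeholder):

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/path/to/HuatuoGPT-o1-7B-exl2"  # placeholder: local download of a quant branch

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache while autosplitting across GPUs
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
# Note: this is a raw completion; apply the chat template yourself for chat-style use.
print(generator.generate(prompt="How to stop a cough?", max_new_tokens=256))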

Original model card

HuatuoGPT-o1-7B

Introduction

HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response.

For more information, visit our GitHub repository: https://github.com/FreedomIntelligence/HuatuoGPT-o1.

Model Info

| Model            | Backbone      | Supported Languages | Link    |
| ---------------- | ------------- | ------------------- | ------- |
| HuatuoGPT-o1-8B  | LLaMA-3.1-8B  | English             | HF Link |
| HuatuoGPT-o1-70B | LLaMA-3.1-70B | English             | HF Link |
| HuatuoGPT-o1-7B  | Qwen2.5-7B    | English & Chinese   | HF Link |
| HuatuoGPT-o1-72B | Qwen2.5-72B   | English & Chinese   | HF Link |

Usage

You can use HuatuoGPT-o1-7B in the same way as Qwen2.5-7B-Instruct. You can deploy it with tools like vLLM or SGLang, or perform direct inference:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places weights on available devices.
model = AutoModelForCausalLM.from_pretrained(
    "FreedomIntelligence/HuatuoGPT-o1-7B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("FreedomIntelligence/HuatuoGPT-o1-7B")

input_text = "How to stop a cough?"
messages = [{"role": "user", "content": input_text}]

# Render the chat template to a prompt string, then tokenize it for generation.
inputs = tokenizer(
    tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True),
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
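
For serving, a minimal vLLM sketch of the same chat call, assuming a recent vLLM release that provides the LLM.chat API (this loads the original full-precision model, not the exl2 quants):

from vllm import LLM, SamplingParams

llm = LLM(model="FreedomIntelligence/HuatuoGPT-o1-7B")
params = SamplingParams(temperature=0.7, max_tokens=2048)

# chat() applies the model's chat template before generating.
outputs = llm.chat([{"role": "user", "content": "How to stop a cough?"}], params)
print(outputs[0].outputs[0].text)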

HuatuoGPT-o1 adopts a thinks-before-it-answers approach, with outputs formatted as:

## Thinking
[Reasoning process]

## Final Response
[Output]
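
Because the two sections are delimited by fixed headers, they can be split out programmatically. A small sketch, assuming the output follows the format above (split_response is an illustrative helper, not part of the model's tooling):

import re

def split_response(text):
    # Separate the "## Thinking" block from the "## Final Response" block.
    m = re.search(r"## Thinking\s*(.*?)\s*## Final Response\s*(.*)", text, re.DOTALL)
    if m is None:
        return None, text.strip()  # fall back to the raw text if the headers are absent
    return m.group(1).strip(), m.group(2).strip()

thinking, answer = split_response(tokenizer.decode(outputs[0], skip_special_tokens=True))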

📖 Citation

@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs}, 
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925}, 
}