---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo
license: apache-2.0
language:
- en
datasets:
- open-r1/OpenR1-Math-220k
---

This is my experiment with training a reasoning model using TRL's GRPO trainer and the Unsloth API.

# Inference:

## Using Unsloth API (For Faster Inference):

```
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ubermenchh/Qwen2.5-3B-open-r1-math",
    max_seq_length = 1024,
    dtype = torch.bfloat16,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# Raw string so the LaTeX backslashes are not treated as escape sequences
test_question = r"""
Let $z \in \mathbf{C}$, satisfying the condition $a z^{n}+b \mathrm{i} z^{n-1}+b \mathrm{i} z-a=0$, $a, b \in \mathbf{R}$, $n \in \mathbf{N}$, find $|z|$.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": test_question},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 2048, pad_token_id = tokenizer.eos_token_id)
```

## Using Transformers API:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ubermenchh/Qwen2.5-3B-open-r1-math",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "ubermenchh/Qwen2.5-3B-open-r1-math",
    trust_remote_code=True
)

SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# Raw string so the LaTeX backslashes survive intact
problem = r"Let $z \in \mathbf{C}$, satisfying the condition $a z^{n}+b \mathrm{i} z^{n-1}+b \mathrm{i} z-a=0$, $a, b \in \mathbf{R}$, $n \in \mathbf{N}$, find $|z|$."

prompt = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": problem}
]
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=3000,
    do_sample=True,  # required for temperature to take effect
    temperature=1.3,
    num_return_sequences=1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Question:\n", problem)
print("\n\nResponse:\n", response)
```

## References:

- [https://github.com/HarleyCoops/smolThinker-.5B](https://github.com/HarleyCoops/smolThinker-.5B)
- [https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb)
- [https://github.com/huggingface/open-r1](https://github.com/huggingface/open-r1)

# Uploaded model

- **Developed by:** ubermenchh
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
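
## Extracting the Final Answer:

Both inference examples prompt the model to emit its final result between `<answer>` tags, so the answer can be recovered from the decoded `response` string with a small regex. This is a minimal sketch; the `extract_answer` helper is mine, not part of the model or Transformers API:

```
import re

# Hypothetical helper (not from the model card): pull the final answer
# out of a decoded response that follows the <reasoning>/<answer> template.
def extract_answer(response: str) -> str | None:
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_answer(response))  # None if the model drifted from the format
```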
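
## Note on the GRPO Format Reward:

The training script isn't included in this card, but in TRL's `GRPOTrainer` rewards are plain Python functions over sampled completions, and the referenced willccbb gist scores completions for following the `<reasoning>`/`<answer>` template. The sketch below mirrors that gist's format reward; the `format_reward` name and the 0.5 weight are illustrative, not the exact values used for this model:

```
import re

def format_reward(completions, **kwargs) -> list[float]:
    # Each completion is a list of chat messages; score 0.5 when the
    # assistant text follows the <reasoning>/<answer> template, else 0.0.
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    return [0.5 if re.search(pattern, resp, re.DOTALL) else 0.0 for resp in responses]
```

A function like this is passed to `GRPOTrainer` through its `reward_funcs` argument, typically stacked with a correctness reward that checks the extracted `<answer>` against the dataset solution.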