|
---
license: apache-2.0
language:
- en
- ru
base_model:
- Qwen/Qwen2.5-7B-Instruct
---
|
|
|
## Description |
|
|
|
*Eule* is my attempt at reproducing OpenAI's o1 series of reasoning models. At the moment the point is not to hit good scores on benchmarks (this model is rather stupid),
but to introduce a qualitative change in how the LLM approaches tasks. Like the o1 models, Eule approaches its problems in a step-by-step manner.
It is also trained to reason in Russian (ultimately, I want to make a decent Russian reasoning model).
|
|
|
Cool things I found while playing with it: |
|
1. It tries to verify its solutions to make sure they are correct. |
|
2. When it fails, it sometimes iterates on the problem and tries a new approach or fixes the mistake.
|
|
|
Bad things: |
|
1. It is stupid: probably no smarter than the instruct model it is based on (not a strict claim, I haven't run any benchmarks yet). Still, it's interesting to inspect its chains of thought.
|
2. The final response (after the reasoning chain) is in English. |
|
3. Sometimes the model may not produce `<|REASONING_END|>`, which messes up parsing (a defensive parsing sketch is given in the Training Details section below).
|
|
|
At the moment it is trained only on math data, but it can solve riddles and other problems that require step-by-step reasoning.
I'm planning to add more non-math data and then proceed to RL.
|
|
|
## Training Details |
|
|
|
It was trained using [kolibrify](https://github.com/oKatanaaa/kolibrify) on a single H800 for about 6 hours. |
|
The training data consists of math problems with solutions formatted as deliberate reasoning chains. The longest reasoning chain is ~19,000 tokens.
|
|
|
|
|
The model follows the ChatML template, but introduces several new tokens:
|
- `<|REASONING_START|>` - start of a reasoning chain. |
|
- `<|REASONING_END|>` - end of a reasoning chain. |
|
- `<|RSS|>` - start of a reasoning step. |
|
- `<|RSE|>` - end of a reasoning step. |
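
These markers ship as part of the released tokenizer. As a quick sanity check (a minimal sketch assuming the standard `transformers` tokenizer API), you can verify that each marker is encoded as a single token rather than being split into pieces:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('kaleinaNyan/eule-qwen2.5instruct-7b-111224')

# Each reasoning marker should map to exactly one token id.
for marker in ['<|REASONING_START|>', '<|REASONING_END|>', '<|RSS|>', '<|RSE|>']:
    ids = tokenizer.encode(marker, add_special_tokens=False)
    print(marker, '->', ids)
    assert len(ids) == 1, f'{marker} was split into multiple tokens'
```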
|
|
|
A typical conversation formatting structure is as follows: |
|
```
<|im_start|>system
System message<|im_end|>
<|im_start|>user
Problem description<|im_end|>
<|im_start|>assistant
<|REASONING_START|><|RSS|>step 1<|RSE|><|RSS|>step 2<|RSE|><|REASONING_END|>Final assistant response<|im_end|>
```
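
For downstream use you'll want to strip these markers. Here is a minimal parsing sketch (my own helper, not something shipped with the model) that splits a completion into reasoning steps and the final response, and degrades gracefully when `<|REASONING_END|>` is missing (bad thing #3 above):

```python
def parse_completion(text: str):
    # Drop everything up to and including the reasoning-start marker.
    reasoning = text.split('<|REASONING_START|>')[-1]
    if '<|REASONING_END|>' in reasoning:
        reasoning, final = reasoning.split('<|REASONING_END|>', 1)
    else:
        # The model sometimes forgets this marker; treat the whole tail as reasoning.
        final = ''
    # Split into individual steps and clean the step-end markers.
    raw_steps = [s.replace('<|RSE|>', '').strip() for s in reasoning.split('<|RSS|>')]
    steps = [s for s in raw_steps if s]
    return steps, final.replace('<|im_end|>', '').strip()
```

Usage: `steps, final = parse_completion(response)`, where `response` is a decoded completion like the one produced in the next section.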
|
|
|
## How to Get Started with the Model |
|
|
|
I use [unsloth](https://github.com/unslothai/unsloth) and recommend you do the same: |
|
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='kaleinaNyan/eule-qwen2.5instruct-7b-111224'
)
FastLanguageModel.for_inference(model)


def generate(chat, n_tokens, use_cache=True, do_sample=False):
    # Append the assistant header and the reasoning prefix so the model
    # starts its chain of thought right away.
    input_str = tokenizer.apply_chat_template(chat, tokenize=False) + '<|im_start|>assistant\n<|REASONING_START|><|RSS|>'
    inputs = tokenizer(input_str, return_tensors='pt').to(model.device)
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_new_tokens=n_tokens,
        use_cache=use_cache,
        do_sample=do_sample,
        temperature=0.7,  # only takes effect when do_sample=True
    )
    return tokenizer.batch_decode(outputs)[0]


msg = "Come up with a cubic equation and solve it"
system_message = "You are an AI assistant that thoroughly solves any task. Explore various routes and verify your solutions. Reason in Russian. Provide concise responses to the user."

chat = [
    {'role': 'system', 'content': system_message},
    {'role': 'user', 'content': msg},
]
response = generate(chat, 8196, do_sample=True)

# Print the reasoning chain with one step per line.
print('\n'.join(response.split('<|REASONING_START|>')[-1].split('<|RSS|>')))
```
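
If you'd rather not depend on unsloth, the same prompt construction works with plain `transformers` (an untested minimal sketch reusing the `chat` list from above; the dtype and device settings are assumptions, adjust them for your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'kaleinaNyan/eule-qwen2.5instruct-7b-111224'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map='auto')

# Same trick as above: append the assistant header and reasoning prefix
# so the model starts its chain of thought immediately.
input_str = tokenizer.apply_chat_template(chat, tokenize=False) + '<|im_start|>assistant\n<|REASONING_START|><|RSS|>'
inputs = tokenizer(input_str, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8196, do_sample=True, temperature=0.7)
print(tokenizer.batch_decode(outputs)[0])
```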
|
|
|
## Evaluation |
|
|
|
I'll provide results on the MATH and GSM8K benchmarks later.