---
license: apache-2.0
language:
  - en
  - ru
base_model:
  - Qwen/Qwen2.5-7B-Instruct
---

## Description

Eule is my attempt at reproducing OpenAI's o1 series of reasoning models. At the moment the point is not to hit good scores on benchmarks (this model is rather stupid), but to introduce a qualitative change in how the LLM approaches tasks. Similar to the o1 models, Eule approaches its problems in a step-by-step manner. It is also trained to reason in Russian (ultimately, I want to make a decent Russian reasoning model).

Cool things I found while playing with it:

1. It tries to verify its solutions to make sure they are correct.
2. When it fails, it sometimes revisits the problem and tries a new approach or fixes the mistake.

Bad things:

1. It is stupid: no smarter than the instruct model it is based on (not a strict claim, I haven't run any benchmarks yet). Its chains of thought are still interesting to inspect, though.
2. The final response (after the reasoning chain) is in English.
3. Sometimes the model doesn't produce `<|REASONING_END|>`, which breaks parsing (see the parsing sketch at the end of the Getting Started section).

At the moment it is trained only on math data, but it can solve riddles and other problems that require step-by-step reasoning. I'm planning to add more non-math data and then proceed to RL.

## Training Details

It was trained using kolibrify on a single H800 for about 6 hours. The training data consists of math problems with solutions formatted as deliberate reasoning chains; the longest reasoning chain is ~19000 tokens.

The model follows the ChatML template but introduces several new tokens:

- `<|REASONING_START|>` - start of a reasoning chain.
- `<|REASONING_END|>` - end of a reasoning chain.
- `<|RSS|>` - start of a reasoning step.
- `<|RSE|>` - end of a reasoning step.
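
If you load the tokenizer with plain transformers, you can sanity-check that these markers are registered as single tokens rather than being split into pieces. A minimal sketch, assuming the tokens ship with the tokenizer in the repo used below:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('kaleinaNyan/eule-qwen2.5instruct-7b-111224')

# Each marker should map to exactly one token id if it was added to the vocab.
for token in ['<|REASONING_START|>', '<|REASONING_END|>', '<|RSS|>', '<|RSE|>']:
    ids = tokenizer(token, add_special_tokens=False)['input_ids']
    print(token, ids)  # expect a single-element list per marker
```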

A typical conversation is formatted as follows:

```
<|im_start|>system
System message<|im_end|>
<|im_start|>user
Problem description<|im_end|>
<|im_start|>assistant
<|REASONING_START|><|RSS|>step 1<|RSE|><|RSS|>step 2<|RSE|><|REASONING_END|>Final assistant response<|im_end|>
```
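
For illustration, here is a small helper that assembles an assistant turn in this format from a list of reasoning steps. This is just a sketch; `format_assistant_turn`, `steps`, and `final` are my own names, not part of the model's API:

```python
def format_assistant_turn(steps, final):
    # Wrap each step in <|RSS|>...<|RSE|>, enclose the whole chain in
    # <|REASONING_START|>...<|REASONING_END|>, then append the final response.
    chain = ''.join(f'<|RSS|>{step}<|RSE|>' for step in steps)
    return f'<|REASONING_START|>{chain}<|REASONING_END|>{final}<|im_end|>'

print(format_assistant_turn(['step 1', 'step 2'], 'Final assistant response'))
```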

## How to Get Started with the Model

I use unsloth and recommend you do the same:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='kaleinaNyan/eule-qwen2.5instruct-7b-111224'
)
FastLanguageModel.for_inference(model)

def generate(chat, n_tokens, use_cache=True, do_sample=False):
    # Append the assistant header and the reasoning-start markers manually
    # so the model is forced to begin with a reasoning chain.
    input_str = tokenizer.apply_chat_template(chat, tokenize=False) \
        + '<|im_start|>assistant\n<|REASONING_START|><|RSS|>'
    inputs = tokenizer(input_str, return_tensors='pt').to(model.device)
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_new_tokens=n_tokens,
        use_cache=use_cache,
        do_sample=do_sample,
        temperature=0.7,
    )
    return tokenizer.batch_decode(outputs)[0]
```

```python
msg = "Come up with a cubic equation and solve it"
system_message = "You are an AI assistant that thoroughly solves any task. Explore various routes and verify your solutions. Reason in Russian. Provide concise responses to the user."

chat = [
    {'role': 'system', 'content': system_message},
    {'role': 'user', 'content': msg},
]
response = generate(chat, 8196, do_sample=True)

# Print the reasoning chain with one step per line.
print('\n'.join(response.split('<|REASONING_START|>')[-1].split('<|RSS|>')))
```
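
As noted under "Bad things", the model sometimes omits `<|REASONING_END|>`, so parsing should not rely on it being present. A minimal sketch that tolerates the missing marker, reusing `response` from the snippet above (`split_reasoning` is my own helper name, not part of any API):

```python
def split_reasoning(response):
    # Everything after the reasoning-start marker is either
    # 'chain<|REASONING_END|>final answer' or, when the end marker
    # is missing, just the raw chain.
    tail = response.split('<|REASONING_START|>')[-1]
    if '<|REASONING_END|>' in tail:
        reasoning, final = tail.split('<|REASONING_END|>', 1)
    else:
        reasoning, final = tail, ''  # no final answer to recover
    steps = [s.replace('<|RSE|>', '').strip()
             for s in reasoning.split('<|RSS|>') if s.strip()]
    return steps, final.replace('<|im_end|>', '').strip()

steps, final = split_reasoning(response)
print(final)
```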

## Evaluation

I'll provide results on the MATH and GSM8K benchmarks later.