🧠Smol-reason2.1-base🧠

This is my third GRPO reasoning model. I was exploring fine-tuning on my own hardware, and found it to work with 3B models.

This is the base version of Smol-reason2.1, meaning it is not suited for conversation, but rather for zero-shot question answering.

System prompt:

You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. Respond in the following format:
<think>

...reason in long recursive loops here...

</think>

...answer here... 

Start your response with <think>
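In practice, the prompt above can be packaged into a standard chat-format message list before being sent to whatever runtime serves the GGUF. A minimal sketch (the helper name `build_messages` is mine, not part of the model):

```python
# Build the message list Smol-reason2.1 expects: the system prompt from
# above plus a single zero-shot question (no multi-turn conversation).
SYSTEM_PROMPT = (
    "You are a reasoning LLM named Smol-reason2.1, developed by Sweaterdog. "
    "Respond in the following format:\n"
    "<think>\n\n...reason in long recursive loops here...\n\n</think>\n\n"
    "...answer here...\n\n"
    "Start your response with <think>"
)

def build_messages(question: str) -> list[dict]:
    """Return a chat-format message list for one zero-shot question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```

Any chat-template-aware runtime (llama.cpp server, transformers, etc.) should accept a list in this shape.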

In accordance with the output format, the model responds like this:

<think>

Okay, let's break down the user's issue.

...more reasoning...

Therefore, x should be the answer.
</think>

X is the answer because...
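Because the reasoning is always wrapped in <think> tags, it is straightforward to separate it from the final answer. A minimal sketch (the helper name `split_response` is mine, not part of the model):

```python
import re

def split_response(text: str) -> tuple[str, str]:
    """Split a Smol-reason2.1 response into (reasoning, answer).

    The model is instructed to open with <think> and close the block
    with </think>; everything after the closing tag is the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

This is handy when you only want to display the answer and log the reasoning separately.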

Features

Flexible reasoning

You can modify the system prompt to change the way the model reasons. By default, it is told to reason about code snippets, which I found works best for everything.

Logical reasoning

This is the first model I have seen which can answer "The Mango Puzzle", which goes like this:

If I give you 15 mangoes, you then give 14 away, and then receive 60 more mangoes, how many mangoes did you not sell?

The correct answer is 75 mangoes; most LLMs treat "give away" as a form of sale, so they typically say 61 mangoes.
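The arithmetic behind the intended answer can be spelled out directly: nothing in the puzzle is ever sold, so every mango you received counts as "not sold".

```python
# The Mango Puzzle, step by step: giving mangoes away is not selling
# them, so the mangoes *not sold* are all the mangoes you ever received.
received = 15 + 60   # mangoes given to you, in two batches
given_away = 14      # given away, not sold
sold = 0             # nothing in the puzzle is ever sold
not_sold = received - sold   # 75, the intended answer

# The common wrong answer treats "give away" as a sale and only
# counts the mangoes still on hand:
on_hand = 15 - given_away + 60   # 61
```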

Code reasoning

This model is capable of reasoning about how to structure a complex coding problem before tackling the entire file.

Mathematical reasoning

This model is capable of breaking down math equations and checking its own work before responding with an answer.

Medical reasoning

This model is capable of taking in the symptoms of a disease, as well as the patient's condition, and offering an appropriate diagnosis.

Design

This model was trained off of Qwen2.5-3B on a dataset I put together comprising coding, healthcare, and math data.

To be specific, this model was trained off of Smol-reason2, for longer and on a larger set of reasoning data from DeepSeek-R1.

This model has RoPE scaling up to a 65,536-token context, and the Q8_0 quant can fit on a single 8 GB card with the full context length.
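With llama.cpp, the full context has to be requested at load time; a sketch of the invocation (the GGUF filename is an assumption, adjust to the file you downloaded):

```shell
# Run the Q8_0 quant with the full 65536-token context in llama.cpp.
# -m: model file (filename is a guess), -c: context size in tokens.
./llama-cli -m Smol-reason2.1-base.Q8_0.gguf -c 65536 \
  -p "Start your response with <think>"
```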

Format: GGUF
Model size: 3.09B params
Architecture: qwen2

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 8-bit, and 16-bit.


Base model: Qwen/Qwen2.5-3B
