|
---
license: apache-2.0
datasets:
- Satori-reasoning/Satori_FT_data
- Satori-reasoning/Satori_RL_data
base_model: Satori-reasoning/Satori-7B-Round2
tags:
- llama-cpp
- gguf-my-repo
---
|
|
|
# Triangle104/Satori-7B-Round2-Q4_K_M-GGUF |
|
This model was converted to GGUF format from [`Satori-reasoning/Satori-7B-Round2`](https://huggingface.co/Satori-reasoning/Satori-7B-Round2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/Satori-reasoning/Satori-7B-Round2) for more details on the model. |
|
|
|
--- |
|
Satori-7B-Round2 is a 7B LLM built on an open-source model (Qwen-2.5-Math-7B) and trained on open-source data (OpenMathInstruct-2 and NuminaMath). It is capable of autoregressive search, i.e., self-reflection and self-exploration without external guidance.
|
This is achieved through our proposed Chain-of-Action-Thought (COAT) reasoning and a two-stage post-training paradigm.

## Our Approach

We formulate LLM reasoning as a sequential decision-making problem, where reasoning is the process of constructing and refining an answer step by step. Specifically, the LLM (the agent's policy) starts with an input context (initial state), generates a reasoning step (action), and updates the context (next state). It repeats this process until it reaches a final answer, then receives a reward that evaluates whether the final answer matches the ground truth. With this formulation, we can train the LLM to reason using RL, aiming to generate a sequence of reasoning steps that maximizes the expected reward.
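To make the formulation concrete, here is a minimal Python sketch of reasoning as a sequential decision process. It is purely illustrative: `generate_step`, `is_final`, and the exact-match reward below are hypothetical placeholders, not part of the Satori codebase.

```python
def matches_ground_truth(answer: str, ground_truth: str) -> bool:
    # Placeholder reward check; real answer verification is task-specific.
    return ground_truth in answer

def reasoning_episode(llm, question: str, ground_truth: str, max_steps: int = 32):
    context = question        # initial state: the input context
    trajectory = []           # the sequence of reasoning steps (actions)
    for _ in range(max_steps):
        step = llm.generate_step(context)  # action: generate next reasoning step
        trajectory.append(step)
        context = context + "\n" + step    # next state: the updated context
        if llm.is_final(step):             # stop once a final answer is reached
            break
    # Terminal reward: does the final answer match the ground truth?
    reward = 1.0 if matches_ground_truth(trajectory[-1], ground_truth) else 0.0
    return trajectory, reward
```

RL training would then update the policy (`llm`) to maximize the expected value of this terminal reward over training problems.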
### Chain-of-Action-Thought Reasoning (COAT)

The key challenge of achieving autoregressive search is enabling the LLM to determine when to reflect, continue, or explore alternative solutions without external intervention. To enable this, we introduce several special meta-action tokens that guide the LLM's reasoning process:

- **Continue Reasoning** (`<|continue|>`): encourages the LLM to build upon its current reasoning trajectory by generating the next intermediate step.
- **Reflect** (`<|reflect|>`): prompts the model to pause and verify the correctness of prior reasoning steps.
- **Explore Alternative Solution** (`<|explore|>`): signals the model to identify critical flaws in its reasoning and explore a new solution.

We refer to this formulation as Chain-of-Action-Thought (COAT) reasoning. Each COAT reasoning step is a sequence of tokens that starts with one of the meta-action tokens.
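Because every COAT step begins with one of these meta-action tokens, the tokens double as step delimiters in a generated trace. The short Python sketch below splits a trace into its COAT steps; the parsing logic and example trace are illustrative only, not from the Satori release.

```python
import re

# The three COAT meta-action tokens described above.
META_ACTIONS = ["<|continue|>", "<|reflect|>", "<|explore|>"]

def split_coat_steps(trace: str):
    """Split a generated trace into (meta_action, step_text) pairs."""
    pattern = "(" + "|".join(re.escape(t) for t in META_ACTIONS) + ")"
    parts = re.split(pattern, trace)
    # re.split keeps the delimiters, so parts alternate: text, token, text, ...
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]

trace = (
    "<|continue|>Compute 3 * 4 = 12."
    "<|reflect|>Check: 12 / 4 = 3, so the multiplication is consistent."
    "<|explore|>Alternatively, add 4 three times: 4 + 4 + 4 = 12."
)
for action, step in split_coat_steps(trace):
    print(action, "->", step)
```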
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux).
|
|
|
```bash
brew install llama.cpp
```
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash
llama-cli --hf-repo Triangle104/Satori-7B-Round2-Q4_K_M-GGUF --hf-file satori-7b-round2-q4_k_m.gguf -p "The meaning to life and the universe is"
```
|
|
|
### Server: |
|
```bash
llama-server --hf-repo Triangle104/Satori-7B-Round2-Q4_K_M-GGUF --hf-file satori-7b-round2-q4_k_m.gguf -c 2048
```
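Once running, llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default). A minimal sketch using only the Python standard library, assuming the default local address:

```python
import json
import urllib.request

# Assumes llama-server is running locally on its default port (8080).
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "messages": [
        {"role": "user", "content": "Solve for x: 3x + 5 = 20."}
    ],
    "temperature": 0.0,
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["message"]["content"])
```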
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
```bash
git clone https://github.com/ggerganov/llama.cpp
```
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux). Note that recent versions of llama.cpp have moved to a CMake-based build, so consult the repo's README if `make` is unavailable.
|
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
|
|
|
Step 3: Run inference through the main binary. |
|
```bash
./llama-cli --hf-repo Triangle104/Satori-7B-Round2-Q4_K_M-GGUF --hf-file satori-7b-round2-q4_k_m.gguf -p "The meaning to life and the universe is"
```
|
or |
|
```bash
./llama-server --hf-repo Triangle104/Satori-7B-Round2-Q4_K_M-GGUF --hf-file satori-7b-round2-q4_k_m.gguf -c 2048
```
|
|