Update README.md
README.md
@@ -27,7 +27,7 @@ datasets:
 - **License:** llama 3.1
 - **Finetuned from model :** Solshine/reflection-llama-3.1-8B-Solshine-trainround4-16bit
 
-Inspired by and featuring the Reflection Tuning technique pioneered by Matt Shumer (possibly earlier innovated by the team at Anthropic, and Mlabbone' Hermes.)
+This model, trained on chain of thoughts within the reinforcement learning, predates OpenAI's o1 model. Inspired by and featuring the Reflection Tuning technique pioneered by Matt Shumer (possibly earlier innovated by the team at Anthropic, and Mlabbone' Hermes.)
 
 *To the authors' knowledge, this is V5 of the first "reflection tuned" Llama 3.1 8B LLM*
 