Dataset and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL Environments. Free course: https://t.ly/4jIFq
Stefano Fiorucci
anakin87
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data...
Contributing to Haystack LLM framework 🏗️
Recent Activity
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe
I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play.
🧑‍🍳 Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: fewer than 200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again against stronger opponents (0-25% random moves), sampling at temperature 1.25 to push exploration and shake off suboptimal strategies
Done! Beats GPT-5-mini 🏆
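
For a feel of what "opponents making 20-70% random moves" could look like, here is a minimal sketch in plain Python. It is not the code from the course and does not use Verifiers. The names (`opponent_move`, `random_prob`) are hypothetical, and it assumes the non-random moves come from a full minimax search, which the post does not state.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has a complete line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, to_move, me):
    """Score the position for `me`: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == me else -1
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0  # board full, no winner: draw
    other = "O" if to_move == "X" else "X"
    scores = []
    for m in moves:
        board[m] = to_move
        scores.append(minimax(board, other, me))
        board[m] = " "  # undo the move
    return max(scores) if to_move == me else min(scores)


def opponent_move(board, player, random_prob, rng=random):
    """Pick a cell for `player`: random with probability `random_prob`,
    otherwise a minimax-best move (assumption: 'non-random' means optimal)."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < random_prob:
        return rng.choice(moves)
    other = "O" if player == "X" else "X"
    best_move, best_score = None, -2
    for m in moves:
        board[m] = player
        score = minimax(board, other, player)
        board[m] = " "
        if score > best_score:
            best_move, best_score = m, score
    return best_move
```

Under these assumptions, the first RL stage would draw `random_prob` from 0.2 to 0.7 for each game and the second stage from 0.0 to 0.25, so the model meets easier opponents first and near-optimal ones later.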
---
🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe