LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dua policy paradigm and Large Language Models! https://github.com/SimpleBerry/LLaMA-O1/
What will happen when you compound MCTS β€ LLM β€ Self-Play β€RLHF? Just a little bite of strawberry!π