SimpleBerry Research Lab

Activity Feed

AI & ML interests

Smart Solutions for Success in Science & Business.

Recent Activity

SimpleBerry's activity

qq8933
posted an update 9 days ago
qq8933
posted an update 16 days ago
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.
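The announcement describes the RFT pipeline only at a high level, and its code is not yet released. As a rough, hypothetical illustration of one way reward labels can be produced without human annotation, here is a toy self-consistency loop: the model's majority answer is treated as the reward-1 label. The function names and the stubbed sampler below are invented for this sketch, not taken from the project.

```python
import random
from collections import Counter

def sample_answers(question, n=8, rng=None):
    """Stand-in for sampling n answers from an LLM.

    A fixed toy distribution is used so the example runs without a model.
    """
    rng = rng or random.Random(0)
    return [rng.choice(["4", "4", "4", "5"]) for _ in range(n)]

def self_labeled_rewards(answers):
    """Annotation-free reward: the majority answer gets reward 1.0, others 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]
    return {a: (1.0 if a == majority else 0.0) for a in set(answers)}

def build_rft_batch(questions):
    """Produce (question, answer, reward) triples for a fine-tuning step."""
    batch = []
    for q in questions:
        answers = sample_answers(q)
        rewards = self_labeled_rewards(answers)
        batch.extend((q, a, rewards[a]) for a in answers)
    return batch

batch = build_rft_batch(["2 + 2 = ?"])
```

In a real pipeline the high-reward triples would feed a fine-tuning objective; the majority vote here is only one possible annotation-free reward signal.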
qq8933
posted an update 17 days ago
qq8933
updated a Space 25 days ago
qq8933
posted an update 25 days ago
qq8933
posted an update 28 days ago
The LLaMA-O1 Base and SFT models will be uploaded to HF today.
The RLHF pipeline is already in place; we are still waiting on data sampling.
jwu323
posted an update about 1 month ago
We are excited to announce a new internal project, Rome, focused on advancing LLM reasoning. The code and accompanying paper will be released soon. Stay tuned!
qq8933
posted an update about 2 months ago
Discovered an outrageous bug on the official ChatGPT website that especially affects users of ad-blocking plugins. Please make sure to add browser-intake-datadoghq.com to your ad-block whitelist. The ChatGPT webpage relies on this site for heartbeat detection, but since it belongs to an ad-tracking network, it is included in major ad-blocking lists. (If you're using Clash, also remember to add it to the whitelist.) Failing to do so may leave the ChatGPT web interface with a greyed-out send button that does not respond to clicks.

For users with Chinese IP addresses, consider adding this URL to the rules of your U.S. node, as the response headers from this site will report the user's physical location to GPT.
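As an illustration of the whitelisting described above, the fragments below use the standard Adblock Plus exception syntax (also understood by uBlock Origin) and a Clash `DOMAIN-SUFFIX` rule; the exact placement in your own filter list or Clash config will differ, and the `DIRECT` target is just one choice — route it through your U.S. node instead if the location note above applies to you.

```
! Adblock Plus / uBlock Origin: exception rule allowing the heartbeat domain
@@||browser-intake-datadoghq.com^

# Clash: route the domain instead of rejecting it
rules:
  - DOMAIN-SUFFIX,browser-intake-datadoghq.com,DIRECT
```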
qq8933
posted an update about 2 months ago
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual-policy paradigm, and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/

What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry! 🍓
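LLaMA-O1's actual training code lives in the linked repo. As a generic sketch of the MCTS ingredient alone — a textbook UCB1 tree search on a toy problem, not the project's implementation — the core select / expand / rollout / backpropagate loop looks like this:

```python
import math
import random

# Toy domain: build a binary string of length DEPTH; reward = fraction of 1s.
DEPTH = 4
ACTIONS = ["0", "1"]

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # sum of backpropagated rewards

def is_terminal(state):
    return len(state) == DEPTH

def reward(state):
    return state.count("1") / DEPTH

def ucb(child, parent_visits, c=1.4):
    """UCB1: mean value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits)

def select(node):
    """Descend through fully expanded nodes by maximum UCB."""
    while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
        node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
    return node

def expand(node, rng):
    """Add one untried child of a non-terminal node."""
    action = rng.choice([a for a in ACTIONS if a not in node.children])
    child = Node(node.state + action, parent=node)
    node.children[action] = child
    return child

def rollout(state, rng):
    """Play random actions to a terminal state and score it."""
    while not is_terminal(state):
        state += rng.choice(ACTIONS)
    return reward(state)

def backprop(node, r):
    while node is not None:
        node.visits += 1
        node.value += r
        node = node.parent

def mcts(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node("")
    for _ in range(iterations):
        leaf = select(root)
        if not is_terminal(leaf.state):
            leaf = expand(leaf, rng)
        backprop(leaf, rollout(leaf.state, rng))
    # Recommend the most-visited first action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

best_first_move = mcts()
```

In a reasoning-model setting, the toy state would be a partial solution, the rollout an LLM continuation, and the reward a learned or self-derived score; the search skeleton itself is unchanged.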

Past related works:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)