Samuel L Meyers PRO

MrOvkill

AI & ML interests

Dialogue Generation, Text Generation, etc...

Recent Activity

Organizations

Digital Clockwork, Social Post Explorers, Hugging Face Discord Community

MrOvkill's activity

reacted to luigi12345's post with 🔥 4 days ago
🚀 OpenAI o3-mini Just Dropped – Here’s What You Need to Know!

OpenAI just launched o3-mini, a faster, smarter upgrade over o1-mini. It’s better at math, coding, and logic, making it more reliable for structured tasks. Now available in ChatGPT & API, with function calling, structured outputs, and system messages.

🔥 Why does this matter?
✅ Stronger in logic, coding, and structured reasoning
✅ Function calling now works reliably for API responses
✅ More stable & efficient for production tasks
✅ Faster responses with better accuracy

⚠️ Who should use it?
βœ”οΈ Great for coding, API calls, and structured Q&A
❌ Not meant for long conversations or complex reasoning (GPT-4 is better)

💡 Free users: Try it under “Reason” mode in ChatGPT
💡 Plus/Team users: Daily message limit tripled to 150/day!
reacted to MoritzLaurer's post with ❤️ 25 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.
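
To make the "all true" idea concrete, here is a minimal plain-Python sketch of combining several binary judges. It does not use TRL and is not TRL's implementation: the `all_true` function, the `BinaryJudge` type, and the two toy judges are hypothetical names chosen for illustration, and treating -1 as an "invalid" verdict that takes precedence is an assumption of this sketch.

```python
from typing import Callable, List, Sequence

# Hypothetical judge type for this sketch: a function that maps
# (prompt, completion) to 1 (pass), 0 (fail), or -1 (invalid verdict).
BinaryJudge = Callable[[str, str], int]


def all_true(
    judges: Sequence[BinaryJudge],
    prompts: Sequence[str],
    completions: Sequence[str],
) -> List[int]:
    """Return 1 for a sample only if every judge returns 1."""
    results = []
    for prompt, completion in zip(prompts, completions):
        votes = [judge(prompt, completion) for judge in judges]
        if any(v not in (-1, 0, 1) for v in votes):
            raise ValueError(f"judges must return -1, 0, or 1, got {votes}")
        if -1 in votes:
            # Assumption: one invalid verdict invalidates the combined result.
            results.append(-1)
        elif all(v == 1 for v in votes):
            results.append(1)
        else:
            results.append(0)
    return results


# Two toy judges, purely illustrative.
def is_short(prompt: str, completion: str) -> int:
    return int(len(completion) <= 20)


def mentions_prompt_word(prompt: str, completion: str) -> int:
    words = prompt.lower().split()
    return int(any(w in completion.lower() for w in words))


verdicts = all_true(
    [is_short, mentions_prompt_word],
    ["capital of France", "capital of France"],
    ["Paris, France", "I do not know the answer to that question."],
)
print(verdicts)
```

A strict conjunction like this is only as lenient as its strictest judge, so it suits cases where each judge checks a separate hard requirement.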

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)