
Felix Fischer

FlipTip

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 🔥 about 1 month ago

Introducing 𝗼𝗽𝗲𝗻 𝗗𝗲𝗲𝗽-𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 by Hugging Face! 💥
OpenAI's latest agentic app, Deep Research, seems really good... but it's closed, as usual.
⏱️ So with a team of cracked colleagues, we set ourselves a 24-hour deadline to replicate and open-source Deep Research! ⏱️
➡️ We built open-Deep-Research, an entirely open agent that can: navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...
We aimed for the best performance: are the agent's answers really rigorous? On the GAIA benchmark, Deep Research had 67% accuracy on the validation set.
➡️ open Deep Research is at 55% (powered by o1). It is:
- the best pass@1 solution submitted
- the best open solution 💪💪
And it's only getting started! Please jump in, drop PRs, and let's bring it to the top!
Read the blog post 👉 https://huggingface.co/blog/open-deep-research

reacted to m-ric's post with 🔥 about 1 month ago

Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! 🤯
Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiao Tong University just demonstrated that carefully selected examples can boost math performance in large language models using SFT, with no huge datasets or RL procedures needed.
Their procedure allows Qwen2.5-32B-Instruct to jump from 6.5% to 57% on AIME and from 59% to 95% on MATH, while using only 1% of the data used in previous approaches.
⚡ The Less-is-More Reasoning Hypothesis:
‣ Minimal but precise examples that showcase optimal reasoning patterns matter more than sheer quantity
‣ Pre-training knowledge combined with sufficient computational resources at inference time levels up math skills
➡️ Core techniques:
‣ High-quality reasoning chains with self-verification steps
‣ 817 handpicked problems that encourage deeper reasoning
‣ Enough inference-time computation to allow extended reasoning
💪 Efficiency gains:
‣ Only 817 examples instead of 100k+
‣ 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data
This really challenges the notion that SFT leads to memorization rather than generalization! And it opens up reasoning to GPU-poor researchers 🚀
Read the full paper here 👉 https://huggingface.co/papers/2502.03387

reacted to m-ric's post with 👍 about 1 month ago


Organizations

None yet

FlipTip's activity

liked a model about 2 months ago

NousResearch/DeepHermes-3-Llama-3-8B-Preview

Text Generation • Updated 26 days ago • 71.1k • 304

liked a model 5 months ago

shuttleai/shuttle-3-diffusion

Text-to-Image • Updated Nov 23, 2024 • 22.5k • 193

liked a model 6 months ago

nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Jan 14 • 15.8k • 768

liked a Space 6 months ago

1.9k

PuLID-FLUX

🤗

Generate images from text prompts with a specific style

liked a model 6 months ago

rain1011/pyramid-flow-sd3

Text-to-Video • Updated Oct 30, 2024 • 824

liked a model 7 months ago

HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text • Updated Dec 2, 2024 • 52.3k • 273

liked a Space 7 months ago

100

Idefics3

📊

Generate text based on an image and prompt

liked a model 7 months ago

upstage/solar-pro-preview-instruct

Text Generation • Updated Sep 20, 2024 • 7.58k • 446

liked 4 models 8 months ago

liked a model 9 months ago

mistralai/Mistral-Nemo-Instruct-2407

Text Generation • Updated Nov 6, 2024 • 188k • 1.51k

liked a Space 9 months ago

Mantis

👁

Multimodal Language Model

liked 2 models 9 months ago

openchat/openchat-3.6-8b-20240522

Text Generation • Updated May 28, 2024 • 5.03k • 153

fal/AuraSR

Updated Jul 15, 2024 • 512 • 305

liked 3 models 11 months ago

0ai/0ai-7B-v5

Text Generation • Updated May 3, 2024 • 5

openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • Updated Jan 15 • 29.8k • 1.39k

microsoft/Phi-3-small-128k-instruct

Text Generation • Updated Sep 12, 2024 • 7.37k • 176

liked a Space 11 months ago

195

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation