chansung park PRO

chansung



Posts

Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" from HKU, UC Berkeley, Google DeepMind, and New York University, which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion is that SFT excels at memorization, while RL is better at generalization. However, since LLMs/VLMs should benefit humans in ways that go beyond generalization alone, a mix of SFT and RL is advisable. Typically, some SFT comes first so the model learns the prompt format, and RL then enhances generalization through trial and error.

The study focused on a single model, Llama-3.2-Vision-11B, using environments such as GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, and the resulting models were evaluated on in-distribution data to assess memorization and on out-of-distribution data to assess generalization.
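
To make the in-distribution vs. out-of-distribution comparison concrete, here is a minimal sketch. It is not the paper's evaluation code: the `accuracy` and `generalization_report` helpers, the toy lookup "model", and the example prompts are all hypothetical and only illustrate how a pure memorizer shows a large gap between the two splits.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of examples whose prediction exactly matches the expected answer."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for prompt, expected in examples if predict(prompt) == expected)
    return correct / len(examples)


def generalization_report(
    predict: Callable[[str], str],
    in_dist: Sequence[Example],
    out_dist: Sequence[Example],
) -> Dict[str, float]:
    """Compare accuracy on in-distribution and out-of-distribution examples."""
    id_acc = accuracy(predict, in_dist)
    ood_acc = accuracy(predict, out_dist)
    return {"in_dist": id_acc, "out_dist": ood_acc, "gap": id_acc - ood_acc}


# Hypothetical toy "model" that has only memorized its training prompts.
memorized = {"2+2": "4", "3+5": "8"}


def lookup_model(prompt: str) -> str:
    return memorized.get(prompt, "?")


in_dist = [("2+2", "4"), ("3+5", "8")]   # seen during "training"
out_dist = [("2+3", "5"), ("4+4", "8")]  # unseen variants

report = generalization_report(lookup_model, in_dist, out_dist)
print(report)
```

A model that generalizes would keep the `gap` value small, while a memorizer scores well only on the split it has already seen.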

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
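
As a rough idea of what the smallest piece of such a "playground" could look like, below is a sketch of a rule-based reward function for a 24-style card arithmetic task. This is an assumption-laden illustration, not the paper's actual environment: the `reward` function, its binary 0/1 scoring, and the default target of 24 are hypothetical choices.

```python
import ast
from collections import Counter
from typing import List, Sequence

_ALLOWED_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)


def _evaluate(node: ast.AST, used: List[int]) -> float:
    """Evaluate a restricted arithmetic AST, recording every integer literal used."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return float(node.value)
    if isinstance(node, ast.BinOp) and isinstance(node.op, _ALLOWED_OPS):
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        return left / right  # may raise ZeroDivisionError
    raise ValueError("unsupported expression")


def reward(expression: str, cards: Sequence[int], target: int = 24) -> float:
    """Return 1.0 if the expression uses each card exactly once and hits the target."""
    used: List[int] = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0  # malformed or unsupported answers earn no reward
    if Counter(used) != Counter(cards):
        return 0.0  # wrong cards, or a card reused or skipped
    return 1.0 if abs(value - target) < 1e-9 else 0.0


cards = [1, 2, 3, 4]
print(reward("(1 + 2 + 3) * 4", cards))  # valid solution
print(reward("6 * 4", cards))            # right value, wrong cards
```

Parsing the answer with `ast` instead of calling `eval` keeps model output from executing arbitrary code. A real domain-specific environment would need far more than this (state, rule variants, visual inputs), which is where most of the investment goes.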

https://arxiv.org/abs/2501.17161
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over o1-mini, reaching o1-level performance. While it is generally strong, it does not beat previous models (o1, o1-preview) or GPT-4o on every benchmark. This means workflows should be re-evaluated with each model upgrade.
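
One lightweight way to re-evaluate a workflow on each upgrade is to compare per-task scores between the old and new model and flag any drops. The sketch below assumes you already have your own scores per task; the `find_regressions` helper, the task names, and the numbers are hypothetical placeholders, not real benchmark results.

```python
from typing import Dict, List


def find_regressions(
    old_scores: Dict[str, float],
    new_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the tasks where the new model scores lower than the old one.

    Only tasks present in both score tables are compared; `tolerance` lets
    small drops (e.g. evaluation noise) pass without being flagged.
    """
    shared = old_scores.keys() & new_scores.keys()
    return sorted(
        name for name in shared if new_scores[name] < old_scores[name] - tolerance
    )


# Placeholder scores for illustration only (not measured results).
old_model = {"task_a": 0.80, "task_b": 0.65, "task_c": 0.90}
new_model = {"task_a": 0.85, "task_b": 0.60, "task_c": 0.90}

regressions = find_regressions(old_model, new_model)
print(regressions)
```

An average improvement can hide a drop on the one task your workflow depends on, which is why the comparison is done per task rather than on an aggregate score.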

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/