Llama 3.2V 11B Cot
💬 Generate descriptions and answers by combining text and images
Xkev/Llama-3.2V-11B-cot
Image-Text-to-Text • Updated • 5.09k • 142
Xkev/LLaVA-CoT-100k
Viewer • Updated • 98.6k • 2.85k • 71
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 113
Guowei Xu PRO
Xkev
AI & ML interests
None yet
Recent Activity
Upvoted a paper 16 days ago: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Upvoted a collection 19 days ago: DeepSeek-R1
Upvoted a paper 19 days ago: Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Organizations
None yet