mphielipp's Collections

VLM

updated Feb 20

  • Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

    Paper • 2501.07888 • Published Jan 14 • 16

  • SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

    Paper • 2502.13143 • Published Feb 18 • 31