VLM - a mphielipp Collection

VLM

updated Feb 20

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published Jan 14 • 16

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Paper • 2502.13143 • Published Feb 18 • 31