Xin Li PRO

lixin4ever

https://lixin4ever.github.io/

AI & ML interests

Natural Language Processing, Machine Learning

Recent Activity

liked a model 1 day ago

meta-llama/Llama-4-Scout-17B-16E-Instruct

upvoted a paper 7 days ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

liked a model 8 days ago

SparkAudio/Spark-TTS-0.5B

lixin4ever's activity

upvoted a paper 7 days ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published 14 days ago • 48

upvoted a paper 9 days ago

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published 12 days ago • 21

upvoted a paper 20 days ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published 21 days ago • 115

upvoted a paper 25 days ago

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published 26 days ago • 21

upvoted a paper about 1 month ago

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

Paper • 2503.02951 • Published Mar 4 • 29

upvoted 2 papers about 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 140

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published Feb 19 • 25

upvoted a collection about 2 months ago

VideoRefer

6 items • Updated 28 days ago • 2

upvoted a paper about 2 months ago

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22

upvoted a collection about 2 months ago

Ovis2

Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 14 days ago • 59

upvoted a paper about 2 months ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

upvoted a collection 2 months ago

🖼️ MLLM by the Chinese community - 2025

13 items • Updated 11 days ago • 1

upvoted a collection 3 months ago

VideoLLaMA3

Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 28 days ago • 14

upvoted 5 papers 3 months ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 91

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 99

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 48

upvoted 2 collections 4 months ago

PixMo

A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated 26 days ago • 68

Inf-CL

The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated 28 days ago • 3