94 13

HanSaem Kim

kensaem

AI & ML interests

None yet

Organizations

None yet

kensaem's activity

upvoted 6 papers about 5 hours ago

TAPNext: Tracking Any Point (TAP) as Next Token Prediction

Paper • 2504.05579 • Published 6 days ago • 4

Kimi-VL Technical Report

Paper • 2504.07491 • Published 4 days ago • 99

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published 6 days ago • 14

OmniCaptioner: One Captioner to Rule Them All

Paper • 2504.07089 • Published 5 days ago • 16

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Paper • 2504.06232 • Published 6 days ago • 9

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

Paper • 2504.02160 • Published 11 days ago • 32

upvoted a collection about 6 hours ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

11 items • Updated Feb 25 • 79

upvoted 2 papers about 6 hours ago

An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published 6 days ago • 59

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Paper • 2504.06263 • Published 6 days ago • 135

upvoted a paper 6 days ago

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning

Paper • 2504.02949 • Published 11 days ago • 18

upvoted 5 papers 10 days ago

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Paper • 2504.02542 • Published 11 days ago • 39

SkyReels-A2: Compose Anything in Video Diffusion Transformers

Paper • 2504.02436 • Published 11 days ago • 35

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Paper • 2504.02782 • Published 11 days ago • 54

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Paper • 2503.18886 • Published 21 days ago • 20

Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback

Paper • 2405.20216 • Published May 30, 2024 • 20

upvoted 3 papers 14 days ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published 19 days ago • 134

Gemma 3 Technical Report

Paper • 2503.19786 • Published 20 days ago • 45

Wan: Open and Advanced Large-Scale Video Generative Models

Paper • 2503.20314 • Published 19 days ago • 48

upvoted a paper 25 days ago

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

Paper • 2503.14151 • Published 27 days ago • 10

upvoted a paper 26 days ago

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Paper • 2503.08677 • Published Mar 11 • 27