wongyukim

kimwongyuda

AI & ML interests

None yet

Recent Activity

upvoted a paper 3 days ago

Towards Visual Text Grounding of Multimodal Large Language Model

upvoted a paper 3 days ago

Scaling Laws for Native Multimodal Models

upvoted a paper 3 days ago

MM-IFEngine: Towards Multimodal Instruction Following

Organizations

None yet

wongyukim's activity

upvoted 8 papers 3 days ago

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published 7 days ago • 9

Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published 4 days ago • 18

MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published 4 days ago • 30

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published 4 days ago • 42

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 13 days ago • 71

Kimi-VL Technical Report

Paper • 2504.07491 • Published 5 days ago • 104

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 7 days ago • 69

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published 5 days ago • 9

upvoted 2 papers 5 days ago

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

Paper • 2504.05599 • Published 7 days ago • 77

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 7 days ago • 158

upvoted 4 papers 6 days ago

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published 11 days ago • 12

T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Paper • 2504.04718 • Published 8 days ago • 37

Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published 10 days ago • 72

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 7 days ago • 92

upvoted 3 papers 7 days ago

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Paper • 2504.01328 • Published 13 days ago • 8

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Paper • 2504.03641 • Published 10 days ago • 13

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Paper • 2504.02605 • Published 11 days ago • 43

upvoted 2 papers 10 days ago

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Paper • 2504.00502 • Published 14 days ago • 21

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Paper • 2504.02782 • Published 11 days ago • 54

upvoted a paper 11 days ago

PaperBench: Evaluating AI's Ability to Replicate AI Research

Paper • 2504.01848 • Published 12 days ago • 34