InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 7 days ago • 232
H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection Paper • 2210.06742 • Published Oct 13, 2022 • 1
Self-supervised Character-to-Character Distillation for Text Recognition Paper • 2211.00288 • Published Nov 1, 2022
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Paper • 2410.08202 • Published Oct 10, 2024 • 4
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data Paper • 2411.18624 • Published Nov 27, 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft Paper • 2312.09238 • Published Dec 14, 2023
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding Paper • 2501.07783 • Published Jan 14 • 7
A Simple Aerial Detection Baseline of Multimodal Language Models Paper • 2501.09720 • Published Jan 16 • 1
PointOBB: Learning Oriented Object Detection via Single Point Supervision Paper • 2311.14757 • Published Nov 23, 2023
H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection Paper • 2304.04403 • Published Apr 10, 2023
ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection Paper • 2303.04989 • Published Mar 9, 2023
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision Paper • 2311.14758 • Published Nov 23, 2023
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning Paper • 2503.07588 • Published Mar 10 • 7
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery Paper • 2406.09410 • Published Jun 13, 2024
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World Paper • 2503.16399 • Published Mar 20
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published 18 days ago • 67