3 16 5

winston_ge

W1nst0nGe

AI & ML interests

None yet

Recent Activity

new activity 4 days ago

General-Level/General-Bench-Closeset:create image folder

new activity 4 days ago

General-Level/General-Bench-Openset:create image folder

new activity 4 days ago

General-Level/General-Bench-Openset:create image folder

View all activity

Organizations

W1nst0nGe's activity

New activity in General-Level/General-Bench-Closeset 4 days ago

create image folder

#2 opened 4 days ago by

W1nst0nGe

New activity in General-Level/General-Bench-Openset 4 days ago

create image folder

#7 opened 4 days ago by

W1nst0nGe

create image folder

#6 opened 4 days ago by

W1nst0nGe

liked a Space 5 months ago

UGround

📱

upvoted a paper 6 months ago

Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published Oct 17, 2024 • 32

upvoted a paper 7 months ago

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 47

upvoted a paper 8 months ago

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 101

upvoted a paper 9 months ago

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Paper • 2407.17490 • Published Jul 3, 2024 • 32

upvoted 2 papers 10 months ago

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2, 2024 • 23

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 64

liked a dataset 10 months ago

shuaishuaicdp/GUI-World

Preview • Updated 26 days ago • 2.8k • 26

upvoted 8 papers about 1 year ago

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Paper • 2312.11370 • Published Dec 18, 2023 • 20

Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

Paper • 2403.07750 • Published Mar 12, 2024 • 24

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Paper • 2403.02709 • Published Mar 5, 2024 • 9

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 72

LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

Paper • 2402.10524 • Published Feb 16, 2024 • 24

Lumos : Empowering Multimodal LLMs with Scene Text Recognition

Paper • 2402.08017 • Published Feb 12, 2024 • 28

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

Paper • 2308.04152 • Published Aug 8, 2023 • 2

Question Aware Vision Transformer for Multimodal Reasoning

Paper • 2402.05472 • Published Feb 8, 2024 • 10