Lijuan Wang

Lijuan

https://www.microsoft.com/en-us/research/people/lijuanw/

AI & ML interests

AGI

Lijuan's activity

authored a paper 15 days ago

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Paper • 2503.20198 • Published 16 days ago • 4

authored a paper 8 months ago

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1, 2024 • 14

authored 2 papers 10 months ago

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 29

authored a paper 12 months ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18

authored a paper about 1 year ago

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Paper • 2401.17093 • Published Jan 30, 2024 • 21

authored 9 papers over 1 year ago

Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 15

Segment and Caption Anything

Paper • 2312.00869 • Published Dec 1, 2023 • 21

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 14

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023 • 20

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 14

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Paper • 2310.08541 • Published Oct 12, 2023 • 18

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 41

ORES: Open-vocabulary Responsible Visual Synthesis

Paper • 2308.13785 • Published Aug 26, 2023 • 7

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

Paper • 2308.02490 • Published Aug 4, 2023 • 17

liked a Space about 2 years ago

mm-react