Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024 • 22
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 18
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 4
Chasing Consistency in Text-to-3D Generation from a Single Image Paper • 2309.03599 • Published Sep 7, 2023 • 1
RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark Paper • 2407.13930 • Published Jul 18, 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 59
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? Paper • 2406.07546 • Published Jun 11, 2024 • 8
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 13, 2024 • 18
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024 • 19
ImagenHub: Standardizing the evaluation of conditional image generation models Paper • 2310.01596 • Published Oct 2, 2023 • 18
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18, 2024 • 24
Deconstructing Denoising Diffusion Models for Self-Supervised Learning Paper • 2401.14404 • Published Jan 25, 2024 • 17
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper • 2401.08740 • Published Jan 16, 2024 • 12
StableVideo: Text-driven Consistency-aware Diffusion Video Editing Paper • 2308.09592 • Published Aug 18, 2023 • 2
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Paper • 2307.16449 • Published Jul 31, 2023 • 15