Music - a ChaangHaan Collection

aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3, 2024 • 29

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28

SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Paper • 2401.00604 • Published Dec 31, 2023 • 6

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

Paper • 2312.16486 • Published Dec 27, 2023 • 7

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Paper • 2312.16272 • Published Dec 26, 2023 • 7

Prompt Expansion for Adaptive Text-to-Image Generation

Paper • 2312.16720 • Published Dec 27, 2023 • 6

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 58

Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

Paper • 2312.15430 • Published Dec 24, 2023 • 29

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 21

LangSplat: 3D Language Gaussian Splatting

Paper • 2312.16084 • Published Dec 26, 2023 • 16

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Paper • 2312.16145 • Published Dec 26, 2023 • 10

Supervised Knowledge Makes Large Language Models Better In-context Learners

Paper • 2312.15918 • Published Dec 26, 2023 • 10

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Paper • 2312.14233 • Published Dec 21, 2023 • 17

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Paper • 2312.14878 • Published Dec 22, 2023 • 15

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

Paper • 2312.14385 • Published Dec 22, 2023 • 7

Shai: A large language model for asset management

Paper • 2312.14203 • Published Dec 21, 2023 • 6

LLM4VG: Large Language Models Evaluation for Video Grounding

Paper • 2312.14206 • Published Dec 21, 2023 • 4

DreamTuner: Single Image is Enough for Subject-Driven Generation

Paper • 2312.13691 • Published Dec 21, 2023 • 28

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Paper • 2312.13913 • Published Dec 21, 2023 • 24

Time is Encoded in the Weights of Finetuned Language Models

Paper • 2312.13401 • Published Dec 20, 2023 • 21

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper • 2312.13964 • Published Dec 21, 2023 • 20

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Paper • 2312.14091 • Published Dec 21, 2023 • 17

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Paper • 2312.13789 • Published Dec 21, 2023 • 15

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Paper • 2312.13980 • Published Dec 21, 2023 • 15

Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Paper • 2312.13469 • Published Dec 20, 2023 • 12

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Paper • 2312.13763 • Published Dec 21, 2023 • 11

ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors

Paper • 2312.13324 • Published Dec 20, 2023 • 11

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Paper • 2312.13314 • Published Dec 20, 2023 • 9

HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs

Paper • 2312.14140 • Published Dec 21, 2023 • 8

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Paper • 2312.12456 • Published Dec 16, 2023 • 42

Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 36

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Paper • 2312.13252 • Published Dec 20, 2023 • 28

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Paper • 2312.12490 • Published Dec 19, 2023 • 18

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Paper • 2312.12742 • Published Dec 20, 2023 • 14

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

Paper • 2312.13271 • Published Dec 20, 2023 • 6

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 259

StarVector: Generating Scalable Vector Graphics Code from Images

Paper • 2312.11556 • Published Dec 17, 2023 • 28

3D-LFM: Lifting Foundation Model

Paper • 2312.11894 • Published Dec 19, 2023 • 15

HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Paper • 2312.11666 • Published Dec 18, 2023 • 13

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Paper • 2312.12423 • Published Dec 19, 2023 • 13

MixRT: Mixed Neural Representations For Real-Time NeRF Rendering

Paper • 2312.11841 • Published Dec 19, 2023 • 11

Tracking Any Object Amodally

Paper • 2312.12433 • Published Dec 19, 2023 • 12

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

Paper • 2312.11537 • Published Dec 15, 2023 • 7

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

Paper • 2312.11595 • Published Dec 18, 2023 • 6

Text-Conditioned Resampler For Long Form Video Understanding

Paper • 2312.11897 • Published Dec 19, 2023 • 6

Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

Paper • 2312.11532 • Published Dec 15, 2023 • 6

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

Paper • 2312.11535 • Published Dec 15, 2023 • 7

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Paper • 2312.12030 • Published Dec 19, 2023 • 6

VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 22

Rich Human Feedback for Text-to-Image Generation

Paper • 2312.10240 • Published Dec 15, 2023 • 20

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 20

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Paper • 2312.11370 • Published Dec 18, 2023 • 20

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

Paper • 2312.10763 • Published Dec 17, 2023 • 19

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Paper • 2312.11461 • Published Dec 18, 2023 • 19

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

Paper • 2312.10899 • Published Dec 18, 2023 • 15

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Paper • 2312.11396 • Published Dec 18, 2023 • 11

Cascade Speculative Drafting for Even Faster LLM Inference

Paper • 2312.11462 • Published Dec 18, 2023 • 9

Silkie: Preference Distillation for Large Visual Language Models

Paper • 2312.10665 • Published Dec 17, 2023 • 11

VidToMe: Video Token Merging for Zero-Shot Video Editing

Paper • 2312.10656 • Published Dec 17, 2023 • 11

ProTIP: Progressive Tool Retrieval Improves Planning

Paper • 2312.10332 • Published Dec 16, 2023 • 8

Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

Paper • 2312.10835 • Published Dec 17, 2023 • 7

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Paper • 2312.11459 • Published Dec 18, 2023 • 6

GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Paper • 2312.11458 • Published Dec 18, 2023 • 5

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 40

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Paper • 2312.09767 • Published Dec 15, 2023 • 27

MobileSAMv2: Faster Segment Anything to Everything

Paper • 2312.09579 • Published Dec 15, 2023 • 24

Point Transformer V3: Simpler, Faster, Stronger

Paper • 2312.10035 • Published Dec 15, 2023 • 20

Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 19

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Paper • 2312.09608 • Published Dec 15, 2023 • 16

Self-Evaluation Improves Selective Generation in Large Language Models

Paper • 2312.09300 • Published Dec 14, 2023 • 16

Stable Score Distillation for High-Quality 3D Generation

Paper • 2312.09305 • Published Dec 14, 2023 • 10

Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 9

StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 48

TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 39

CogAgent: A Visual Language Model for GUI Agents

Paper • 2312.08914 • Published Dec 14, 2023 • 31

VideoLCM: Video Latent Consistency Model

Paper • 2312.09109 • Published Dec 14, 2023 • 24

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20

Pixel Aligned Language Models

Paper • 2312.09237 • Published Dec 14, 2023 • 18

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

Paper • 2312.08889 • Published Dec 13, 2023 • 15

Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 14

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

Paper • 2312.09252 • Published Dec 14, 2023 • 13

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Paper • 2312.09067 • Published Dec 14, 2023 • 16

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Paper • 2312.09256 • Published Dec 14, 2023 • 12

General Object Foundation Model for Images and Videos at Scale

Paper • 2312.09158 • Published Dec 14, 2023 • 12

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Paper • 2312.08754 • Published Dec 14, 2023 • 11

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Paper • 2312.09251 • Published Dec 14, 2023 • 10

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

Paper • 2312.09246 • Published Dec 14, 2023 • 9

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 41

Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Paper • 2312.08361 • Published Dec 13, 2023 • 28

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Paper • 2312.07661 • Published Dec 12, 2023 • 19

Foundation Models in Robotics: Applications, Challenges, and the Future

Paper • 2312.07843 • Published Dec 13, 2023 • 18

Invariant Graph Transformer

Paper • 2312.07859 • Published Dec 13, 2023 • 10

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Paper • 2312.08344 • Published Dec 13, 2023 • 13

ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields

Paper • 2312.08136 • Published Dec 13, 2023 • 7

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 27

VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 23

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Paper • 2312.07536 • Published Dec 12, 2023 • 20

Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 15

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Paper • 2312.06971 • Published Dec 12, 2023 • 15

Steering Llama 2 via Contrastive Activation Addition

Paper • 2312.06681 • Published Dec 9, 2023 • 15

Honeybee: Locality-enhanced Projector for Multimodal LLM

Paper • 2312.06742 • Published Dec 11, 2023 • 14

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

Paper • 2312.07231 • Published Dec 12, 2023 • 11

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Paper • 2312.07509 • Published Dec 12, 2023 • 12

"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming

Paper • 2312.06908 • Published Dec 12, 2023 • 10

LLM360: Towards Fully Transparent Open-Source LLMs

Paper • 2312.06550 • Published Dec 11, 2023 • 57

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper • 2312.06655 • Published Dec 11, 2023 • 24

Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Paper • 2312.06109 • Published Dec 11, 2023 • 21

Context Tuning for Retrieval Augmented Generation

Paper • 2312.05708 • Published Dec 9, 2023 • 17

From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"

Paper • 2312.06571 • Published Dec 11, 2023 • 13

Efficient Quantization Strategies for Latent Diffusion Models

Paper • 2312.05431 • Published Dec 9, 2023 • 12

Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Paper • 2312.06353 • Published Dec 11, 2023 • 7

Evaluation of Large Language Models for Decision Making in Autonomous Driving

Paper • 2312.06351 • Published Dec 11, 2023 • 6

Using Captum to Explain Generative Language Models

Paper • 2312.05491 • Published Dec 9, 2023 • 4

TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

Paper • 2312.05605 • Published Dec 9, 2023 • 3

DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models

Paper • 2312.05107 • Published Dec 8, 2023 • 38

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Paper • 2312.04655 • Published Dec 7, 2023 • 21

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Paper • 2312.04963 • Published Dec 7, 2023 • 17

Customizing Motion in Text-to-Video Diffusion Models

Paper • 2312.04966 • Published Dec 7, 2023 • 11

PathFinder: Guided Search over Multi-Step Reasoning Paths

Paper • 2312.05180 • Published Dec 8, 2023 • 10

MVDD: Multi-View Depth Diffusion Models

Paper • 2312.04875 • Published Dec 8, 2023 • 10

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Paper • 2312.04916 • Published Dec 8, 2023 • 7

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Paper • 2312.04837 • Published Dec 8, 2023 • 3

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 32

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Paper • 2312.04474 • Published Dec 7, 2023 • 31

Controllable Human-Object Interaction Synthesis

Paper • 2312.03913 • Published Dec 6, 2023 • 23

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 18

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 62

Pearl: A Production-ready Reinforcement Learning Agent

Paper • 2312.03814 • Published Dec 6, 2023 • 15

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper • 2312.04410 • Published Dec 7, 2023 • 15

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 13

NeRFiller: Completing Scenes via Generative 3D Inpainting

Paper • 2312.04560 • Published Dec 7, 2023 • 12

Large Language Models for Mathematicians

Paper • 2312.04556 • Published Dec 7, 2023 • 12

Gen2Det: Generate to Detect

Paper • 2312.04566 • Published Dec 7, 2023 • 10

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Paper • 2312.04483 • Published Dec 7, 2023 • 7

Efficient Monotonic Multihead Attention

Paper • 2312.04515 • Published Dec 7, 2023 • 7

Generating Illustrated Instructions

Paper • 2312.04552 • Published Dec 7, 2023 • 8

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Paper • 2312.03849 • Published Dec 6, 2023 • 6

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 34

Relightable Gaussian Codec Avatars

Paper • 2312.03704 • Published Dec 6, 2023 • 31

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians

Paper • 2312.03029 • Published Dec 5, 2023 • 25

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Paper • 2312.03641 • Published Dec 6, 2023 • 21

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Paper • 2312.03209 • Published Dec 6, 2023 • 19

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Paper • 2312.03461 • Published Dec 6, 2023 • 16

Context Diffusion: In-Context Aware Image Generation

Paper • 2312.03584 • Published Dec 6, 2023 • 15

LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Paper • 2312.03079 • Published Dec 5, 2023 • 14

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Paper • 2312.03611 • Published Dec 6, 2023 • 8

MagicStick: Controllable Video Editing via Control Handle Transformations

Paper • 2312.03047 • Published Dec 5, 2023 • 10

Self-conditioned Image Generation via Generating Representations

Paper • 2312.03701 • Published Dec 6, 2023 • 8

Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

Paper • 2312.03664 • Published Dec 6, 2023 • 10

Language-Informed Visual Concept Learning

Paper • 2312.03587 • Published Dec 6, 2023 • 7

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Paper • 2312.02238 • Published Dec 4, 2023 • 27

LivePhoto: Real Image Animation with Text-guided Motion Control

Paper • 2312.02928 • Published Dec 5, 2023 • 18

Describing Differences in Image Sets with Natural Language

Paper • 2312.02974 • Published Dec 5, 2023 • 16

Orthogonal Adaptation for Modular Customization of Diffusion Models

Paper • 2312.02432 • Published Dec 5, 2023 • 14

DragVideo: Interactive Drag-style Video Editing

Paper • 2312.02216 • Published Dec 3, 2023 • 12

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Paper • 2312.02963 • Published Dec 5, 2023 • 11

Fine-grained Controllable Video Generation via Object Appearance and Context

Paper • 2312.02919 • Published Dec 5, 2023 • 13

ReconFusion: 3D Reconstruction with Diffusion Priors

Paper • 2312.02981 • Published Dec 5, 2023 • 10

Training Chain-of-Thought via Latent-Variable Inference

Paper • 2312.02179 • Published Nov 28, 2023 • 11

Alchemist: Parametric Control of Material Properties with Diffusion Models

Paper • 2312.02970 • Published Dec 5, 2023 • 9

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 14

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Paper • 2312.02980 • Published Dec 5, 2023 • 10

Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions

Paper • 2312.02772 • Published Dec 5, 2023 • 8

Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 39

DeepCache: Accelerating Diffusion Models for Free

Paper • 2312.00858 • Published Dec 1, 2023 • 24

Nash Learning from Human Feedback

Paper • 2312.00886 • Published Dec 1, 2023 • 17

DiffiT: Diffusion Vision Transformers for Image Generation

Paper • 2312.02139 • Published Dec 4, 2023 • 16

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Paper • 2312.02155 • Published Dec 4, 2023 • 15

Object Recognition as Next Token Prediction

Paper • 2312.02142 • Published Dec 4, 2023 • 14

GIVT: Generative Infinite-Vocabulary Transformers

Paper • 2312.02116 • Published Dec 4, 2023 • 13

Segment Any 3D Gaussians

Paper • 2312.00860 • Published Dec 1, 2023 • 11

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Paper • 2312.00849 • Published Dec 1, 2023 • 12

Style Aligned Image Generation via Shared Attention

Paper • 2312.02133 • Published Dec 4, 2023 • 11

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Paper • 2312.01409 • Published Dec 3, 2023 • 11

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Paper • 2312.01407 • Published Dec 3, 2023 • 9

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Paper • 2312.01663 • Published Dec 4, 2023 • 6

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 143

Merlin:Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 27

VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper • 2312.00777 • Published Dec 1, 2023 • 24

SeaLLMs -- Large Language Models for Southeast Asia

Paper • 2312.00738 • Published Dec 1, 2023 • 26

MoMask: Generative Masked Modeling of 3D Human Motions

Paper • 2312.00063 • Published Nov 29, 2023 • 18

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Paper • 2312.00093 • Published Nov 30, 2023 • 17

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Paper • 2312.00079 • Published Nov 30, 2023 • 17

Dolphins: Multimodal Language Model for Driving

Paper • 2312.00438 • Published Dec 1, 2023 • 15

Instruction-tuning Aligns LLMs to the Human Brain

Paper • 2312.00575 • Published Dec 1, 2023 • 14

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Paper • 2312.00330 • Published Dec 1, 2023 • 13

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Paper • 2312.00109 • Published Nov 30, 2023 • 12

PyNeRF: Pyramidal Neural Radiance Fields

Paper • 2312.00252 • Published Nov 30, 2023 • 11

Towards Accurate Differential Diagnosis with Large Language Models

Paper • 2312.00164 • Published Nov 30, 2023 • 11

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Paper • 2312.00451 • Published Dec 1, 2023 • 12

Text-Guided 3D Face Synthesis -- From Generation to Editing

Paper • 2312.00375 • Published Dec 1, 2023 • 11

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

Paper • 2312.00085 • Published Nov 30, 2023 • 9

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

Paper • 2311.13073 • Published Nov 22, 2023 • 58

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 52

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Paper • 2311.13600 • Published Nov 22, 2023 • 45

Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 50

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Paper • 2311.13231 • Published Nov 22, 2023 • 29

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

Paper • 2311.13435 • Published Nov 22, 2023 • 19

Visual In-Context Prompting

Paper • 2311.13601 • Published Nov 22, 2023 • 19

Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

Paper • 2311.13141 • Published Nov 22, 2023 • 16

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Paper • 2311.12052 • Published Nov 18, 2023 • 32

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper • 2311.12198 • Published Nov 20, 2023 • 22

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 27

Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 118

Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 69

Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 73

System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 42

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

Paper • 2311.11501 • Published Nov 20, 2023 • 36

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Paper • 2311.10794 • Published Nov 17, 2023 • 28

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

Paper • 2311.11243 • Published Nov 19, 2023 • 17

Drivable 3D Gaussian Avatars

Paper • 2311.08581 • Published Nov 14, 2023 • 47

GRIM: GRaph-based Interactive narrative visualization for gaMes

Paper • 2311.09213 • Published Nov 15, 2023 • 13

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

Paper • 2311.08469 • Published Nov 14, 2023 • 11

PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

Paper • 2311.09180 • Published Nov 15, 2023 • 8

Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

Paper • 2311.08263 • Published Nov 14, 2023 • 16