yjeong75's Collections
Large Language Models as Optimizers
Paper
•
2309.03409
•
Published
•
75
Natural Language Supervision for General-Purpose Audio Representations
Paper
•
2309.05767
•
Published
•
9
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper
•
2309.08532
•
Published
•
52
AudioSR: Versatile Audio Super-resolution at Scale
Paper
•
2309.07314
•
Published
•
24
Enhance audio generation controllability through representation similarity regularization
Paper
•
2309.08773
•
Published
•
3
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
•
2309.12307
•
Published
•
86
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper
•
2310.00704
•
Published
•
19
Toward Joint Language Modeling for Speech Units and Text
Paper
•
2310.08715
•
Published
•
7
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Paper
•
2310.12404
•
Published
•
15
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper
•
2310.11954
•
Published
•
24
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Paper
•
2310.17796
•
Published
•
16
Large Language Models as Generalizable Policies for Embodied Tasks
Paper
•
2310.17722
•
Published
•
6
Controlled Decoding from Language Models
Paper
•
2310.17022
•
Published
•
14
In-Context Learning Creates Task Vectors
Paper
•
2310.15916
•
Published
•
41
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper
•
2311.00571
•
Published
•
40
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper
•
2311.00059
•
Published
•
18
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper
•
2311.00618
•
Published
•
21
In-Context Prompt Editing For Conditional Audio Generation
Paper
•
2311.00895
•
Published
•
10
FLAP: Fast Language-Audio Pre-training
Paper
•
2311.01615
•
Published
•
16
Levels of AGI: Operationalizing Progress on the Path to AGI
Paper
•
2311.02462
•
Published
•
32
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper
•
2311.03285
•
Published
•
28
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper
•
2311.04589
•
Published
•
18
Prompt Engineering a Prompt Engineer
Paper
•
2311.05661
•
Published
•
20
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper
•
2311.05556
•
Published
•
79
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Paper
•
2311.06753
•
Published
•
6
Music ControlNet: Multiple Time-varying Controls for Music Generation
Paper
•
2311.07069
•
Published
•
43
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Paper
•
2311.02772
•
Published
•
3
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Paper
•
2311.08667
•
Published
•
18
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper
•
2311.07919
•
Published
•
9
M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
Paper
•
2311.11255
•
Published
•
3
GAIA: a benchmark for General AI Assistants
Paper
•
2311.12983
•
Published
•
182
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper
•
2311.12229
•
Published
•
26
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Paper
•
2309.15807
•
Published
•
32
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Paper
•
2312.01552
•
Published
•
30
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper
•
2312.09911
•
Published
•
52
Training Chain-of-Thought via Latent-Variable Inference
Paper
•
2312.02179
•
Published
•
8
Gemini: A Family of Highly Capable Multimodal Models
Paper
•
2312.11805
•
Published
•
45
StemGen: A music generation model that listens
Paper
•
2312.08723
•
Published
•
47
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Paper
•
2312.07661
•
Published
•
16
Interfacing Foundation Models' Embeddings
Paper
•
2312.07532
•
Published
•
10
Efficient Quantization Strategies for Latent Diffusion Models
Paper
•
2312.05431
•
Published
•
11
Prompt Expansion for Adaptive Text-to-Image Generation
Paper
•
2312.16720
•
Published
•
5
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper
•
2312.15821
•
Published
•
12
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Paper
•
2401.00246
•
Published
•
10
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper
•
2401.01055
•
Published
•
53
Instruct-Imagen: Image Generation with Multi-modal Instruction
Paper
•
2401.01952
•
Published
•
30
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
Paper
•
2401.01827
•
Published
•
15
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper
•
2401.04577
•
Published
•
41
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper
•
2401.06105
•
Published
•
46
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper
•
2401.06080
•
Published
•
24
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
•
2401.13601
•
Published
•
44
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Paper
•
2401.13160
•
Published
•
11
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Paper
•
2401.11053
•
Published
•
9
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Paper
•
2401.12954
•
Published
•
28
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Paper
•
2402.00854
•
Published
•
19
MambaByte: Token-free Selective State Space Model
Paper
•
2401.13660
•
Published
•
49
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Paper
•
2402.06178
•
Published
•
13
An Interactive Agent Foundation Model
Paper
•
2402.05929
•
Published
•
26
Multilingual E5 Text Embeddings: A Technical Report
Paper
•
2402.05672
•
Published
•
20
Fast Timing-Conditioned Latent Audio Diffusion
Paper
•
2402.04825
•
Published
•
7
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Paper
•
2402.07383
•
Published
•
13
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper
•
2402.08093
•
Published
•
54
Scalable Diffusion Models with Transformers
Paper
•
2212.09748
•
Published
•
15
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper
•
2402.16153
•
Published
•
55
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Paper
•
2402.13763
•
Published
•
9
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper
•
2402.17177
•
Published
•
88
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper
•
2403.03100
•
Published
•
34
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
•
2402.17764
•
Published
•
592
Teaching Large Language Models to Reason with Reinforcement Learning
Paper
•
2403.04642
•
Published
•
46
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
•
2403.05530
•
Published
•
59
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper
•
2403.10493
•
Published
•
16
Measuring Style Similarity in Diffusion Models
Paper
•
2404.01292
•
Published
•
16
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper
•
2404.03204
•
Published
•
7
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper
•
2403.20327
•
Published
•
47
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper
•
2404.09956
•
Published
•
11
Better & Faster Large Language Models via Multi-token Prediction
Paper
•
2404.19737
•
Published
•
73