Submitted by akhaliq:
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding · 3 authors
A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models · 7 authors
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias · 12 authors
MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion · 6 authors
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation · 7 authors
Large Language Models of Code Fail at Completing Code with Potential Bugs · 7 authors
GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System · 5 authors
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model · 6 authors
VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores · 5 authors
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? · 8 authors