Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2501.05441

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 16 days ago • 84

Generative Models

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 16 days ago • 84

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 16 days ago • 84

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 16 days ago • 84

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 24 days ago • 98
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published 24 days ago • 48
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published 23 days ago • 36
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Paper • 2411.13552 • Published Nov 20, 2024

General Training

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 41
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Paper • 2501.06842 • Published 14 days ago • 15
The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 16 days ago • 84

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 89
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated Dec 25, 2024 • 6.33k • 585
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Paper • 2412.06016 • Published Dec 8, 2024 • 20
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 89

CompCap: Improving Multimodal Large Language Models with Composite Captions

Paper • 2412.05243 • Published Dec 6, 2024 • 18
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

Paper • 2412.04814 • Published Dec 6, 2024 • 45
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published Dec 6, 2024 • 47
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

Paper • 2412.05939 • Published Dec 8, 2024 • 15

Computer vision

A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond

Paper • 2410.02362 • Published Oct 3, 2024 • 18
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Paper • 2401.12208 • Published Jan 22, 2024 • 22
Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization

Paper • 2007.14895 • Published Jul 29, 2020 • 1
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published Dec 24, 2024 • 70

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 96
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 35
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 112

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs