CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 22
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5, 2024 • 19
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29, 2024 • 38
Open Image Preferences Collection Containing all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated Dec 19, 2024 • 7
Article Crowd-sourced Open Preference Dataset for Text-to-Image Generation By RapidataAI • 27 days ago • 18
Lucie LLM Collection Open weights LLM for French, English, German, Spanish and Italian • 8 items • Updated about 14 hours ago • 17
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 35
Graph-Aware Isomorphic Attention in Transformers Collection We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism. • 4 items • Updated 25 days ago • 2
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models Paper • 2407.15886 • Published Jul 21, 2024 • 3
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published Aug 19, 2024 • 11
Article Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚 By Isayoften • Jul 10, 2024 • 44
OneDiffusion Collection Collection of different versions of OneDiffusion models • 9 items • Updated 6 days ago • 2
Bamba Collection Collection of Bamba models, based on a hybrid Mamba2 architecture and trained on open data • 8 items • Updated Dec 18, 2024 • 18
WavTokenizer-Medium-Large Collection https://arxiv.org/abs/2408.16532 • 5 items • Updated Oct 23, 2024 • 7
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 200
Tulu 3 Models Collection All models released with Tulu 3 -- state-of-the-art open post-training recipes. • 10 items • Updated 5 days ago • 84
Cephalo Collection Cephalo is a series of multimodal vision large language models (V-LLMs) designed to integrate visual and linguistic reasoning in materials science. • 15 items • Updated 13 days ago • 4