JARVIS-VLA-v1 Collection Vision-Language-Action Models in Minecraft. • 4 items • Updated 3 days ago • 9
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 5 days ago • 40
Post
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️
👉 GitHub: https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-EdgeDemo supporting both text-to-speech and text-to-llm responses in speech.
> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
🥠 Orpheus-3b-0.1-ft Model Page: canopylabs/orpheus-3b-0.1-ft
🥠 Orpheus-3b-0.1-ft Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing
🥠 Finetune [ orpheus-3b-0.1-pretrained ] Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune
🥠 Model-releases: https://canopylabs.ai/model-releases
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published 12 days ago • 19
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 12
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published 10 days ago • 9
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published 8 days ago • 13
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published 11 days ago • 118