Enterprise orgs now enable serverless Inference Providers for all members:
- includes $2 of free usage per org member per month (e.g. an Enterprise org with 1,000 members shares $2,000 in free credits each month)
- admins can set a monthly spend limit for the entire org
- works today with Together, fal, Novita, Cerebras, and HF Inference
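For reference, here's a minimal sketch of what calling one of these providers looks like from Python, assuming a recent huggingface_hub with provider routing; the token, model ID, and prompt are illustrative:

```python
from huggingface_hub import InferenceClient

# Placeholder token: any member's token works once the org enables providers.
client = InferenceClient(
    provider="together",  # or "fal-ai", "novita", "cerebras", "hf-inference"
    api_key="hf_xxx",
)

response = client.chat_completion(
    model="deepseek-ai/DeepSeek-V3-0324",  # illustrative model choice
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Usage is billed to the org, so the free credit and the admin spend limit apply automatically.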
✨ MLLM
> R1 Omni by Alibaba Tongyi - 0.5B
> Qwen2.5 Omni by Alibaba Qwen - 7B, Apache 2.0

🖼️ Video
> CogView-4 by ZhipuAI - Apache 2.0
> HunyuanVideo-I2V by TencentHunyuan
> Open Sora 2.0 - 11B, Apache 2.0
> StepVideo TI2V by StepFun AI - 30B, MIT license

⚡️ Image/3D
> Hunyuan3D 2mv/2mini (0.6B) by @TencentHunyuan
> FlexWorld by ByteDance - MIT license
> Qwen2.5-VL-32B-Instruct by Alibaba Qwen - Apache 2.0
> TripoSG (1.5B)/SF by VastAIResearch - MIT license
> InfiniteYou by ByteDance
> LHM by Alibaba AIGC team - Apache 2.0
> SpatialLM by ManyCore

🧠 Reasoning
> QwQ-32B by Alibaba Qwen - Apache 2.0
> Skywork R1V - 38B, MIT license
> RWKV G1 by RWKV AI - 0.1B pure RNN reasoning model, Apache 2.0
> Fin-R1 by SUFE AIFLM Lab - financial reasoning

🔠 LLM
> DeepSeek V3 0324 by DeepSeek - MIT license
> Babel by Alibaba DAMO - 9B/83B, 25 languages
Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
I've got my hands on an AMD Instinct MI100. Used, it costs about the same as a V100, but on paper it has more compute (14 TFLOPS for the V100 vs 23 TFLOPS for the MI100), and its HBM runs at a faster clock, giving 1.2 TB/s of memory bandwidth. For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training in this quick test, I could not get the bnb (bitsandbytes) config to work, so I'm running the fine-tune on the full-size model (a rough sketch of the setup is below).
Will share everything I've learned about the install, setup, and settings in a blog post, together with the 3D design for the cooling shroud.
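In the meantime, here's a hedged sketch of the LoRA-on-full-precision setup described above, using PEFT; the model ID, target modules, and hyperparameters are illustrative placeholders, not the exact config from this test:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # full-size weights, no bitsandbytes quantization
    device_map="auto",          # ROCm builds of PyTorch expose the MI100 as "cuda"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

Since only the adapters train, the full-precision base model's memory cost is the main constraint, which the MI100's 32GB of HBM handles comfortably for small models.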
🚀 Just launched: A toolkit of 20 powerful AI tools that journalists can use right now - transcribe, analyze, create. 100% free & open-source.
Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.
Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper (a minimal local sketch below)
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)
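As a taste of the transcription workflow, here's a minimal local Whisper sketch with transformers; the audio path is a placeholder, and speaker ID would need an additional diarization step on top:

```python
from transformers import pipeline

# Runs entirely on-device: nothing leaves your machine.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,  # process long recordings in 30-second chunks
)

result = asr("interview.mp3", return_timestamps=True)  # placeholder file
print(result["text"])
```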
Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow, from experiments through to production.
Now available as a Docker Space directly on the HF Hub! 🤗
🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: version, edit, and update without redeployment
✅ Intuitive Evals: collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: build datasets directly from production data to enhance future performance
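For a rough idea of what instrumenting an app looks like, here's a hedged sketch using the Langfuse Python SDK's decorator; the Space URL and keys are placeholders you'd get from your own deployment:

```python
import os

# Placeholder values: point the SDK at your own Langfuse Space and the
# keys created in its UI.
os.environ["LANGFUSE_HOST"] = "https://your-username-langfuse.hf.space"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

from langfuse.decorators import observe

@observe()  # records inputs, outputs, timing, and nesting as a trace
def answer(question: str) -> str:
    # call your LLM of choice here; the return value lands in the trace
    return f"(model answer to: {question})"

print(answer("What does Langfuse trace?"))
```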
Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik
Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development. The release includes Tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6
🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:
- ⚡ Speed & efficiency: it's multiple times faster and uses significantly less memory than DeBERTa-v3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well
- 📉 Performance tradeoff: it performs slightly worse than DeBERTa-v3 on average across my zeroshot classification task collection
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k)
- 💡 What’s next? I’m preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.
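Usage is the standard transformers zero-shot pipeline; in the sketch below, the model ID is a placeholder for the new checkpoint:

```python
import torch
from transformers import pipeline

# Placeholder model ID: swap in the actual released checkpoint.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
    torch_dtype=torch.bfloat16,  # bf16 roughly doubled throughput in my tests
)

text = "The new GPU driver fixes a memory leak during long inference runs."
labels = ["software", "hardware", "finance", "sports"]
print(classifier(text, candidate_labels=labels))
```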
After some heated discussion 🔥, we're clarifying our intent re: storage limits on the Hub
TL;DR:
- public storage is free and (barring blatant abuse) unlimited; we do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race)
- There will be big breakthroughs in AI for biology and chemistry
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price ✅ (Inflection, Adept AI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs ✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry ✅ for video 🔴 for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI ✅ monetary 🔴 environmental (😢)
- A popular piece of media will be mostly AI-generated ✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face, leading to no increase in unemployment 🔜 currently 7M AI builders on Hugging Face
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!
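Getting started is a few lines with transformers; here's a minimal generation sketch (the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.bfloat16,  # small enough for a laptop or a free Colab
)

image = Image.open("photo.jpg")  # placeholder image path
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image briefly."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```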