
Text Generation Inference
AI & ML interests
Maintainers of the `huggingface/text-generation-inference` repo
text-generation-inference's activity
Post
2156
Important notice 🚨
For Inference Providers that have built support for our Billing API (currently Fal, Novita, and HF-Inference, with more coming soon), we've started enabling pay-as-you-go (PAYG).
This means you can use those Inference Providers beyond the free included credits, with the extra usage charged to your HF account.
You can see it on this view: any provider that does not have a "Billing disabled" badge is PAYG-compatible.
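As an illustration, here is a minimal sketch (not an official snippet) of routing a request through a PAYG-enabled provider with the huggingface_hub client; the provider and model chosen below are assumptions for the example, and usage beyond the free credits is billed to your HF account:

```python
# Minimal sketch, assuming huggingface_hub >= 0.28 and a valid HF access token.
# The provider and model are illustrative choices, not recommendations.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="novita",   # any PAYG-enabled Inference Provider
    api_key="hf_xxx",    # your HF token; usage beyond free credits is billed to this account
)

response = client.chat_completion(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```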
Post
7018
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight: agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question'; after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank you to the community, he shared 100 invite codes, first-come first-served; just use "HUGGINGFACE" to get access!
Post
4645
10,000+ models based on DeepSeek R1 have been publicly shared on Hugging Face! Which ones are your favorites? https://huggingface.co/models?sort=trending&search=r1. Truly a game-changer!
Post
5873
Super happy to welcome Nvidia as our latest Enterprise Hub customer. They have almost 2,000 team members using Hugging Face and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!
Nvidia's org: https://huggingface.co/nvidia
Enterprise hub: https://huggingface.co/enterprise
Post
2822
What are the best organizations to follow on @huggingface?
Off the top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- XAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch
Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
Post
3483
We crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.
Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.
Have you been using any of the integrations, and how can we make them better?
https://huggingface.co/blog/inference-providers
Post
4228
Hey everyone, we've given the https://hf.co/spaces page a fresh update!
Smart Search: Now just type what you want to do, like "make a viral meme" or "generate music", and our search gets it.
New Categories: Check out the cool new filter bar with icons to help you pick a category fast.
Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.
Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.
We'd love to hear what you think; drop us some feedback plz!
Post
3057
Finally, an open-source AI that turns your lyrics into full songs is here: meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!
m-a-p/YuE-s1-7B-anneal-en-cot
Post
7232
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!
Post
4360
Cool to see @ylecun joining the top 10 most-followed accounts on HF!
(and the leaderboard by @mvaloatto is here: mvaloatto/TCTF)
Post
2058
Coming back to Paris Friday to open our new Hugging Face office!
We're at capacity for the party, but add your name to the waiting list as we're trying to privatize the Passage du Caire for extra space for robots 🤖🦾🦿
https://t.co/enkFXjWndJ
Post
1416
Performance leap: TGI v3 is out. It processes 3x more tokens and runs 13x faster than vLLM on long prompts. Zero config!
3x more tokens.
By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely manages 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.
13x faster
On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5µs. Thanks to @Daniël de Kok for the beast of a data structure.
Zero config
That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give the best performance. In production, we don't have any flags anymore in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.
Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
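To make the zero-config point concrete, here is a minimal sketch of querying a locally running TGI v3 server; it assumes the server was started with only --model-id set (no tuning flags) and is listening on localhost:8080, both of which are example assumptions:

```python
# Minimal sketch: calling TGI's native /generate endpoint.
# Assumes a TGI v3 server is running on localhost:8080 with default (zero-config) settings.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain prefix caching in one paragraph.",
        "parameters": {"max_new_tokens": 128},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```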
Post
10356
After some heated discussion 🔥, we've clarified our intent regarding storage limits on the Hub.
TL;DR:
- public storage is free and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
docs: https://huggingface.co/docs/hub/storage-limits
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥
cc: @reach-vb @pierric @victor and the HF team
Post
4678
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
✅ (Inflection, Adept AI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs
✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry
✅ for video, 🔴 for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI
✅ monetary, 🔴 environmental
- A popular media format will be mostly AI-generated
✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face leading to no increase in unemployment
🟠 currently 7M AI builders on Hugging Face
Post
4436
Hugging Face is becoming the best place to share the most viral AI apps, with Spaces.
Kolors Virtual Try-On just crossed 6,000,000 unique visitors and is now the #5 most popular Space. Congrats to the Kwai Kolors team!
Kwai-Kolors/Kolors-Virtual-Try-On
Post
3198
wow 😮
INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code.
PrimeIntellect/INTELLECT-1-Instruct
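Since the instruct checkpoint is published as a standard causal LM on the Hub, a hedged sketch of loading it with transformers might look like the following; the dtype, device placement, and prompt are assumptions, not the authors' recommended settings:

```python
# Minimal sketch, assuming a GPU with enough memory for a 10B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PrimeIntellect/INTELLECT-1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("What made INTELLECT-1's training unusual?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```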
Post
2216
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...
I tested it against some really challenging reasoning prompts and the results are amazing 🤯.
Check this dataset for the results: victor/qwq-misguided-attention
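If you want to poke at the shared results yourself, here is a minimal sketch with the datasets library; the split names and columns aren't documented here, so the code inspects them rather than assuming any:

```python
# Minimal sketch: download the shared results dataset and inspect its structure.
from datasets import load_dataset

ds = load_dataset("victor/qwq-misguided-attention")
print(ds)                       # shows the available splits and their columns
first_split = next(iter(ds))
print(ds[first_split][0])       # peek at one record
```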
Post
1998
I've been in Brazil for 10 days now 🇧🇷🇧🇷🇧🇷
I've been surprised by the gap between the massive number of people interested in AI (ChatGPT adoption is crazy here) and the relatively low number of real AI builders, aka people and companies building their own AI models, datasets and apps.
Lots of effort is needed across the world for everyone to participate in, control, and benefit from this foundational technology, starting with open-source and multilingual AI, more access to GPUs, and AI builder training for all!