Alvaro Bartolome
AI & ML interests
machine learning @huggingface (inference + cloud)
Recent Activity
- updated a dataset 1 day ago: huggingface/DEH-image-scan-data
- liked a model 5 days ago: nvidia/personaplex-7b-v1
- replied to their post 6 days ago:
🔥 `hf-mem` v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the `--experimental` flag!
`uvx hf-mem --model-id ... --experimental` will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.
💡 Alternatively, you can also set the `--max-model-len`, `--batch-size` and `--kv-cache-dtype` arguments (à la vLLM) manually if preferred.
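For intuition, a KV cache estimate like this boils down to the standard back-of-the-envelope formula (one K and one V tensor per layer). The sketch below is not `hf-mem`'s actual implementation, and the Llama-3.1-8B-style numbers (32 layers, 8 KV heads, head dim 128) are illustrative assumptions:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   max_model_len: int, batch_size: int,
                   dtype_bytes: int = 2) -> int:
    """Rough KV cache size: 2 tensors (K and V) per layer, each of shape
    (batch, num_kv_heads, max_model_len, head_dim), at dtype_bytes per element."""
    return 2 * num_layers * num_kv_heads * head_dim * max_model_len * batch_size * dtype_bytes

# Illustrative config (Llama-3.1-8B-like, GQA with 8 KV heads), 8192 context,
# batch size 1, fp16/bf16 cache (2 bytes per element):
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      max_model_len=8192, batch_size=1)
print(f"{size / 1024**3:.2f} GiB")  # 1.00 GiB
```

Doubling the batch size or the context length doubles the estimate, which is why both knobs matter when sizing an instance.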
posted an update 6 days ago
reacted to ronantakizawa's post with ❤️ about 1 month ago
Thank you @clem (Co-Founder & CEO of Hugging Face) for sharing my dataset on X / Twitter!
ronantakizawa/github-top-developers
#github #dataset
reacted to danieldk's post with 🔥🤗 8 months ago
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤗 Kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
posted an update 11 months ago
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!
Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents, designed to handle complex interactions across virtual and real environments, and it's MIT licensed!
Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA on multi-modal benchmarks spanning UI navigation, robotics manipulation, image / video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases
Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)
reacted to merve's post about 1 year ago
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM, the tiniest VLMs, which come in 256M and 500M, with their retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1, with an MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models, and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
Loving these, please never stop, these are so useful!
reacted to merve's post with 🔥 about 1 year ago
replied to burtenshaw's post about 1 year ago
Oh, nice to see you here @bechavarren 👋🏻
reacted to mlabonne's post with 🤗🔥 about 1 year ago
LLM Course 2025 edition!
I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.
The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.
I know how difficult this stuff can be, so I'm super proud of the impact it has had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.
Thanks everyone, hope you'll enjoy it!
💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
reacted to AdinaY's post with 🔥 about 1 year ago
The wave of reasoning models from the Chinese community has arrived!
- Marco-o1 by AIDC, Alibaba: AIDC-AI/Marco-o1
- QwQ by Qwen, Alibaba: Qwen/qwq-674762b79b75eac01735070a
- Skywork-o1 by Kunlun Tech: Skywork/skywork-o1-open-67453df58e12f6c3934738d0
- Llama-3.2V-11B-cot by PKU Yuan group: Xkev/Llama-3.2V-11B-cot
- DeepSeek-R1-Lite-Preview by DeepSeek AI: https://chat.deepseek.com/
- InternThinker Preview by Shanghai AI Lab: https://sso.openxlab.org.cn/login?redirect=https://internlm-chat.intern-ai.org.cn/&clientId=ebmrvod6yo0nlzaek1yp
- k0-math by Moonshot AI: https://kimi.moonshot.cn/ (coming soon!)
Who's next?
zh-ai-community/reasoning-models-67409fb3aa1ed78f10087cd7
reacted to andito's post with 🔥 about 1 year ago
SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio! Try it with two lines:
pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix
python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit
Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma!
reacted to danielhanchen's post with 🔥 about 1 year ago
Vision finetuning is in 🦥 Unsloth! You can now finetune Llama 3.2, Qwen2 VL, Pixtral and all Llava variants up to 2x faster and with up to 70% less VRAM usage!
Colab to finetune Llama 3.2: https://colab.research.google.com/drive/1j0N4XTY1zXXy7mPAhOC1_gMYZ2F2EBlk?usp=sharing
reacted to clem's post with 🔥 over 1 year ago
This is no Woodstock AI, but it will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.
1,000 spots available, first come first served, with some surprises during the stream!
You can register and add it to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
posted an update over 1 year ago
🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI).
In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI.
Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we're not going to stop here: stay tuned as we enable more experiences to build AI with open models on Google Cloud!
Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai
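As a back-of-the-envelope sketch (not from the post itself) of why the FP8 checkpoint is the one that fits on this instance: an A3 machine has 8 H100 GPUs with 80 GB of HBM each, and weight memory scales with bytes per parameter. The bf16 comparison below is an illustrative assumption; real deployments also need headroom for the KV cache, activations, and runtime overhead.

```python
# Weight memory for a 405B-parameter model vs. 8 x H100 (80 GB HBM each).
PARAMS = 405e9
HBM_TOTAL_GB = 8 * 80  # 640 GB total across the A3 instance

for dtype, bytes_per_param in [("bf16", 2), ("fp8", 1)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{dtype}: {weights_gb:.0f} GB of weights, fits: {weights_gb < HBM_TOTAL_GB}")
# bf16: 810 GB of weights, fits: False
# fp8: 405 GB of weights, fits: True
```

So at bf16 the weights alone exceed the instance's total HBM, while at FP8 they occupy roughly 405 GB of the 640 GB available, leaving room for the KV cache and activations.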