merve
posted an update about 7 hours ago
smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW

you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.)
with only a few lines of code changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤠
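
a rough sketch of the "pass images at the start" case, to show how small the change is. It assumes the agent exposes `run(task, images=...)`, as the blog describes; the helper name `ask_about_images`, the file `form.png`, and the choice of `OpenAIServerModel` are only illustrative, so check the blog for the exact setup:

```python
from typing import Any, Sequence


def ask_about_images(agent: Any, task: str, images: Sequence[Any]) -> Any:
    """Hypothetical helper: run an agent on a task with images attached up front.

    Assumes `agent.run` accepts an `images` keyword (a list of PIL images
    in smolagents, per the blog post).
    """
    return agent.run(task, images=list(images))


# Possible usage (not executed here; needs smolagents, Pillow and an API key):
#
#   from PIL import Image
#   from smolagents import CodeAgent, OpenAIServerModel
#
#   model = OpenAIServerModel(model_id="gpt-4o")  # any multimodal model should work
#   agent = CodeAgent(tools=[], model=model)
#   ask_about_images(agent, "What fields does this form ask for?",
#                    [Image.open("form.png")])
```

swapping in a local transformers model should only change the `model = ...` line; the call with `images` stays the same.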

read our blog http://hf.co/blog/smolagents-can-see