smolagents can see
we just shipped vision support to smolagents, agentic computers FTW
you can now:
- let the agent fetch images dynamically while it runs (e.g. an agentic web browser)
- pass images to the agent up front, when you start it (e.g. chatting with documents, filling in forms automatically, etc.)
with only a few lines of code changed! 🤯
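here's a rough sketch of both modes, not the official example. it assumes `smolagents` and `Pillow` are installed and an OpenAI key is set in the environment. `attach_image_callback`, `run_with_images` and `get_image` are hypothetical names made up for illustration, and the exact callback signature may differ between smolagents versions:

```python
def attach_image_callback(get_image):
    """Build a step callback that stores a fresh image on each agent step.

    `get_image` is a hypothetical zero-argument function returning an image
    (for example a browser screenshot), or None when nothing is available.
    """
    def callback(memory_step, agent=None):
        image = get_image()
        if image is not None:
            # Assumption: the step object exposes `observations_images`,
            # as the ActionStep class in smolagents does.
            memory_step.observations_images = [image]
    return callback


def run_with_images(task, image_paths, get_image=None, model_id="gpt-4o"):
    """Run a CodeAgent on `task` with images passed up front and, optionally, per step."""
    # Imported lazily so the sketch can be loaded without the dependencies installed.
    from PIL import Image
    from smolagents import CodeAgent, OpenAIServerModel

    # Mode 1: images handed to the agent when the run starts.
    images = [Image.open(path).convert("RGB") for path in image_paths]

    # Mode 2: images attached dynamically after each step via a callback.
    callbacks = [attach_image_callback(get_image)] if get_image else []

    agent = CodeAgent(
        tools=[],
        model=OpenAIServerModel(model_id=model_id),  # needs an OpenAI API key
        max_steps=5,
        step_callbacks=callbacks,
    )
    return agent.run(task, images=images)
```

for the web browser case, `get_image` would typically grab a screenshot from the browser driver; for the document case you can skip it and only pass `image_paths`.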
you can use transformers models locally (like Qwen2-VL) OR plug in your favorite multimodal inference provider (GPT-4o, Anthropic & co)
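swapping the backend could look roughly like this. it's a minimal sketch: the backend names, the `BACKENDS` table and `build_model` are hypothetical, the model ids are just examples, and it assumes your installed smolagents version exposes `TransformersModel`, `OpenAIServerModel` and `LiteLLMModel` with vision support:

```python
# Hypothetical backend names mapped to (smolagents class name, example model id).
BACKENDS = {
    "local": ("TransformersModel", "Qwen/Qwen2-VL-7B-Instruct"),
    "openai": ("OpenAIServerModel", "gpt-4o"),
    "anthropic": ("LiteLLMModel", "anthropic/claude-3-5-sonnet-latest"),
}


def build_model(backend):
    """Return a smolagents model object for one of the backends listed above."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {sorted(BACKENDS)}")
    class_name, model_id = BACKENDS[backend]

    # Imported lazily so an unknown backend fails fast, before any heavy import.
    import smolagents

    model_class = getattr(smolagents, class_name)
    return model_class(model_id=model_id)
```

the agent code stays the same whichever backend you pick, since the model is just passed in as `model=build_model("local")` or similar.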
read our blog http://hf.co/blog/smolagents-can-see