
Agents

Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.

To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.

Agents

Our agents inherit from MultiStepAgent, which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read more in this conceptual guide.

We provide two types of agents, both based on the main MultiStepAgent class.

Both require the arguments model and tools (a list of tools) at initialization.
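For instance, a minimal initialization sketch (model and tool choices are illustrative; any model callable works):

from smolagents import CodeAgent, ToolCallingAgent, HfApiModel

model = HfApiModel()  # wraps the HF Inference API, see the Models section below

# CodeAgent formulates its tool calls as Python code snippets
code_agent = CodeAgent(tools=[], model=model)

# ToolCallingAgent formulates its tool calls as JSON-like blobs
tool_calling_agent = ToolCallingAgent(tools=[], model=model)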

Classes of agents

class smolagents.MultiStepAgent

( tools: typing.List[smolagents.tools.Tool], model: typing.Callable[[typing.List[typing.Dict[str, str]]], smolagents.models.ChatMessage], system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, max_steps: int = 6, tool_parser: typing.Optional[typing.Callable] = None, add_base_tools: bool = False, verbosity_level: int = 1, grammar: typing.Optional[typing.Dict[str, str]] = None, managed_agents: typing.Optional[typing.List] = None, step_callbacks: typing.Optional[typing.List[typing.Callable]] = None, planning_interval: typing.Optional[int] = None )

Parameters

  • tools (list[Tool]) — Tools that the agent can use.
  • model (Callable[[list[dict[str, str]]], ChatMessage]) — Model that will generate the agent’s actions.
  • system_prompt (str, optional) — System prompt that will be used to generate the agent’s actions.
  • tool_description_template (str, optional) — Template used to describe the tools in the system prompt.
  • max_steps (int, default 6) — Maximum number of steps the agent can take to solve the task.
  • tool_parser (Callable, optional) — Function used to parse the tool calls from the LLM output.
  • add_base_tools (bool, default False) — Whether to add the base tools to the agent’s tools.
  • verbosity_level (int, default 1) — Level of verbosity of the agent’s logs.
  • grammar (dict[str, str], optional) — Grammar used to parse the LLM output.
  • managed_agents (list, optional) — Managed agents that the agent can call.
  • step_callbacks (list[Callable], optional) — Callbacks that will be called at each step.
  • planning_interval (int, optional) — Interval at which the agent will run a planning step.

Agent class that solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent will perform a cycle of action (given by the LLM) and observation (obtained from the environment).

execute_tool_call

( tool_name: str, arguments: typing.Union[typing.Dict[str, str], str] )

Parameters

  • tool_name (str) — Name of the Tool to execute (should be one from self.tools).
  • arguments (Dict[str, str]) — Arguments passed to the Tool.

Executes the tool with the provided input and returns the result. This method replaces arguments with actual values from the agent's state if they refer to state variables.
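For illustration, a hedged sketch of a direct call (the "final_answer" tool is part of the base toolbox; the tool name must match a key of self.tools):

# Hypothetical direct call; "final_answer" is assumed to be among agent.tools
result = agent.execute_tool_call("final_answer", {"answer": "Paris"})

# If a string argument matches a key in agent.state (e.g. "document"),
# it is replaced by the corresponding state value before the tool runs.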

extract_action

( llm_output: str, split_token: str )

Parameters

  • llm_output (str) — Output of the LLM.
  • split_token (str) — Separator for the action. Should match the example in the system prompt.

Parses the action from the LLM output.

planning_step

( task, is_first_step: bool, step: int )

Parameters

  • task (str) — Task to perform.
  • is_first_step (bool) — Whether this is the first planning step; if not, the plan should be an update over the previous plan.
  • step (int) — The number of the current step, used as an indication for the LLM.

Used periodically by the agent to plan the next steps to reach the objective.

provide_final_answer

( task: str, images: typing.Optional[list[str]] ) → str

Parameters

  • task (str) — Task to perform.
  • images (list[str], optional) — Paths to image(s).

Returns

str

Final answer to the task.

Provide the final answer to the task, based on the logs of the agent’s interactions.

run

( task: str, stream: bool = False, reset: bool = True, single_step: bool = False, images: typing.Optional[typing.List[str]] = None, additional_args: typing.Optional[typing.Dict] = None )

Parameters

  • task (str) — Task to perform.
  • stream (bool) — Whether to run in a streaming way.
  • reset (bool) — Whether to reset the conversation or continue it from the previous run.
  • single_step (bool) — Whether to run the agent in one-shot fashion.
  • images (list[str], optional) — Paths to image(s).
  • additional_args (dict) — Any other variables that you want to pass to the agent run, for instance images or dataframes. Give them clear names!

Run the agent for the given task.

Example:

from smolagents import CodeAgent, HfApiModel

# model is a required argument alongside tools
agent = CodeAgent(tools=[], model=HfApiModel())
agent.run("What is the result of 2 power 3.7384?")

step

( log_entry: ActionStep )

To be implemented in child classes. Should return None if the step is not final, or the final answer otherwise.

write_inner_memory_from_logs

( summary_mode: bool = False )

Parameters

  • summary_mode (bool) — Whether to write a summary of the logs or the full logs.

Reads past llm_outputs, actions, and observations or errors from the logs into a series of messages that can be used as input to the LLM.

class smolagents.CodeAgent

( tools: typing.List[smolagents.tools.Tool], model: typing.Callable[[typing.List[typing.Dict[str, str]]], smolagents.models.ChatMessage], system_prompt: typing.Optional[str] = None, grammar: typing.Optional[typing.Dict[str, str]] = None, additional_authorized_imports: typing.Optional[typing.List[str]] = None, planning_interval: typing.Optional[int] = None, use_e2b_executor: bool = False, max_print_outputs_length: typing.Optional[int] = None, **kwargs )

Parameters

  • tools (list[Tool]) — Tools that the agent can use.
  • model (Callable[[list[dict[str, str]]], ChatMessage]) — Model that will generate the agent’s actions.
  • system_prompt (str, optional) — System prompt that will be used to generate the agent’s actions.
  • grammar (dict[str, str], optional) — Grammar used to parse the LLM output.
  • additional_authorized_imports (list[str], optional) — Additional authorized imports for the agent.
  • planning_interval (int, optional) — Interval at which the agent will run a planning step.
  • use_e2b_executor (bool, default False) — Whether to use the E2B executor for remote code execution.
  • max_print_outputs_length (int, optional) — Maximum length of the print outputs.
  • **kwargs — Additional keyword arguments.

In this agent, the tool calls will be formulated by the LLM in code format, then parsed and executed.
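For example, a short sketch of allowing extra imports in the generated code (the numpy import and the task are illustrative):

from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    additional_authorized_imports=["numpy"],  # generated snippets may now import numpy
)
agent.run("Compute the mean of [2, 4, 4, 4, 5, 5, 7, 9].")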

step

( log_entry: ActionStep )

Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. Returns None if the step is not final.

class smolagents.ToolCallingAgent

( tools: typing.List[smolagents.tools.Tool], model: typing.Callable[[typing.List[typing.Dict[str, str]]], smolagents.models.ChatMessage], system_prompt: typing.Optional[str] = None, planning_interval: typing.Optional[int] = None, **kwargs )

Parameters

  • tools (list[Tool]) — Tools that the agent can use.
  • model (Callable[[list[dict[str, str]]], ChatMessage]) — Model that will generate the agent’s actions.
  • system_prompt (str, optional) — System prompt that will be used to generate the agent’s actions.
  • planning_interval (int, optional) — Interval at which the agent will run a planning step.
  • **kwargs — Additional keyword arguments.

This agent uses JSON-like tool calls, using the model.get_tool_call method to leverage the LLM engine's tool-calling capabilities.
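A minimal usage sketch (assuming the built-in DuckDuckGoSearchTool is available in your installation):

from smolagents import ToolCallingAgent, HfApiModel, DuckDuckGoSearchTool

agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("Who wrote 'The Little Prince'?")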

step

( log_entry: ActionStep )

Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. Returns None if the step is not final.

ManagedAgent

class smolagents.ManagedAgent

( agent, name, description, additional_prompting: typing.Optional[str] = None, provide_run_summary: bool = False, managed_agent_prompt: typing.Optional[str] = None )

Parameters

  • agent (object) — The agent to be managed.
  • name (str) — The name of the managed agent.
  • description (str) — A description of the managed agent.
  • additional_prompting (Optional[str], optional) — Additional prompting for the managed agent. Defaults to None.
  • provide_run_summary (bool, optional) — Whether to provide a run summary after the agent completes its task. Defaults to False.
  • managed_agent_prompt (Optional[str], optional) — Custom prompt for the managed agent. Defaults to None.

ManagedAgent class that manages an agent and provides additional prompting and run summaries.
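For instance, a hedged sketch of wrapping a search agent so that a manager agent can call it (tool and model choices are illustrative):

from smolagents import CodeAgent, ToolCallingAgent, ManagedAgent, HfApiModel, DuckDuckGoSearchTool

model = HfApiModel()
web_agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)

managed_web_agent = ManagedAgent(
    agent=web_agent,
    name="web_search",
    description="Runs web searches for you. Give it your query as an argument.",
)

# The manager can now delegate to the managed agent by name
manager_agent = CodeAgent(tools=[], model=model, managed_agents=[managed_web_agent])
manager_agent.run("Who is the CEO of Hugging Face?")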

write_full_task

( task )

Adds additional prompting for the managed agent, like ‘add more detail in your answer’.

stream_to_gradio

smolagents.stream_to_gradio

( agent, task: str, reset_agent_memory: bool = False, additional_args: typing.Optional[dict] = None )

Runs an agent with the given task and streams the messages from the agent as gradio ChatMessages.
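A sketch of wiring this into a gradio Chatbot (assumes gradio is installed; the interact wiring is illustrative, not the only possible setup):

import gradio as gr
from smolagents import CodeAgent, HfApiModel, stream_to_gradio

agent = CodeAgent(tools=[], model=HfApiModel())

def interact(prompt, messages):
    # Echo the user message, then stream the agent's messages as they arrive
    messages.append(gr.ChatMessage(role="user", content=prompt))
    yield messages
    for msg in stream_to_gradio(agent, task=prompt, reset_agent_memory=False):
        messages.append(msg)
        yield messages

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(type="messages")
    text_input = gr.Textbox(lines=1, label="Your request")
    text_input.submit(interact, [text_input, chatbot], [chatbot])

demo.launch()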

GradioUI

You must have gradio installed to use the UI. If it isn't installed yet, run pip install smolagents[gradio].

class smolagents.GradioUI

( agent: MultiStepAgent, file_upload_folder: str | None = None )

A one-line interface to launch your agent in Gradio.
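In practice (a minimal sketch):

from smolagents import CodeAgent, GradioUI, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel())
GradioUI(agent).launch()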

upload_file

( file, file_uploads_log, allowed_file_types = ['application/pdf', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'text/plain'] )

Handles file uploads; the default allowed types are .pdf, .docx, and .txt.

Models

You’re free to create and use your own models to power your agent.

You can use any model callable for your agent, as long as:

  1. It follows the messages format (List[Dict[str, str]]) for its input messages, and it returns an object with a .content attribute containing the generated text.
  2. It stops generating outputs at the sequences passed in the stop_sequences argument.

To define your LLM, you can write a custom_model function that accepts a list of messages and returns such an object. The callable also needs to accept a stop_sequences argument that indicates when to stop generating.

from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    # response.choices[0].message carries a .content attribute with the generated text
    answer = response.choices[0].message
    return answer

Additionally, custom_model can take a grammar argument: if you specify a grammar upon agent initialization, it will be passed along on each call to the model to allow constrained generation, forcing properly formatted agent outputs.
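A hedged sketch of supporting that argument in the custom_model above (forwarding the grammar as response_format is an assumption about the chat_completion endpoint; adapt it to your backend):

def custom_model(messages, stop_sequences=["Task"], grammar=None):
    # Assumption: the backend (the InferenceClient defined above) accepts
    # the grammar as a response_format constraint
    response = client.chat_completion(
        messages, stop=stop_sequences, max_tokens=1000, response_format=grammar
    )
    return response.choices[0].message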

TransformersModel

For convenience, we have added a TransformersModel that implements the points above by building a local transformers pipeline for the model_id given at initialization.

from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a

You must have transformers and torch installed on your machine. If they aren't installed yet, run pip install smolagents[transformers].

class smolagents.TransformersModel

( model_id: typing.Optional[str] = None, device_map: typing.Optional[str] = None, torch_dtype: typing.Optional[str] = None, trust_remote_code: bool = False, flatten_messages_as_text: bool = True, **kwargs )

Parameters

  • model_id (str, optional, defaults to "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
  • device_map (str, optional) — The device_map to initialize your model with.
  • torch_dtype (str, optional) — The torch_dtype to initialize your model with.
  • trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
  • flatten_messages_as_text (bool, default True) — Whether to flatten messages as text: this must be set to False to use VLMs (as opposed to LLMs, for which this flag can be ignored). Caution: this parameter is experimental and will be removed in an upcoming PR as we auto-detect VLMs.
  • **kwargs — Additional keyword arguments to pass to model.generate(), for instance max_new_tokens or device.

Raises

ValueError

  • ValueError — If the model name is not provided.

A class that builds a local transformers pipeline for language model interaction.

This model lets you run Hugging Face models locally on your machine, supporting features like stop sequences and grammar customization.

You must have transformers and torch installed on your machine. If they aren't installed yet, run pip install smolagents[transformers].

Example:

>>> engine = TransformersModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     device="cuda",
...     max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

HfApiModel

The HfApiModel wraps an HF Inference API client for the execution of the LLM.

from smolagents import HfApiModel

messages = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "No need to help, take it easy."},
]

model = HfApiModel()
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!

class smolagents.HfApiModel

( model_id: str = 'Qwen/Qwen2.5-Coder-32B-Instruct', token: typing.Optional[str] = None, timeout: typing.Optional[int] = 120, **kwargs )

Parameters

  • model_id (str, optional, defaults to "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
  • token (str, optional) — Token used by the Hugging Face API for authentication. This token needs to be authorized 'Make calls to the serverless Inference API'. If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'. If not provided, the class will try to use the HF_TOKEN environment variable, otherwise the token stored in the Hugging Face CLI configuration.
  • timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
  • **kwargs — Additional keyword arguments to pass to the Hugging Face API.

Raises

ValueError

  • ValueError — If the model name is not provided.

A class to interact with Hugging Face’s Inference API for language model interaction.

This model allows you to communicate with Hugging Face’s models using the Inference API. It can be used in both serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization.

Example:

>>> engine = HfApiModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     token="your_hf_token_here",
...     max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

LiteLLMModel

The LiteLLMModel leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs upon model initialization that will then be used on every call to the model; for instance, below we pass temperature.

from smolagents import LiteLLMModel

messages = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "No need to help, take it easy."},
]

model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))

class smolagents.LiteLLMModel

( model_id = 'anthropic/claude-3-5-sonnet-20240620', api_base = None, api_key = None, **kwargs )

Parameters

  • model_id (str) — The model identifier to use via LiteLLM (e.g. "anthropic/claude-3-5-sonnet-20240620").
  • api_base (str, optional) — The base URL of the OpenAI-compatible API server.
  • api_key (str, optional) — The API key to use for authentication.
  • **kwargs — Additional keyword arguments to pass to the underlying API call.

This model connects to LiteLLM as a gateway to hundreds of LLMs.

OpenAIServerModel

This class lets you call any model served behind an OpenAI-compatible API. Here's how you can set it up (you can customise the api_base URL to point to another server):

import os

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

class smolagents.OpenAIServerModel

( model_id: str, api_base: typing.Optional[str] = None, api_key: typing.Optional[str] = None, custom_role_conversions: typing.Optional[typing.Dict[str, str]] = None, **kwargs )

Parameters

  • model_id (str) — The model identifier to use on the server (e.g. “gpt-3.5-turbo”).
  • api_base (str, optional) — The base URL of the OpenAI-compatible API server.
  • api_key (str, optional) — The API key to use for authentication.
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
  • **kwargs — Additional keyword arguments to pass to the OpenAI API.

This model connects to an OpenAI-compatible API server.

AzureOpenAIServerModel

AzureOpenAIServerModel allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up, note that you can omit the azure_endpoint, api_key, and api_version arguments, provided you’ve set the corresponding environment variables — AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION.

Pay attention to the lack of an AZURE_ prefix for OPENAI_API_VERSION; this is due to the way the underlying openai package is designed.

import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)

class smolagents.AzureOpenAIServerModel

( model_id: str, azure_endpoint: typing.Optional[str] = None, api_key: typing.Optional[str] = None, api_version: typing.Optional[str] = None, custom_role_conversions: typing.Optional[typing.Dict[str, str]] = None, **kwargs )

Parameters

  • model_id (str) — The model deployment name to use when connecting (e.g. “gpt-4o-mini”).
  • azure_endpoint (str, optional) — The Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, it will be inferred from the AZURE_OPENAI_ENDPOINT environment variable.
  • api_key (str, optional) — The API key to use for authentication. If not provided, it will be inferred from the AZURE_OPENAI_API_KEY environment variable.
  • api_version (str, optional) — The API version to use. If not provided, it will be inferred from the OPENAI_API_VERSION environment variable.
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
  • **kwargs — Additional keyword arguments to pass to the Azure OpenAI API.

This model connects to an Azure OpenAI deployment.
