Is GAIA2 only applicable to LLM evaluation, not agent scaffold evaluation?

#1
by pseudotensor - opened

I don't see a mechanism to support alternative agents, only LLMs.

Meta Agents Research Environments org

To keep results comparable when running the benchmark and submitting here, we recommend using the base ReAct agent implementation provided with ARE. For your own experimentation you can swap in a different agent, but it is a bit harder than just changing the LLM source, as there is no standard interface for that yet. You can check the BaseAgent class and the pointers on how to extend it here: https://facebookresearch.github.io/meta-agents-research-environments/api_reference/agents.html Alternatively, go deeper and implement this class: https://github.com/facebookresearch/meta-agents-research-environments/blob/main/are/simulation/agents/are_simulation_agent.py and then see how the agent builder chooses the agent: https://github.com/facebookresearch/meta-agents-research-environments/blob/main/are/simulation/agents/agent_builder.py#L22
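As a rough illustration of the general pattern (an agent interface plus a builder that picks an implementation by name), here is a minimal, self-contained sketch. Every name in it (`Agent`, `ReactStyleAgent`, `CustomScaffoldAgent`, `AGENT_REGISTRY`, `build_agent`, `echo_llm`) is hypothetical and made up for this example. It is not the ARE API, so please check the linked files for the actual classes and signatures.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

# An "LLM" here is just a function from prompt to completion (assumption).
LLM = Callable[[str], str]


class Agent(ABC):
    """Hypothetical agent interface; NOT the ARE BaseAgent class."""

    @abstractmethod
    def run(self, task: str) -> str:
        ...


class ReactStyleAgent(Agent):
    """Stand-in for a default agent: a single LLM call per task."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def run(self, task: str) -> str:
        return self.llm(f"Thought and action for: {task}")


class CustomScaffoldAgent(Agent):
    """Alternative scaffold: plan first, then execute the plan."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def run(self, task: str) -> str:
        plan = self.llm(f"Plan: {task}")
        return self.llm(f"Execute: {plan}")


# Hypothetical registry mapping an agent name to its constructor.
AGENT_REGISTRY: Dict[str, Callable[[LLM], Agent]] = {
    "default": ReactStyleAgent,
    "custom": CustomScaffoldAgent,
}


def build_agent(name: str, llm: LLM) -> Agent:
    """Look up the agent by name and construct it with the given LLM."""
    try:
        factory = AGENT_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown agent: {name!r}") from None
    return factory(llm)


def echo_llm(prompt: str) -> str:
    """Toy LLM that wraps the prompt, so the call order is visible."""
    return f"<{prompt}>"
```

With this layout, changing the scaffold amounts to registering another class and selecting it by name, for example `build_agent("custom", echo_llm).run("book a flight")`, while the LLM stays an independent argument.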

You might want to move this discussion to the GitHub repo, where it might get more visibility.
