Is GAIA2 only applicable to LLM evaluation, not agent scaffold evaluation?
I don't see a mechanism to support alternative agents, only LLMs.
To keep results comparable when running the benchmark and submitting here, we recommend using the base ReAct agent implementation provided with ARE. For your own experimentation you can swap in a different agent, but it is a bit harder than just changing the LLM source, as there is no standard interface for that yet. You can check the BaseAgent class and the pointers on how to extend it here: https://facebookresearch.github.io/meta-agents-research-environments/api_reference/agents.html or go deeper and implement this class: https://github.com/facebookresearch/meta-agents-research-environments/blob/main/are/simulation/agents/are_simulation_agent.py and then see how the agent builder chooses the agent: https://github.com/facebookresearch/meta-agents-research-environments/blob/main/are/simulation/agents/agent_builder.py#L22
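To give a rough idea of the general shape (base class, custom subclass, builder that picks the agent by name), here is a minimal standalone sketch. It does not import ARE and does not reproduce its real classes or signatures: `SketchAgent`, `ReactLikeAgent`, `PlanFirstAgent`, `AGENT_REGISTRY`, `build_agent` and `echo_llm` are all hypothetical names made up for illustration, so check the linked files for the actual interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

# Hypothetical type: an "LLM" here is just a function from prompt to text.
LLM = Callable[[str], str]


class SketchAgent(ABC):
    """Hypothetical stand-in for an agent base class (NOT ARE's BaseAgent)."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    @abstractmethod
    def run(self, task: str) -> str:
        """Solve a task and return the final answer as text."""


class ReactLikeAgent(SketchAgent):
    """Placeholder for a default scaffold: a single LLM call per task."""

    def run(self, task: str) -> str:
        return self.llm(f"Thought/Action for: {task}")


class PlanFirstAgent(SketchAgent):
    """Placeholder for a custom scaffold: plan first, then execute the plan."""

    def run(self, task: str) -> str:
        plan = self.llm(f"Plan for: {task}")
        return self.llm(f"Execute plan: {plan}")


# Hypothetical registry mapping an agent name to its constructor.
AGENT_REGISTRY: Dict[str, Callable[[LLM], SketchAgent]] = {
    "react": ReactLikeAgent,
    "plan_first": PlanFirstAgent,
}


def build_agent(name: str, llm: LLM) -> SketchAgent:
    """Look up the agent class by name and construct it with the given LLM."""
    if name not in AGENT_REGISTRY:
        known = ", ".join(sorted(AGENT_REGISTRY))
        raise ValueError(f"Unknown agent {name!r}; known agents: {known}")
    return AGENT_REGISTRY[name](llm)


def echo_llm(prompt: str) -> str:
    """Fake LLM used only so the sketch runs without any model backend."""
    return f"[{prompt}]"


if __name__ == "__main__":
    for agent_name in ("react", "plan_first"):
        agent = build_agent(agent_name, echo_llm)
        print(agent_name, "->", agent.run("book a meeting"))
```

The point of the sketch is only that the same LLM can sit behind different scaffolds, and that the builder has to know about your new agent before it can be selected. How that registration actually works in ARE is something to read from `agent_builder.py`.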
You might want to move this discussion to the GitHub repo, where it might get more visibility.