Run a GAIA agent to answer evaluation questions and submit the answers
SLR-BENCH Leaderboard shows the performance of LLMs on the SLR-BENCH benchmark