𝟐𝟎𝟐𝟒, 𝐭𝐡𝐞 𝐲𝐞𝐚𝐫 𝐨𝐟 𝐚𝐠𝐞𝐧𝐭 𝐰𝐨𝐫𝐤𝐟𝐥𝐨𝐰𝐬 🦾🤖
I've just watched the talk Andrew Ng gave at Sequoia last week.
If you're interested in agents, you should really watch it!
𝗪𝗵𝘆 𝘂𝘀𝗲 𝗮𝗴𝗲𝗻𝘁 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀?
The current LLM task-solving workflow is not very intuitive:
we ask the model to "write an essay all in one shot, without ever using backspace."
Why not let the LLM follow a process closer to what we would do? For instance (see the sketch after this list):
- "Write an essay outline"
- "Do you need web research?"
- "Write a first draft"
- "Consider improvements"
…
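Here is a minimal sketch of what such an iterative loop could look like in Python. Nothing below comes from the talk: the `llm` callable is a placeholder for whatever chat-completion function you use, and the prompts are invented for illustration.

```python
# Minimal sketch of an iterative essay-writing workflow.
# `llm` is a placeholder: swap in any chat-completion call you like.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug your favourite model's API call in here")

def write_essay(topic: str, n_revisions: int = 2) -> str:
    outline = llm(f"Write an essay outline on: {topic}")
    draft = llm(f"Write a first draft following this outline:\n{outline}")
    for _ in range(n_revisions):
        feedback = llm(f"List concrete improvements for this draft:\n{draft}")
        draft = llm(
            "Revise the draft to address the feedback.\n"
            f"Draft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```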
This is called an agentic workflow, and existing ones already bring a huge performance boost. On HumanEval, GPT-4 scores 67% zero-shot; an agentic workflow with either tool use or reflection goes over 90%, and combining the two scores even higher!
𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗱𝗲𝘀𝗶𝗴𝗻 𝗽𝗮𝘁𝘁𝗲𝗿𝗻𝘀
For the first two patterns, the tech is already robust:
✍️ 𝗥𝗲𝗳𝗹𝗲𝗰𝘁𝗶𝗼𝗻: for instance, add a critic step after the writing step (sketched below)
🛠️ 𝗧𝗼𝗼𝗹 𝘂𝘀𝗲: extend the LLM's capabilities by letting it call tools, like a search engine or a calculator (also sketched below)
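As an illustration, a minimal reflection loop might look like this (the `llm` callable is the same placeholder as in the sketch above; the prompts are invented):

```python
# Reflection: add a critic step after the writing step, then revise once.
# `llm` is the same placeholder chat-completion callable as in the sketch above.
def reflect(task: str) -> str:
    answer = llm(task)
    critique = llm(f"Act as a critic. List flaws in this answer to '{task}':\n{answer}")
    return llm(
        "Rewrite the answer to fix the listed flaws.\n"
        f"Answer:\n{answer}\n\nFlaws:\n{critique}"
    )
```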
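And a minimal sketch of tool use under the same assumptions. The tool registry and the 'TOOL: name | input' convention are made up here for illustration; real frameworks use structured function calling instead.

```python
# Tool use: the LLM picks a tool, we run it, and it reads the result.
# The registry and the 'TOOL: <name> | <input>' convention are made up here.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
    "search": lambda query: f"(top results for: {query})",  # stub for a real search API
}

def answer_with_tools(question: str) -> str:
    decision = llm(
        f"Answer '{question}'. If one of the tools {list(TOOLS)} would help, "
        "reply exactly 'TOOL: <name> | <input>'. Otherwise answer directly."
    )
    if decision.startswith("TOOL:"):
        name, tool_input = (part.strip() for part in decision[len("TOOL:"):].split("|", 1))
        observation = TOOLS[name](tool_input)
        return llm(f"Question: {question}\nTool result: {observation}\nFinal answer:")
    return decision
```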
The next two will be needed to go further, but the tech behind them is still emerging and not yet reliable:
🗺️ 𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴: plan ahead to decompose a task into subtasks. This enables powerful behaviours, like an AI agent re-routing after a failure (see the sketch after this list)
🤝 𝗠𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝗰𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗶𝗼𝗻: program a flock of agents, each with its own task.
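As a rough illustration of planning, the sketch below decomposes a goal into subtasks, checks each result, and revises a step after a failure; the same placeholder `llm` and invented prompts apply. Multi-agent collaboration would be a natural extension, routing each subtask to a specialised agent instead of a single model.

```python
# Planning: decompose the goal into subtasks, verify each step's output,
# and re-route (revise the step) after a failure.
def run_with_plan(goal: str, max_retries: int = 2) -> list[str]:
    steps = llm(f"Decompose this goal into short numbered subtasks:\n{goal}").splitlines()
    results: list[str] = []
    for step in steps:
        for _ in range(max_retries + 1):
            result = llm(f"Goal: {goal}\nDone so far: {results}\nNow do: {step}")
            verdict = llm(f"Did this output complete '{step}'? Reply OK or FAIL.\n{result}")
            if verdict.strip().upper().startswith("OK"):
                results.append(result)
                break
            # On failure, ask for a revised version of the step and retry.
            step = llm(f"The step '{step}' failed. Propose a revised step.")
    return results
```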
Improving these two patterns will unlock huge performance boosts!
Andrew Ng says research agents are already part of his own workflow!
𝗖𝗹𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗼𝘂𝗴𝗵𝘁𝘀
Andrew speculates that with agentic workflows, quickly generating many tokens from a small LLM could give better results than slower generation from a more powerful LLM like GPT-5.
🎬 Watch the talk here 👉 https://www.youtube.com/watch?v=sal78ACtGTc
📚 I've added his recommended reads to m-ric/agents-65ba776fbd9e29f771c07d4e