𝟐𝟎𝟐𝟒, 𝐭𝐡𝐞 𝐲𝐞𝐚𝐫 𝐨𝐟 𝐚𝐠𝐞𝐧𝐭 𝐰𝐨𝐫𝐤𝐟𝐥𝐨𝐰𝐬 🦾🤖
I've just watched the talk Andrew Ng gave at Sequoia last week.
If you're interested in agents, you should really watch it!
𝗪𝗵𝘆 𝘂𝘀𝗲 𝗮𝗴𝗲𝗻𝘁 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀?
The current LLM task-solving workflow is not very intuitive:
we ask the model to "write an essay all in one shot, without ever using backspace."
Why not let the LLM follow a process closer to what we would do? For instance (see the sketch after this list):
- "Write an essay outline"
- "Do you need web research?"
- "Write a first draft"
- "Consider improvements"
…
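Here is a minimal sketch of what such an iterative loop could look like in Python. Nothing below comes from the talk: the `llm` callable is a placeholder for whatever chat-completion function you use, and the prompts are invented for illustration.

```python
# Minimal sketch of an iterative essay-writing workflow.
# `llm` is a placeholder: swap in any chat-completion call you like.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug your favourite model's API call in here")

def write_essay(topic: str, n_revisions: int = 2) -> str:
    outline = llm(f"Write an essay outline on: {topic}")
    draft = llm(f"Write a first draft following this outline:\n{outline}")
    for _ in range(n_revisions):
        feedback = llm(f"List concrete improvements for this draft:\n{draft}")
        draft = llm(
            "Revise the draft to address the feedback.\n"
            f"Draft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```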
This is called an agentic workflow, and existing ones already bring a huge performance boost. On HumanEval, GPT-4 scores 67% zero-shot; an agentic workflow with either tool use or reflection goes over 90%, and combining the two scores even higher!
𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗱𝗲𝘀𝗶𝗴𝗻 𝗽𝗮𝘁𝘁𝗲𝗿𝗻𝘀
For the first two patterns, the tech is already robust:
✍️ 𝗥𝗲𝗳𝗹𝗲𝗰𝘁𝗶𝗼𝗻: for instance, add a critic step after the writing step (sketched below)
🛠️ 𝗧𝗼𝗼𝗹 𝘂𝘀𝗲: extend the LLM's capabilities by letting it call tools, like a search engine or a calculator (also sketched below)
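As an illustration, a minimal reflection loop might look like this (the `llm` callable is the same placeholder as in the sketch above; the prompts are invented):

```python
# Reflection: add a critic step after the writing step, then revise once.
# `llm` is the same placeholder chat-completion callable as in the sketch above.
def reflect(task: str) -> str:
    answer = llm(task)
    critique = llm(f"Act as a critic. List flaws in this answer to '{task}':\n{answer}")
    return llm(
        "Rewrite the answer to fix the listed flaws.\n"
        f"Answer:\n{answer}\n\nFlaws:\n{critique}"
    )
```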
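And a minimal sketch of tool use under the same assumptions. The tool registry and the 'TOOL: name | input' convention are made up here for illustration; real frameworks use structured function calling instead.

```python
# Tool use: the LLM picks a tool, we run it, and it reads the result.
# The registry and the 'TOOL: <name> | <input>' convention are made up here.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
    "search": lambda query: f"(top results for: {query})",  # stub for a real search API
}

def answer_with_tools(question: str) -> str:
    decision = llm(
        f"Answer '{question}'. If one of the tools {list(TOOLS)} would help, "
        "reply exactly 'TOOL: <name> | <input>'. Otherwise answer directly."
    )
    if decision.startswith("TOOL:"):
        name, tool_input = (part.strip() for part in decision[len("TOOL:"):].split("|", 1))
        observation = TOOLS[name](tool_input)
        return llm(f"Question: {question}\nTool result: {observation}\nFinal answer:")
    return decision
```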
The next two will be needed to go further, but the tech behind them is still emerging and not yet reliable:
🗺️ 𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴: plan ahead to decompose a task into subtasks. This enables powerful behaviours, like an AI agent re-routing after a failure (see the sketch after this list)
🤝 𝗠𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝗰𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗶𝗼𝗻: program a flock of agents, each with its own task.
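As a rough illustration of planning, the sketch below decomposes a goal into subtasks, checks each result, and revises a step after a failure; the same placeholder `llm` and invented prompts apply. Multi-agent collaboration would be a natural extension, routing each subtask to a specialised agent instead of a single model.

```python
# Planning: decompose the goal into subtasks, verify each step's output,
# and re-route (revise the step) after a failure.
def run_with_plan(goal: str, max_retries: int = 2) -> list[str]:
    steps = llm(f"Decompose this goal into short numbered subtasks:\n{goal}").splitlines()
    results: list[str] = []
    for step in steps:
        for _ in range(max_retries + 1):
            result = llm(f"Goal: {goal}\nDone so far: {results}\nNow do: {step}")
            verdict = llm(f"Did this output complete '{step}'? Reply OK or FAIL.\n{result}")
            if verdict.strip().upper().startswith("OK"):
                results.append(result)
                break
            # On failure, ask for a revised version of the step and retry.
            step = llm(f"The step '{step}' failed. Propose a revised step.")
    return results
```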
Improving these two patterns will unlock huge performance boosts!
Andrew Ng says research agents are already part of his own workflow!
𝗖𝗹𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗼𝘂𝗴𝗵𝘁𝘀
Andrew speculates that with agentic workflows, quickly generating many tokens from a small LLM could give better results than slower generation from a more powerful LLM like GPT-5.
🎬 Watch the talk here 👉 https://www.youtube.com/watch?v=sal78ACtGTc
📚 I've added his recommended reads to m-ric/agents-65ba776fbd9e29f771c07d4e