The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about agents in a 264-page paper [book].
Here are some of their key findings:
They build a mapping of different agent components, such as perception, memory, and world modelling, to different regions of the human brain and compare them:
- brain is much more energy-efficient
- no genuine experience in agents
- brain learns continuously, agent is static
An agent is broken down into:
- Perception: the agent's input mechanism. Can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: agent's output and tool use.
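To make the perception, cognition, action split concrete, here is a minimal sketch in Python. It is not code from the paper: the `Agent` class, `toy_policy`, and `calculator` are hypothetical names, and a hand-written rule stands in for the LLM that would normally handle cognition.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Hypothetical stand-in for an LLM call: maps an observation to a decision.
    think: Callable[[str], str]
    # Hypothetical tool registry: tool name -> callable taking the observation.
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def perceive(self, raw: str) -> str:
        # Perception: normalise raw input into something cognition can use.
        return raw.strip().lower()

    def act(self, decision: str, observation: str) -> str:
        # Action: call a tool if the decision names one, else reply directly.
        tool = self.tools.get(decision)
        return tool(observation) if tool else decision

    def step(self, raw: str) -> str:
        observation = self.perceive(raw)
        decision = self.think(observation)
        return self.act(decision, observation)


def toy_policy(observation: str) -> str:
    # Toy rule in place of a model: route anything with digits to the calculator.
    return "calculator" if any(ch.isdigit() for ch in observation) else "no tool needed"


def calculator(observation: str) -> str:
    # Sums every integer token in the observation.
    return str(sum(int(tok) for tok in observation.split() if tok.isdigit()))


agent = Agent(think=toy_policy, tools={"calculator": calculator})
```

Calling `agent.step("  Add 2 and 40 ")` runs all three stages in order; swapping `toy_policy` for a real model call would leave the rest of the loop unchanged.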
Agentic memory is represented as:
- Sensory memory: short-term holding of inputs, which is not emphasized much in agents.
- Short-term memory: the LLM context window.
- Long-term memory: external storage such as RAG or knowledge graphs.
The memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- retrieving the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory
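A rough sketch of how these memory pieces might fit together, in Python. Everything here is an illustrative assumption rather than the paper's design: the `AgentMemory` class is hypothetical, word overlap stands in for a real relevance score (embeddings would be the usual choice), and forgetting is simply "evict the oldest entry".

```python
from collections import deque


class AgentMemory:
    def __init__(self, context_size: int = 3, long_term_capacity: int = 100):
        # Short-term memory: bounded like a context window; oldest items drop out.
        self.short_term = deque(maxlen=context_size)
        # Long-term memory: a plain list standing in for an external store.
        self.long_term: list[str] = []
        self.long_term_capacity = long_term_capacity

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append(text)
        # Forgetting policy (assumed): evict the oldest entry when over capacity.
        if len(self.long_term) > self.long_term_capacity:
            self.long_term.pop(0)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Relevance = number of shared words between query and stored item.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(item.lower().split())), item)
            for item in self.long_term
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [item for score, item in ranked[:k] if score > 0]

    def build_context(self, query: str, k: int = 1) -> list[str]:
        # Combine retrieved long-term items with the current short-term window.
        retrieved = [
            item for item in self.retrieve(query, k) if item not in self.short_term
        ]
        return retrieved + list(self.short_term)
```

The four research directions map onto four knobs here: `long_term_capacity` (how much is stored), `retrieve` (what is relevant), `build_context` (mixing context-window and external memory), and the eviction rule in `remember` (what to forget).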
The agent must simulate or predict the future states of the environment for planning and decision-making.
AI world models are much simpler than humans', which draw on causal reasoning (cause-and-effect) and physical intuition.
LLM world models are mostly implicit and embedded.
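As a toy illustration of planning by simulating future states, the Python sketch below uses an explicit, hand-written world model. This is the opposite of the implicit models inside LLMs and is only meant to show the idea; the one-dimensional world, `predict`, and `plan` are all assumptions made up for this example.

```python
from itertools import product

ACTIONS = {"left": -1, "stay": 0, "right": 1}


def predict(state: int, action: str) -> int:
    # Toy world model: positions 0..9 on a line, movement is clamped at the edges.
    return max(0, min(9, state + ACTIONS[action]))


def plan(state: int, goal: int, horizon: int = 3) -> tuple[str, ...]:
    # Simulate every action sequence up to the horizon and keep the one
    # whose predicted final state is closest to the goal.
    best_sequence, best_distance = (), abs(goal - state)
    for sequence in product(ACTIONS, repeat=horizon):
        simulated = state
        for action in sequence:
            simulated = predict(simulated, action)
        distance = abs(goal - simulated)
        if distance < best_distance:
            best_sequence, best_distance = sequence, distance
    return best_sequence
```

The agent never touches the real environment while planning: it only queries `predict`. An empty plan is returned when no simulated sequence gets closer to the goal than the current state.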
Emotions are a deep aspect of humans, helping them with social interactions, decision-making, and learning.
Agents must understand emotions to better interact with us.
But rather than encoding the feeling of emotions, they only model emotions at a surface level.
Perception is the process by which an agent receives and interprets raw data from its surroundings.
READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)