Key Directions for AI Agents in the Era of Large Models

This article is approximately 3,200 words long and takes about 8 minutes to read. It outlines the key points of large language model (LLM) based Agents and discusses important directions for AI Agents in the era of large models.
As large language models mature, various AI Agents built on them are gradually coming into public view.
Today’s introduction will focus on the following five points:
1. Overall architecture of LLM-based Agents
2. Key & challenging issues of LLM-based Agents
3. User behavior simulation agents based on large language models
4. Multi-agent software development based on large language models
5. Future directions of LLM-based Agents
01 Overall Architecture of LLM-based Agents
A large language model Agent mainly consists of the following four modules:
1. Profile Module: Describes the background information of the Agent
Below is an introduction to the main content and generation strategies of the profile module.
(1) Profile content is mainly based on three types of information: demographic information, personality information, and social information.
(2) Generation strategies: three strategies are mainly used to generate profile content (a minimal sketch follows this list):
  • Manual design method: write the user profile directly into the large model's prompt in a specified format; suitable for a small number of Agents;
  • Large model generation method: first specify a small number of profiles as examples, then let the large language model generate more; suitable for a large number of Agents;
  • Data alignment method: use the background information of individuals in a pre-specified dataset as prompts, so that the large language model makes the corresponding predictions.
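To make these strategies concrete, here is a minimal Python sketch; `call_llm`, the prompt wording, and the sample profiles are illustrative assumptions, not the implementation of any particular system.

```python
# Minimal sketch of the three profile-generation strategies.
# `call_llm` is a hypothetical helper standing in for any chat-completion API.

def call_llm(prompt: str) -> str:
    return "<LLM output>"  # replace with a real chat-completion API call

# (1) Manual design: the profile is written by hand into the prompt.
manual_profile = "You are Alice, a 32-year-old teacher who loves sci-fi movies."

# (2) Large model generation: seed with a few hand-written examples,
# then ask the LLM to produce more profiles in the same format.
seed_examples = [manual_profile,
                 "You are Bob, a 45-year-old engineer who enjoys hiking."]
generated = call_llm(
    "Here are example agent profiles:\n"
    + "\n".join(seed_examples)
    + "\nGenerate 10 more profiles in the same format."
)

# (3) Data alignment: fill the profile from records in a real dataset.
record = {"name": "Carol", "age": 28, "occupation": "nurse"}  # one dataset row
aligned_profile = (
    f"You are {record['name']}, a {record['age']}-year-old {record['occupation']}."
)
```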
2. Memory Module: Primarily records Agent behavior and supports future Agent decision-making
(1) Memory Structure
  • Unified memory: Only considers short-term memory, ignoring long-term memory;
  • Hybrid memory: Combines long-term and short-term memory
(2) Memory Forms: Primarily based on the following four forms
  • Language
  • Database
  • Vector representation
  • List
(3) Memory Operations: Commonly include the following three (a minimal sketch follows this list)
  • Memory reading
  • Memory writing
  • Memory reflection
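The following is a minimal sketch of a hybrid memory supporting these three operations; the keyword-match retrieval and string-join reflection are toy stand-ins (real agents typically use embedding retrieval and LLM summarization).

```python
# Minimal sketch of a hybrid memory with read / write / reflect operations.

class HybridMemory:
    def __init__(self):
        self.short_term: list[str] = []   # recent observations
        self.long_term: list[str] = []    # distilled, persistent memories

    def write(self, observation: str) -> None:
        """Memory writing: append a new observation to short-term memory."""
        self.short_term.append(observation)

    def read(self, query: str, k: int = 3) -> list[str]:
        """Memory reading: naive keyword match over both stores.
        (Real agents typically use embedding similarity instead.)"""
        pool = self.long_term + self.short_term
        hits = [m for m in pool if query.lower() in m.lower()]
        return hits[:k]

    def reflect(self) -> None:
        """Memory reflection: compress short-term content into long-term memory.
        Here we just join the strings; an LLM summary is the usual choice."""
        if self.short_term:
            summary = "Summary: " + "; ".join(self.short_term)
            self.long_term.append(summary)
            self.short_term.clear()
```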
3. Planning Module
  • Feedback-free planning: the large language model requires no feedback from the external environment during reasoning. This is further subdivided into three types: single-path reasoning, which invokes the large language model once to output the full sequence of reasoning steps; multi-path reasoning, which borrows the idea of crowdsourcing to generate multiple reasoning paths and select the best one (a minimal sketch follows); and reasoning with the help of an external planner.
  • Feedback-based planning: this method requires feedback from the external environment, and the large language model must base its next steps and subsequent planning on that feedback. The feedback can come from three sources: the environment, humans, and other models.
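Here is a minimal sketch of multi-path reasoning in the self-consistency style: sample several reasoning paths, then take a majority vote over the final answers. `call_llm` and the `ANSWER=` convention are illustrative assumptions.

```python
# Minimal sketch of multi-path reasoning (self-consistency style voting).

from collections import Counter

def call_llm(prompt: str, temperature: float = 0.8) -> str:
    return "<reasoning path ending in: ANSWER=42>"  # replace with a real API call

def extract_answer(path: str) -> str:
    # Assumes paths end with "ANSWER=<value>"; purely illustrative.
    return path.rsplit("ANSWER=", 1)[-1].rstrip(">")

def multi_path_reasoning(question: str, n_paths: int = 5) -> str:
    paths = [call_llm(f"Think step by step.\n{question}") for _ in range(n_paths)]
    answers = [extract_answer(p) for p in paths]
    # "Crowdsourcing" over the model's own samples: majority vote picks the answer.
    return Counter(answers).most_common(1)[0][0]
```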
4. Action Module
  • Action goals: Some Agents aim to complete a specific task, some aim to communicate, and some aim to explore.
  • Action generation: Some Agents generate actions based on memory recall, while others execute specific actions according to the original plan.
  • Action space: some action spaces are collections of external tools, while others are considered from the perspective of the large language model's own knowledge and self-awareness (a tool-dispatch sketch follows this list).
  • Action impact: Includes impact on the environment, impact on internal states, and impact on future new actions.
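As one concrete reading of a tool-collection action space, here is a minimal dispatch sketch; the two tools and their signatures are hypothetical examples.

```python
# Minimal sketch of a tool-collection action space with simple dispatch.

def search_web(query: str) -> str:
    return f"<search results for {query!r}>"   # stand-in for a real search API

def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))  # toy arithmetic only

ACTION_SPACE = {"search": search_web, "calc": calculator}

def act(tool_name: str, argument: str) -> str:
    """Execute one action chosen by the agent from its tool collection."""
    tool = ACTION_SPACE.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"   # feeds back into replanning
    return tool(argument)

print(act("calc", "2 + 3"))   # -> "5"
```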
The above is the overall framework of Agents; more content can be referenced in the following paper:
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen: A Survey on Large Language Model based Autonomous Agents. CoRR abs/2308.11432 (2023)
02 Key & Challenging Issues of LLM-based Agents
The current key and challenging issues of large language model Agents mainly include:
1. How to enhance the role-playing ability of Agents
The most important function of an Agent is to complete specific tasks or carry out various simulations by playing a certain role, so the Agent's role-playing ability is crucial.
(1) Definition of Agent role-playing ability
Agent role-playing ability is divided into two dimensions:
  • Relationship between role and Agent behavior
  • Mechanism of role evolution in the environment
(2) Evaluation of Agent role-playing ability
After defining role-playing ability, the next step is to evaluate the Agent’s role-playing ability from the following two aspects:
  • Role-playing evaluation metrics
  • Role-playing evaluation scenarios
(3) Enhancement of Agent role-playing ability
Based on the evaluation, further enhancement of the Agent’s role-playing ability can be achieved through the following two methods:
  • Enhancing role-playing ability through prompts: this method essentially elicits the existing capabilities of the large language model by designing suitable prompts;
  • Enhancing role-playing ability through fine-tuning: this method usually fine-tunes the large language model on external role data to enhance its role-playing ability (a sketch of both routes follows this list).
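A minimal sketch of the two routes follows; the persona, prompt wording, and JSONL-style fine-tuning format are illustrative assumptions rather than a fixed standard.

```python
# Minimal sketch of the two role-playing enhancement routes.

import json

role = {"name": "Li Bai", "persona": "a Tang-dynasty poet, romantic and bold"}

# (1) Prompt route: elicit role-playing from the base model via the prompt.
system_prompt = (
    f"You are {role['name']}, {role['persona']}. "
    "Stay in character and answer in the first person."
)

# (2) Fine-tuning route: build (instruction, in-character reply) pairs
# from external role data, then train on them.
example = {
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do you feel about the moon?"},
        {"role": "assistant", "content": "The moon is my oldest drinking companion..."},
    ]
}
print(json.dumps(example, ensure_ascii=False)[:80])  # one JSONL training line
```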
2. How to design the Agent’s memory mechanism
The biggest difference between an Agent and a bare large language model is that the Agent can continuously evolve and learn in its environment, and the memory mechanism plays a very important role in this. The Agent's memory mechanism can be analyzed from three dimensions:
(1) Design of Agent memory mechanism
There are commonly two types of memory mechanisms (a retrieval sketch follows this list):
  • Memory mechanisms based on vector retrieval;
  • Memory mechanisms based on LLM summarization.
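Here is a minimal sketch of the vector-retrieval variant: embed each memory, score candidates by cosine similarity to the query, and return the top-k. The character-frequency `embed` is a toy stand-in for a real embedding model.

```python
# Minimal sketch of vector-retrieval memory.

import math

def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector (replace with a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

mems = ["watched a sci-fi movie", "talked with Bob about hiking", "searched for movies"]
print(retrieve("recommend a movie", mems))
```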
(2) Evaluation of Agent memory capability
Evaluating the Agent’s memory capability mainly needs to determine the following two points:
  • Evaluation metrics
  • Evaluation scenarios
(3) Evolution of Agent memory mechanism
Finally, the evolution of the Agent’s memory mechanism needs to be analyzed, including:
  • Evolution of memory mechanisms
  • Autonomous updates of memory mechanisms
3. How to enhance the reasoning/planning capability of Agents
(1) Task decomposition capability of Agents
  • Definition and breakdown of sub-tasks
  • Optimal sequence for task execution
(2) Integration of Agent reasoning and external feedback
  • Designing mechanisms for integrating external feedback during reasoning, so that the Agent and the environment form an interactive whole;
  • Enhancing the Agent's responsiveness to external feedback: on one hand, the Agent needs to genuinely respond to the external environment; on the other, it needs to be able to ask questions about the environment and seek solutions (a minimal feedback-loop sketch follows this list).
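The following sketch shows reasoning interleaved with environment feedback, in the spirit of ReAct-style loops; `call_llm`, `environment_step`, and the trace format are assumptions.

```python
# Minimal sketch of reasoning interleaved with environment feedback.

def call_llm(prompt: str) -> str:
    return "ACTION: search('weather')"   # replace with a real API call

def environment_step(action: str) -> str:
    return f"OBSERVATION: result of {action}"   # stand-in environment

def plan_with_feedback(task: str, max_steps: int = 3) -> list[str]:
    trace = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = call_llm("\n".join(trace))     # next step conditioned on history
        observation = environment_step(action)  # environmental feedback
        trace += [action, observation]          # feedback drives the next step
    return trace
```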
4. How to design efficient collaboration mechanisms for multiple Agents
(1) Cooperation mechanisms for multiple Agents
  • Definition of different roles among Agents
  • Design of cooperation mechanisms among Agents
(2) Debate mechanisms for multiple Agents
  • Design of debate mechanisms among Agents
  • Determination of convergence conditions for Agent debates (a minimal debate loop is sketched below)
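Here is a minimal two-agent debate loop with a simple convergence condition: stop once both agents give the same answer or a round limit is reached. The reply function is a stand-in for an LLM call.

```python
# Minimal sketch of a two-agent debate with a convergence check.

from typing import Optional

def agent_reply(name: str, question: str, opponent_view: Optional[str]) -> str:
    # Stand-in for an LLM call that sees the opponent's last argument.
    return f"{name}'s answer given {opponent_view!r}"  # replace with a real call

def debate(question: str, max_rounds: int = 5) -> str:
    view_a, view_b = None, None
    for _ in range(max_rounds):
        view_a = agent_reply("A", question, view_b)
        view_b = agent_reply("B", question, view_a)
        if view_a == view_b:          # convergence condition: agreement
            break
    return view_a if view_a == view_b else f"no consensus: {view_a} vs {view_b}"
```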
03 User Behavior Simulation Agents Based on Large Language Models
Next, I will give a few practical examples of Agents. The first is a user behavior simulation Agent based on large language models; this is also an early work combining large language model Agents with user behavior analysis. In this work, each Agent is divided into three modules:
1. Profile Module
Different attributes are assigned to different Agents, such as ID, name, occupation, age, interests, and characteristics.
2. Memory Module
The memory module includes three sub-modules.
(1) Sensory memory
(2) Short-term memory
  • Processes the objectively observed raw observations to generate more informative observations, which are stored in short-term memory;
  • Short-term memory content has a relatively short storage time
(3) Long-term memory
  • Content from short-term memory that is repeatedly triggered and activated will automatically transfer to long-term memory
  • Long-term memory content has a relatively long storage time
  • Long-term memory content will autonomously reflect and refine itself based on existing memories (a sketch of the short-to-long-term transfer follows this list).
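Below is a minimal sketch of the transfer rule described above: a short-term memory that is repeatedly triggered is promoted to long-term storage. The activation threshold of 3 is an assumed value for illustration.

```python
# Minimal sketch of the short-term -> long-term promotion rule.

PROMOTE_AFTER = 3   # promote after this many activations (assumed value)

short_term: dict[str, int] = {}   # memory text -> activation count
long_term: list[str] = []

def observe(memory: str) -> None:
    """Write or re-activate a short-term memory; promote if triggered enough."""
    short_term[memory] = short_term.get(memory, 0) + 1
    if short_term[memory] >= PROMOTE_AFTER:
        long_term.append(memory)      # repeated activation -> long-term memory
        del short_term[memory]

for _ in range(3):
    observe("Alice enjoys sci-fi movies")
print(long_term)   # ['Alice enjoys sci-fi movies']
```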
3. Action Module
Each Agent can perform three types of actions:
  • Agent behavior in recommendation systems, including watching movies, searching for the next page, and leaving the recommendation system;
  • Dialogue behavior among Agents;
  • Agent behavior in posting on social media.
Throughout the simulation, in each round every Agent freely selects one of the three actions without external interference. We can observe different Agents engaging in dialogue with one another and autonomously generating various behaviors on social media and in the recommendation system; after multiple rounds of simulation, interesting social phenomena and patterns of user behavior on the internet can be observed. A minimal sketch of such a round-based loop follows.
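This sketch shows the round-based structure only; the random action choice and the `take_action` stub stand in for the Agents' actual LLM-driven decisions.

```python
# Minimal sketch of the round-based simulation loop.

import random

ACTIONS = ["recommendation", "dialogue", "social_post"]

def take_action(agent: str, action: str) -> str:
    return f"{agent} performs {action}"   # replace with real behavior logic

def simulate(agents: list[str], rounds: int = 2) -> None:
    for r in range(rounds):
        for agent in agents:
            action = random.choice(ACTIONS)   # free choice, no external control
            print(f"round {r}: {take_action(agent, action)}")

simulate(["Alice", "Bob"])
```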
More content can be referenced in the following paper:
Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen: When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm. CoRR abs/2306.02552 (2023)
04 Multi-Agent Software Development Based on Large Language Models
The next example uses multiple Agents for software development. This work is among the early efforts in multi-Agent collaboration; its goal is to have different Agents jointly develop a complete piece of software. The system can be viewed as a software company in which different Agents play different roles: some are responsible for design (including roles such as CEO, CTO, and CPO), some for coding, others mainly for testing, and still others for writing documentation. With different Agents responsible for different tasks, their cooperation is coordinated and updated through communication, ultimately completing an end-to-end software development process. A minimal sketch of such a role pipeline follows.
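Here is a minimal sketch of such a role pipeline, in the spirit of ChatDev-style chains where each role refines the artifact handed to it; the role list and message format are assumptions.

```python
# Minimal sketch of a role-based development pipeline.

def run_role(role: str, upstream: str) -> str:
    # Stand-in for an LLM call with a role-specific system prompt.
    return f"[{role}] output based on: {upstream}"

def develop(requirement: str) -> str:
    artifact = requirement
    for role in ["CEO", "CTO", "Programmer", "Tester", "Doc Writer"]:
        artifact = run_role(role, artifact)   # each role refines the artifact
    return artifact

print(develop("Build a to-do list app"))
```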
05 Future Directions of LLM-based Agents
Currently, Agents based on large language models can be divided into two major directions:
  • Solving specific tasks, such as MetaGPT, ChatDev, Ghost in the Minecraft (GITM), DEPS, etc.
    This type of Agent should ultimately be a kind of “superhuman”, with two qualifiers:
    Alignment with correct human values;
    Capabilities that surpass ordinary humans.
  • Simulating the real world, such as Generative Agent, Social Simulation, RecAgent, etc.
    The capabilities required for this type of Agent are the opposite of those of the first type:
    Allowing Agents to present diverse values;
    Hoping that Agents align more closely with ordinary people rather than surpassing them.
Additionally, there are currently two pain points for large language model Agents:
  • Hallucination problem
    Because Agents must constantly interact with the environment, the hallucination at each step accumulates, and this compounding effect exacerbates the problem, so the hallucination problem of large models deserves particular attention here (a back-of-the-envelope calculation appears at the end of this section). Solutions include:
    Designing efficient human-machine collaboration frameworks;
    Designing efficient human intervention mechanisms.
  • Efficiency issue
    During the simulation process, efficiency is a very important issue; the following table summarizes the time taken by different Agents under different numbers of API calls.
[Table not reproduced: running time of different Agents under different numbers of API calls]
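Returning to the hallucination pain point above, here is a back-of-the-envelope illustration of how per-step errors compound over a trajectory; the probability values are assumed for illustration, not measured.

```python
# If each step independently hallucinates with probability p, the chance that
# an n-step trajectory contains at least one hallucination is 1 - (1 - p)**n.
# p and n below are assumed values for illustration, not measurements.

p, n = 0.05, 20
prob_any_hallucination = 1 - (1 - p) ** n
print(f"{prob_any_hallucination:.2%}")   # ~64.15% over 20 steps at p = 5%
```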
Guest Speaker
Dr. Xu Chen is an Associate Professor at Renmin University of China. He received his Ph.D. from Tsinghua University and joined Renmin University in 2020. His research focuses on recommendation systems, reinforcement learning, causal inference, and related topics. He has published over 60 papers in renowned international conferences and journals such as TheWebConf, AIJ, TKDE, SIGIR, WSDM, and TOIS. He co-led the construction of the recommendation system toolkit “Bole”, the explainable recommendation dataset REASONER, and the large language model based user behavior simulation environment RecAgent. His research achievements have received various awards, including the Best Paper Nomination at TheWebConf 2018, the Runner-Up Award for Best Resource Paper at CIKM 2022, and the Best Paper Award at AIRS 2017, as well as the CCF Natural Science Second Prize (second place) and the ACM Beijing Rising Star Award (one of three in Beijing). His research outcomes have been implemented in several enterprises, and related achievements have won Huawei’s “Innovation Pioneer” President’s Award. He has led or participated in multiple National Natural Science Foundation projects and corporate collaboration projects.

Editor: Wang Jing

