The Rise and Potential of Large Language Model Agents

AI agents have recently become a hot topic. Lilian Weng of OpenAI published a lengthy blog post, “LLM Powered Autonomous Agents,” which sparked wide discussion in the industry. In it, she lays out a clear framework for building AI agents on top of LLMs: Agent = LLM (Large Language Model) + Memory + Planning Skills + Tool Use. The LLM serves as the agent's brain, and the other components are the essential elements that support it. More recently, the Natural Language Processing group at Fudan University published a survey, “The Rise and Potential of Large Language Model Based Agents.” It proposes a conceptual framework for LLM-based agents built from three main components: Brain, Perception, and Action. It is well worth reading!
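Weng's decomposition can be made concrete with a minimal Python sketch. This assumes a toy setup rather than any real agent framework. The `Agent` class, its `plan` and `run` methods, the `"tool_name: argument"` step convention, and the `fake_llm` stand-in are all hypothetical. A real system would call an actual LLM and use richer memory and planning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM: any function mapping a prompt string to a reply.
LLM = Callable[[str], str]

@dataclass
class Agent:
    llm: LLM                                        # the "brain"
    tools: Dict[str, Callable[[str], str]]          # tool use: name -> function
    memory: List[str] = field(default_factory=list) # simple append-only memory

    def plan(self, goal: str) -> List[str]:
        # Planning: ask the LLM to break the goal into steps, one per line.
        reply = self.llm(f"Break this goal into steps, one per line: {goal}")
        return [s.strip() for s in reply.splitlines() if s.strip()]

    def run(self, goal: str) -> List[str]:
        results = []
        for step in self.plan(goal):
            # Tool use: a step written as "tool_name: argument" is routed to that tool.
            name, sep, arg = step.partition(":")
            if sep and name.strip() in self.tools:
                out = self.tools[name.strip()](arg.strip())
            else:
                out = self.llm(step)                 # otherwise the LLM handles the step
            self.memory.append(f"{step} -> {out}")   # memory: record what happened
            results.append(out)
        return results

# Hypothetical fake LLM so the example runs without a model.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Break this goal"):
        return "calc: 2+3\nsummarize"
    return "done"

agent = Agent(llm=fake_llm,
              tools={"calc": lambda expr: str(sum(int(x) for x in expr.split("+")))})
print(agent.run("add numbers"))  # ['5', 'done']
```

Each of the four ingredients maps onto one piece of the sketch: the `llm` callable, the `tools` table, the `memory` list, and the `plan` method. Swapping in a real model only changes `llm`.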

https://www.zhuanzhi.ai/paper/cd77bba118be3a41aa389b99437d8988

For a long time, humanity has pursued artificial intelligence (AI) that matches or surpasses human-level performance, and AI agents are regarded as a promising vehicle for achieving this goal. AI agents are artificial entities that can perceive their environment, make decisions, and take actions. Since the mid-20th century, many efforts have been made to develop intelligent AI agents. However, these efforts have primarily focused on advances in algorithms or training strategies that improve specific capabilities or performance on specific tasks. What the field actually lacks is a sufficiently general and powerful model that can serve as a starting point for designing AI agents that adapt to diverse scenarios. Owing to their demonstrated versatility and impressive capabilities, Large Language Models (LLMs) are regarded as potential sparks of Artificial General Intelligence (AGI), offering hope for building general AI agents. Many research efforts have already built AI agents on top of LLMs and achieved significant progress. We first trace the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable as the foundation for AI agents. Building on this, we propose a conceptual framework for LLM-based agents, consisting of three main components: Brain, Perception, and Action, which can be customized for different applications. Subsequently, we explore the wide-ranging applications of LLM-based agents in three areas: single-agent scenarios, multi-agent scenarios, and human-agent collaboration. Next, we delve into the social aspects of agents, discussing the behaviors and personalities of LLM-based agents, the social phenomena that arise when they form societies, and the insights they offer for human society. Finally, we discuss a series of key themes and open problems in this field.

Artificial Intelligence (AI) is a field dedicated to designing and developing systems that can replicate human intelligence and capabilities [1]. As early as the 18th century, the philosopher Denis Diderot proposed that if a parrot could respond to every question, it could be considered intelligent [2]. Although Diderot was referring to a living creature, his idea highlights the profound notion that a highly intelligent organism could resemble human intelligence. In the 1950s, Alan Turing extended this notion to artificial entities and proposed the famous Turing Test [3]. This test is a cornerstone of AI, aimed at exploring whether machines can exhibit intelligent behavior comparable to that of humans. These AI entities are commonly referred to as “agents,” which are the foundational components of AI systems. In AI, agents typically refer to artificial entities that can perceive their environment using sensors, make decisions, and then respond using actuators [1; 4].

The concept of agents originated in philosophy, tracing back to thinkers like Aristotle and Hume [5]. It describes entities that possess desires, beliefs, intentions, and the ability to take action [5]. This idea later moved into computer science, where the aim was to enable computers to understand users' interests and autonomously act on their behalf [6; 7; 8]. As AI developed, the term “agent” found its place in AI research, where it describes entities that exhibit intelligent behavior and possess traits such as autonomy, reactivity, proactivity, and social ability [4; 9]. Since then, exploring and advancing agent technology has become a focal point for the AI community [1; 10]. Today, AI agents are considered a crucial step toward achieving Artificial General Intelligence (AGI), as they encompass the potential for a wide range of intelligent activities [4; 11; 12].

Since the mid-20th century, significant progress has been made in developing intelligent AI agents as research has delved deeper into their design and enhancement [13; 14; 15; 16; 17; 18]. However, these efforts have primarily focused on improving specific capabilities, such as symbolic reasoning, or on mastering specific tasks, such as Go or chess [19; 20; 21]. Broad adaptability across different scenarios remains elusive. Furthermore, previous research has emphasized the design of algorithms and training strategies while neglecting the development of the model's inherent general capabilities, such as knowledge memorization, long-term planning, effective generalization, and efficient interaction [22; 23]. In fact, strengthening a model's inherent capabilities is key to the further development of agents, and the field needs a robust foundation model with all of these key attributes as the starting point for agent systems.

The development of Large Language Models (LLMs) brings a glimmer of hope for the further advancement of agents [24; 25; 26], and the community has already made significant progress [22; 27; 28; 29]. The concept of “World Scope (WS)” [30] describes five levels on the path from Natural Language Processing (NLP) to general AI: corpus, internet, perception, embodiment, and social. Under this scheme, pure LLMs sit at the second level, with internet-scale text input and output. Even so, LLMs have demonstrated strong capabilities in knowledge acquisition, instruction comprehension, generalization, planning, and reasoning, and they interact effectively with humans in natural language. These strengths have led LLMs to be regarded as sparks of Artificial General Intelligence (AGI) [31], and they make LLMs well suited to building agents that foster a harmonious coexistence between humans and agents [22]. Building on this, if we elevate LLMs to the status of agents and equip them with a broader perception and action space, they could reach the third and fourth levels of WS. Furthermore, LLM-based agents can tackle more complex tasks through collaboration or competition, and new social phenomena can be observed when they are placed together, potentially reaching the fifth WS level. As shown in Figure 1, we envision a harmonious society composed of AI agents in which humans can also participate.

In this article, we provide a comprehensive and systematic survey focusing on LLM-based agents, attempting to explore the existing research and prospective pathways in this emerging field. To this end, we first delve into key background information (Section 2). Specifically, we trace the origins of AI agents from philosophy to the field of AI and briefly outline the debates surrounding the existence of artificial agents (Section 2.1). Next, we provide a concise historical overview of the development of AI agents from the perspective of technological trends (Section 2.2). Finally, we introduce the basic characteristics of agents and clarify why Large Language Models are particularly suitable as the brain or controller of AI agents (Section 2.3).

Inspired by the definition of “agents,” we propose a general conceptual framework for LLM-based agents consisting of three key parts: brain, perception, and action (Section 3), which can be customized to fit different application scenarios. We first introduce the brain, which is primarily composed of a large language model (Section 3.1). As in humans, the brain is the core of the AI agent. It not only stores key memories, information, and knowledge but also performs essential tasks such as information processing, decision-making, reasoning, and planning, so it largely determines whether an agent can exhibit intelligent behavior. Next, we introduce the perception module (Section 3.2), which plays a role for the agent similar to that of human sensory organs. Its primary function is to expand the agent's perception space from text alone to a multimodal space that includes text, sound, vision, touch, smell, and other senses. This expansion allows the agent to better acquire information from the external environment. Finally, we introduce the action module (Section 3.3), which is designed to expand the agent's action space. Specifically, we expect agents to produce text output, take embodied actions, and use tools, so that they can better respond to changes in the environment, provide feedback, and even alter and shape the environment.
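The perception and action modules can be sketched as two small functions. This is a minimal illustration under assumed interfaces, not the survey's implementation. The observation classes, the `perceive` and `act` functions, and the `SAY`/`MOVE`/`TOOL` action format are hypothetical. The image caption and audio transcript are assumed to come from upstream vision and speech models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical observation types for three of the modalities mentioned above.
@dataclass
class TextObs:
    text: str

@dataclass
class ImageObs:
    caption: str      # assumed to be produced by an upstream vision model

@dataclass
class AudioObs:
    transcript: str   # assumed to be produced by an upstream speech recognizer

Observation = Union[TextObs, ImageObs, AudioObs]

def perceive(obs: Observation) -> str:
    """Perception: map each modality into text that the LLM brain can consume."""
    if isinstance(obs, TextObs):
        return obs.text
    if isinstance(obs, ImageObs):
        return f"[image] {obs.caption}"
    if isinstance(obs, AudioObs):
        return f"[audio] {obs.transcript}"
    raise TypeError(f"unsupported observation: {obs!r}")

def act(decision: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Action: route the brain's decision to text output, an embodied action, or a tool."""
    kind, _, payload = decision.partition(" ")
    if kind == "SAY":
        return payload                       # plain text output
    if kind == "MOVE":
        return f"robot moves {payload}"      # placeholder for an embodied action
    if kind == "TOOL":
        name, _, arg = payload.partition(" ")
        return tools[name](arg)              # tool use
    raise ValueError(f"unknown action: {decision!r}")

print(perceive(ImageObs("a red cup on the table")))   # [image] a red cup on the table
print(act("TOOL upper hello", {"upper": str.upper}))  # HELLO
```

Turning every modality into text keeps the brain's interface uniform. This is one simple design choice among several; multimodal models can instead consume images or audio directly.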

We then provide a detailed and comprehensive introduction to the practical applications of LLM-based agents and clarify their foundational design pursuit, “Harnessing AI for the Public Good” (Section 4). First, we examine the current applications of single agents and discuss their performance in text-based tasks and simulated exploration environments, highlighting their capabilities in handling specific tasks, driving innovation, and exhibiting human-like survival skills and adaptability (Section 4.1). Next, we review the historical development of multi-agent systems and introduce the interactions between agents in LLM-based multi-agent systems, where they engage in cooperation, negotiation, or competition. Regardless of the interaction mode, agents collectively strive toward a shared goal (Section 4.2). Finally, considering the potential limitations of LLM-based agents with respect to privacy and security, ethical constraints, and data scarcity, we discuss human-agent collaboration. We summarize two paradigms of cooperation between agents and humans, the guide-executor model and the equal collaboration model, along with specific practical applications (Section 4.3).

Building on this exploration of LLM-based agents in practical applications, we turn to the concept of an “agent society,” studying the complex interactions between agents and their surrounding environment (Section 5). This section first investigates whether these agents exhibit human-like behaviors and possess corresponding personalities (Section 5.1). We then introduce the social environments in which agents operate, including text-based environments, virtual sandboxes, and the physical world (Section 5.2). Unlike Section 3.2, which covers how agents perceive their environment, here we focus on the types of environments themselves. With the foundations of agents and their environments in place, we examine the simulated societies they form (Section 5.3). We discuss how simulated societies are constructed and explore the social phenomena that arise within them. In particular, we highlight the lessons and potential risks inherent in simulated societies.

Finally, we discuss a series of key themes and unresolved issues in the field of LLM-based agents (Section 6): (1) The mutual benefits and inspirations of LLM research and agent research, where we demonstrate that the development of LLM-based agents provides many opportunities for both the agent and LLM communities (Section 6.1); (2) Existing evaluation work and some prospects for LLM-based agents from four dimensions, including practicality, sociality, values, and the ability for continuous evolution (Section 6.2); (3) Potential risks of LLM-based agents; we discuss the adversarial robustness and credibility of LLM-based agents. We also include discussions on other risks such as misuse, unemployment, and threats to human well-being (Section 6.3); (4) Scaling the number of agents; we discuss the potential advantages and challenges of scaling the number of agents, as well as static and dynamic scaling methods (Section 6.4); (5) Several unresolved issues, such as the debate on whether LLM-based agents represent a potential pathway to AGI (Artificial General Intelligence), challenges from virtual simulation environments to physical environments, collective intelligence in AI agents, and agents as a service (Section 6.5). In summary, we hope this article inspires researchers and practitioners in related fields.

The Birth of Agents: Building Agents Based on Large Language Models (LLMs)

“Survival of the fittest” [131] suggests that if an individual wants to survive in the external environment, it must adapt effectively to its surroundings. This requires the cognitive ability to perceive and respond to changes in the external world, which aligns with the definition of “agents” given in Section 2.1. Inspired by this, we propose a general conceptual framework for LLM-based agents, consisting of three key parts: brain, perception, and action (see Figure 2). We first describe the structure and working mechanism of the brain, which is primarily composed of a large language model (Section 3.1). The brain is the core of the AI agent. It not only stores knowledge and memories but also performs indispensable functions such as information processing and decision-making. It can carry out and expose reasoning and planning processes and can effectively handle unseen tasks, showcasing the intelligence of the agent. Next, we introduce the perception module (Section 3.2). Its core purpose is to expand the agent's perception space from text alone to a multimodal range that includes text, auditory, and visual modalities. This expansion enables the agent to grasp and use information from its surroundings more effectively. Finally, we introduce the action module (Section 3.3), which is designed to extend the agent's action space. Specifically, we equip agents with embodied action capabilities and tool-use skills, enabling them to adapt to environmental changes, provide feedback, and even influence and shape the environment.

This framework can be customized for different application scenarios, so not every component is used in every study. In general, agents operate according to the following workflow. First, the perception module, akin to human sensory systems such as the eyes and ears, perceives changes in the external environment and converts the multimodal information into a representation the agent can understand. Next, the brain module, serving as the control center, processes this information by thinking, making decisions, and operating on its stored knowledge and memories. Finally, the action module, corresponding to human limbs, executes actions, with the help of tools where needed, and thereby affects the surrounding environment. By repeating this cycle, agents continuously receive feedback and interact with their environment.
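The perceive, decide, act cycle can be sketched in a few lines of Python. This is a hypothetical toy. `CounterEnv`, `perception`, `rule_brain`, and `run_agent` are invented for illustration. `rule_brain` is a hard-coded rule standing in for the LLM brain; in a real agent, the model would be prompted with the observation and memory.

```python
from typing import Callable, List

class CounterEnv:
    """Toy environment (hypothetical): a counter the agent must drive to a target value."""
    def __init__(self, start: int, target: int):
        self.value, self.target = start, target

    def observe(self) -> dict:
        return {"value": self.value, "target": self.target}

    def step(self, action: str) -> None:
        # "inc" adds 1, "dec" subtracts 1, anything else leaves the counter unchanged.
        self.value += 1 if action == "inc" else -1 if action == "dec" else 0

def perception(raw: dict) -> str:
    # Perception: turn raw environment state into a textual representation.
    return f"value={raw['value']} target={raw['target']}"

def rule_brain(observation: str, memory: List[str]) -> str:
    # Brain: a rule-based stand-in for the LLM. It ignores memory, but an
    # LLM brain could include the memory in its prompt.
    fields = dict(pair.split("=") for pair in observation.split())
    value, target = int(fields["value"]), int(fields["target"])
    return "inc" if value < target else "dec" if value > target else "stop"

def run_agent(env: CounterEnv, brain: Callable[[str, List[str]], str],
              max_steps: int = 20) -> List[str]:
    memory: List[str] = []
    for _ in range(max_steps):
        obs = perception(env.observe())      # 1. perceive
        action = brain(obs, memory)          # 2. decide
        memory.append(f"{obs} -> {action}")  # store the step as feedback for later
        if action == "stop":
            break
        env.step(action)                     # 3. act, changing the environment
    return memory

env = CounterEnv(start=2, target=4)
for line in run_agent(env, rule_brain):
    print(line)
print(env.value)  # 4
```

The `max_steps` cap is a common safeguard so that a brain that never emits `stop` cannot loop forever.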
