Today, let's talk about one of the hottest concepts in technology right now: AI Agents!

AI Agents: Beyond Large Models and Their Strength

The former world’s richest person wrote on his personal blog:

AI Agent (AI Intelligent Agent/Assistant) “will completely change the way we use computers and disrupt the software industry.”

He also predicted that “Android, iOS, and Windows are platforms; AI Agents will become the next platform.”

A leading figure in the internet industry emphasized at the 2024 World Artificial Intelligence Conference: “AI Agents played a significant role in filling out college entrance examination applications, attracting 2 million users on peak days.”

So what exactly is an AI Agent? What does it have to do with me? Let’s fill in the information gap and clarify what AI Agents are all about.

At the end of the article, there will be a question about AI Agents; can you answer it?

What Is an AI Agent?

Academia and industry have proposed various definitions for the term “AI Agent.” Among them, OpenAI defines an AI Agent as “a system driven by a large language model as its brain, capable of understanding, perceiving, planning, remembering, and using tools autonomously, and able to automate the execution of complex tasks.”

Put simply, in most cases you give it the final goal you want to achieve, and it delivers the result without you having to manage the process.

What is the relationship between AI Agents and LLMs?

Simply put, large language models (LLMs) are the premise and foundation for AI Agents.

A helpful analogy is an organism and its brain: the AI Agent is the whole organism, with hands and feet that let it work and carry out tasks on its own, while the LLM is its brain.

For example, imagine you have an AI chef in your kitchen—an AI Agent.

If you use only a large model, it may do no more than output a recipe, telling you which ingredients and steps are needed to make a dish.
An AI Agent, by contrast, can not only provide the recipe but also help you choose the most suitable ingredients based on your taste preferences and nutritional needs. It can even place the orders automatically, monitor the cooking process, and check the quality and taste of the food, ultimately serving you a dish that looks good and tastes delicious.

Today's LLMs still have well-known limitations: they can hallucinate, their output is not always true or reliable, and their knowledge of recent events is limited. These weaknesses can leave them inadequate for complex tasks.

AI Agents can partly compensate for these shortcomings by building verification and decision-making steps into the workflow, which improves the accuracy and efficiency of their actions.

As a result, the overall system is more reliable and efficient on complex tasks, like an experienced chef who not only knows how to make delicious food but also adjusts flexibly to the situation at hand so that the final result is satisfactory.
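The verification idea can be illustrated with a small Python sketch. Everything here is hypothetical: flaky_model stands in for an LLM that answers wrongly on its first attempt, and calculator_check stands in for a tool the Agent uses to verify a draft independently. A real Agent would call an actual model and real tools instead.

```python
from typing import Callable, Optional


def flaky_model(question: str, attempt: int) -> str:
    """Hypothetical stand-in for an LLM: wrong on the first attempt, right afterwards."""
    return "56" if attempt == 0 else "42"


def calculator_check(answer: str) -> bool:
    """Tool-based verification: recompute 6 * 7 independently of the model."""
    return answer.strip() == str(6 * 7)


def answer_with_verification(
    question: str,
    model: Callable[[str, int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the model, verify each draft, and retry until one draft passes."""
    for attempt in range(max_attempts):
        draft = model(question, attempt)
        if verify(draft):
            return draft
    return None  # give up rather than return an unverified answer


result = answer_with_verification("What is 6 * 7?", flaky_model, calculator_check)
print(result)
```

The key design choice is that the checker does not trust the model: it recomputes the answer by other means, and the Agent returns nothing at all if no draft passes.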

How Do AI Agents Work?

The architecture of an AI Agent is the foundation of its intelligent behavior. It typically includes five key components that work together: perception, planning, memory, tool usage, and action.

Architecture Component	Function Description
Perception System	The perception system is the first step in an AI Agent's interaction with the external world. It captures environmental information through diverse input methods, such as text analysis, image recognition, and sound processing.
Planning System	The planning system is the decision-making center of an AI Agent. Based on the perceived information, it determines how to achieve the established goal: it makes decisions, breaks complex tasks down into executable subtasks, and devises a strategy for each.
Memory System	The memory system is a core component of an AI Agent. It lets the Agent store and retrieve information, supporting learning and long-term knowledge accumulation, so that past experiences can inform future decisions and actions. Sensory memory is the initial stage of the memory system, responsible for temporarily holding incoming raw input, typically for only a very short time. For example, when a user interacts with an AI Agent by voice, sensory memory temporarily holds the sound signal. Short-term memory, also known as working memory, stores the information needed for the current task and is usually not retained after the task is completed. For example, while processing a user request, short-term memory holds the user's instructions and related information until the task is done. Long-term memory stores information that needs to be retained for a long time, such as user preferences and historical interactions. In AI systems, long-term memory is typically stored in external databases and accessed by the Agent through fast retrieval mechanisms. For example, if a user frequently asks for weather forecasts in the evening, the AI Agent will learn this preference and automatically provide weather information at the appropriate time.
Tool Usage	Tool usage is the process by which an AI Agent draws on external resources or tools to enhance its perception, decision-making, and action capabilities, extending what it can do so that it completes tasks more effectively. For example, on an e-commerce platform, an AI Agent can use machine learning algorithms to analyze users' purchase history and browsing habits and recommend products intelligently, enhancing the shopping experience and improving conversion rates.
Action System	The action system is what actually executes tasks and interacts with the environment. Based on the results of planning, the Agent carries out specific actions.
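To make the table concrete, here is a minimal Python sketch of how the five components might fit together in a single class. It is an illustration under stated assumptions, not a reference implementation: the Agent class, its method names, and the two placeholder tools are all made up for this example, and plan uses simple keyword matching where a real Agent would ask an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    # Tool usage: external capabilities the Agent can call by name.
    tools: Dict[str, Callable[[str], str]]
    # Memory: short-term holds the current task, long-term persists results.
    short_term: List[str] = field(default_factory=list)
    long_term: Dict[str, str] = field(default_factory=dict)

    def perceive(self, raw_input: str) -> str:
        """Perception: normalize raw input and keep it in working memory."""
        observation = raw_input.strip().lower()
        self.short_term.append(observation)
        return observation

    def plan(self, observation: str) -> List[str]:
        """Planning: pick the tools to run (keyword match stands in for an LLM)."""
        return [name for name in self.tools if name in observation]

    def act(self, steps: List[str], observation: str) -> List[str]:
        """Action: run each planned tool, store the outcome, clear working memory."""
        results = [self.tools[step](observation) for step in steps]
        self.long_term[observation] = "; ".join(results)
        self.short_term.clear()
        return results

    def handle(self, raw_input: str) -> List[str]:
        observation = self.perceive(raw_input)
        return self.act(self.plan(observation), observation)


agent = Agent(tools={
    "weather": lambda request: "weather: sunny (placeholder data)",
    "time": lambda request: "time: 09:00 (placeholder data)",
})
results = agent.handle("  What is the WEATHER and the time?  ")
print(results)
```

Note how short-term memory is emptied once the task finishes while the long-term entry remains, mirroring the distinction drawn in the Memory System row.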

Let’s take a relatable example:

Suppose we have a smart home AI Agent named “Xiao Xing.” Its components work together in the following manner:

[Figure: how Xiao Xing's components work together]

After Xiao Xing executes the above actions, it perceives user feedback. If the user adjusts the light brightness via voice command, Xiao Xing will record this preference and automatically apply this setting in the future.
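The preference learning described here can be sketched in a few lines of Python. The PreferenceMemory class, the "evening" scene label, and the brightness values are hypothetical and chosen only for illustration. The sketch keeps everything in memory, whereas a real assistant would persist it in an external store.

```python
from collections import Counter, defaultdict
from typing import Dict


class PreferenceMemory:
    """Toy long-term memory: remembers which brightness the user picks per scene."""

    def __init__(self, default_brightness: int = 80) -> None:
        self.default_brightness = default_brightness
        self._history: Dict[str, Counter] = defaultdict(Counter)

    def record_adjustment(self, scene: str, brightness: int) -> None:
        """Store one piece of user feedback, e.g. from a voice command."""
        self._history[scene][brightness] += 1

    def brightness_for(self, scene: str) -> int:
        """Return the most frequently chosen brightness, or the default if unknown."""
        counts = self._history.get(scene)
        if not counts:
            return self.default_brightness
        return counts.most_common(1)[0][0]


memory = PreferenceMemory()
before = memory.brightness_for("evening")   # no feedback yet, so the default applies
memory.record_adjustment("evening", 40)
memory.record_adjustment("evening", 40)
memory.record_adjustment("evening", 60)
after = memory.brightness_for("evening")    # the user's most frequent choice
print(before, after)
```

Using the most frequent value rather than the latest one is a deliberate simplification: a single one-off adjustment does not overwrite an established habit.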

Combining the above content, let’s summarize:

The workflow of AI Agents is essentially a continuous loop process.

It starts with perceiving the environment, followed by information processing, planning, and decision-making, then executing actions. Finally, it adjusts based on execution results and environmental feedback to optimize future actions and decisions.
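This loop can be shown with a toy example in Python. The scenario (an Agent steering room temperature toward a target) and all function names are assumptions made for this sketch. The planning step is a fixed rule here, where a real Agent would consult an LLM.

```python
from typing import Dict, List, Tuple


def perceive(room: Dict[str, float]) -> float:
    """Perception: read the current state of the environment."""
    return room["temperature"]


def plan(temperature: float, target: float, tolerance: float = 0.5) -> str:
    """Planning: decide what to do based on the gap to the goal."""
    if temperature < target - tolerance:
        return "heat"
    if temperature > target + tolerance:
        return "cool"
    return "idle"


def act(room: Dict[str, float], action: str) -> None:
    """Action: change the environment by one degree per step."""
    if action == "heat":
        room["temperature"] += 1.0
    elif action == "cool":
        room["temperature"] -= 1.0


def run_agent(room: Dict[str, float], target: float, max_steps: int = 20) -> List[Tuple[float, str]]:
    """Repeat perceive -> plan -> act; each action changes what is perceived next."""
    log: List[Tuple[float, str]] = []
    for _ in range(max_steps):
        temperature = perceive(room)
        action = plan(temperature, target)
        log.append((temperature, action))
        if action == "idle":  # goal reached, stop looping
            break
        act(room, action)
    return log


room = {"temperature": 18.0}
history = run_agent(room, target=22.0)
for temperature, action in history:
    print(temperature, action)
```

The feedback comes from the environment itself: every action alters the room, and the next pass through the loop perceives that change before deciding again.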

Through this structured and hierarchical approach, AI Agents can effectively process information, make decisions, and execute tasks in complex environments.

This architecture not only improves the intelligence level of AI Agents but also enhances their adaptability and flexibility.

What Are the Practical Applications of AI Agents?

Next, let's look at two notable exploratory projects: ChatDev and Stanford's AI Western Town.

Example 1: ChatDev

Image from the paper “ChatDev: Communicative Agents for Software Development”

ChatDev is an innovative project jointly developed by Tsinghua University, Beijing University of Posts and Telecommunications, and Brown University. It simulates a software development company staffed entirely by AI Agents, automating the full software development process with large models.

On this platform, the AI employees start from the user's requirements, entered through a dialogue window. Led by the CEO Agent, they break the work down into tasks and assign them to AI Agent roles such as CTO, CPO, Designer, Programmer, Tester, and Reviewer.

The Agents then collaborate interactively to produce a complete software solution, including but not limited to source code, environment configuration guides, and user manuals. According to the paper, the whole process takes just a few minutes and costs less than one dollar.

Challenges remain, such as randomness in the output, weak logical coherence, and potential security risks, but ChatDev points to a promising direction for AI in software development.

In the future, the production chain for software products may be greatly shortened, leaving humans to supervise and make decisions. That is exciting to think about!

Example 2: Stanford’s AI Western Town

Image from the paper “Generative Agents: Interactive Simulacra of Human Behavior”

The virtual Western Town, also known as Smallville, is a research project developed by researchers at Stanford University. In this interactive sandbox environment, 25 AI Agent residents display remarkable social abilities through human-like behavior patterns.

Their daily activities include strolling in the park, spending the afternoon in cafes, and sharing news with neighbors. More remarkably, they not only remember their daily experiences but can also initiate social activities, such as planning a Valentine's Day party, sending out invitations, and coordinating schedules with one another.

Little Quiz

What is the best analogy for the relationship between AI Agents and Large Language Models (LLMs)?

A. Car and Engine

B. Computer and Operating System

C. Organism and Its Brain

D. Phone and SIM Card

Source: ZTE Document

Editor: Apo

Reproduced content only represents the author’s views

It does not represent the position of the Institute of Physics, Chinese Academy of Sciences

For reprints, please contact the original public account