What is an AI Agent? Understanding the Components of AI Agents

In the previous article (The Four Modes of User Interaction with Large Language Models (LLMs) and an Introduction to Multi-Agent Systems (MAS)), we covered how users interact with LLMs and briefly discussed multi-agent systems.

This article introduces the components of an AI Agent. An agent is a composite unit built from several component systems, and it relies on these components to carry out tasks and achieve its objectives. Each component can itself be a simple or complex system; they typically fall into the following five major categories.

Role Setting

The role setting is the Agent’s basic definition. Commonly referred to as the System Prompt, it guides how the Agent completes tasks and responds. A role design typically includes a role background and demographic attributes.

🎭 Role Background

  • Functional positioning (e.g., customer service representative, data analyst, programmer, writer)
  • Interaction style (formal/friendly, concise/detailed)
  • Knowledge boundaries (professional fields and authority scope)

🧬 Demographic Attributes

Adding demographic attributes (such as age, gender, cultural background) can enhance the adaptability of the Agent’s interactions and user trust.

  • Age/Gender: Influences language style
  • Cultural background: Determines taboo topics and etiquette (e.g., holiday greetings)
  • Professional background: Provides authority in the field (e.g., “10 years of experience in finance”)

An Agent’s role setting can be created manually, generated with LLM assistance, or derived from data or algorithms (e.g., evolutionary algorithms).
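As a concrete illustration of manual role setting, the sketch below assembles a System Prompt from the role-background and demographic attributes listed above. The field names and wording are illustrative, not a fixed schema.

```python
def build_system_prompt(role, style, knowledge_scope, background=None):
    """Compose a System Prompt from role background and demographic attributes."""
    lines = [
        f"You are a {role}.",
        f"Interaction style: {style}.",
        f"Knowledge boundaries: only answer questions within {knowledge_scope}.",
    ]
    if background:  # optional demographic/professional attributes
        lines.append(f"Background: {background}.")
    return "\n".join(lines)

prompt = build_system_prompt(
    role="customer service representative",
    style="friendly and concise",
    knowledge_scope="the billing domain",
    background="10 years of experience in finance",
)
print(prompt)
```

In practice this string would be passed as the system message of an LLM call; keeping the attributes as separate parameters makes it easy to vary the role without rewriting the whole prompt.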

Actions and Tool Usage

Agents need to perform a series of actions or use external tools to complete tasks or obtain information.

An agent’s actions fall into three categories: task completion, exploration, and communication. Executing an action affects, to varying degrees, both the environment and the agent’s internal state.

🎯 Action Target

The action target clarifies the purpose of the agent’s behavior. Understanding the action target helps us define clear objectives for task completion, exploration, or communication. The representation of action targets includes:

  • Semantic Function: Natural language instructions driven by LLM (e.g., “query the top five cities in a country”)
  • Native Function: Low-level operations implemented in code (e.g., file read/write, API calls)
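The two representations can be sketched side by side: a “semantic function” is a natural-language instruction that would be handed to the LLM, while a “native function” is ordinary low-level code. The LLM call itself is out of scope here, and all names are illustrative.

```python
def semantic_action(instruction):
    """Semantic function: a natural-language instruction driven by the LLM.
    Here we only package it; a real agent would send it to the model."""
    return {"type": "llm_instruction", "prompt": instruction}

def write_log(path, text):
    """Native function: a low-level file write implemented directly in code."""
    with open(path, "w") as f:
        f.write(text)
    return path

action = semantic_action("query the top five cities in France")
print(action["type"])
```

The distinction matters because semantic functions inherit the LLM’s flexibility (and its ambiguity), while native functions are deterministic but must be written in advance.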

🌌 Action Space

The action space defines the set of all possible actions that the agent can execute in the environment. The action space includes:

  • Tools: External APIs/physical devices (e.g., database queries);
  • Internal Knowledge: Built-in rule library of the Agent;
  • Other Agents: Multi-agent collaboration (e.g., task allocation)
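A minimal sketch of an action space as a dispatch registry covering the three categories above: external tools, the internal rule library, and delegation to other agents. All entries are illustrative stand-ins.

```python
def query_database(sql):                        # tool: external API / database
    return f"rows for: {sql}"

RULES = {"greeting": "Hello! How can I help?"}  # internal knowledge: rule library

def delegate(agent_name, task):                 # other agents: task allocation
    return f"{agent_name} accepted task '{task}'"

ACTION_SPACE = {
    "tool:query_database": query_database,
    "knowledge:lookup_rule": RULES.get,
    "agent:delegate": delegate,
}

def execute(action, *args):
    """Dispatch an action name from the action space to its implementation."""
    return ACTION_SPACE[action](*args)

print(execute("knowledge:lookup_rule", "greeting"))
```

Enumerating actions in one registry also makes it straightforward to show the LLM the set of legal actions when asking it to choose the next step.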

💥 Action Impact

The action impact reveals how actions affect task outcomes, the agent’s environment, and its internal state. Understanding action impact helps in making efficient decisions during task execution. Executing actions may affect the following factors:

  • Agent environment (e.g., physical state or interaction scenario)
  • Internal state (e.g., knowledge base updates, emotional simulation)
  • Other agents: Collaborative network

🤖 Action Generation

Actions can be generated in three ways: manual creation, recall from agent memory, or following a predefined plan.

  • Manual creation: Direct instructions from users;
  • Recalling from memory: Retrieving historical experiences from long-term memory;
  • Following predefined plans: Executing a preset sequence of tasks;
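The three generation routes can be combined into a single selection rule; the priority order below (manual first, then memory, then the plan) is an assumption for illustration, not a prescribed policy.

```python
def next_action(user_instruction=None, memory=None, plan=None):
    """Pick the next action: manual input first, then memory recall, then the plan."""
    if user_instruction:          # manual creation: a direct user instruction
        return user_instruction
    if memory:                    # recall: most recent experience in long-term memory
        return memory[-1]
    if plan:                      # predefined plan: next preset task in sequence
        return plan.pop(0)        # note: consumes the caller's plan list in place
    return None
```

A usage example: with an empty instruction and memory, the agent simply walks its plan step by step.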
Memory and Knowledge

Agents use knowledge and memory to fill the context with the most relevant information available within the model’s maximum token limit.

📚 Retrieval Organization Structure

  • Unified Structure: All subsets of knowledge and memory follow a single organizational form (e.g., pure vector index or pure relational database);
  • Hybrid Structure: Combines multiple storage forms (e.g., “vector database + graph database + time-series logs”), offering flexibility and scalability advantages.

📊 Retrieval Data Forms

  • Language Documents: Unstructured text such as PDF or HTML, which must be chunked before semantic extraction
  • Databases: Relational databases, document databases, object databases
  • Vector Embeddings
  • Simple Lists (as lightweight memory caches)
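The chunking step mentioned for language documents can be as simple as a sliding character window with overlap, so that sentences split at a chunk boundary still appear whole in the neighboring chunk. The chunk size and overlap below are arbitrary illustrative values.

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into overlapping fixed-size chunks for later embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping stride
    return chunks
```

Production pipelines usually chunk on sentence or token boundaries instead of raw characters, but the overlap idea is the same.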

🔍 Retrieval Operations

  • Augmentation: Using retrieval results as context to enhance the agent’s decision-making ability.
  • Semantic Extraction: Similarity search based on vector embeddings.
  • Compression: Reducing redundant information to improve retrieval efficiency
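The semantic-extraction operation reduces to similarity search over vector embeddings. Below is a minimal sketch using cosine similarity over toy two-dimensional vectors; real systems use high-dimensional embeddings and an approximate-nearest-neighbor index.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, index, k=2):
    """Return the ids of the k entries most similar to the query.
    index: list of (doc_id, embedding) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

For example, querying `[1, 0]` against `[("a", [1, 0]), ("b", [0, 1]), ("c", [0.9, 0.1])]` ranks `"a"` and `"c"` ahead of `"b"`.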
Reasoning and Evaluation

Reasoning enables agents to self-reflect, logically decompose tasks, and plan paths, generating internal thoughts. Evaluation quantifies or qualitatively analyzes the task execution process and results, providing a basis for the agent’s self-reflection during task handling and upon task completion.

🧠 Reasoning Techniques

From basic prompting (zero/one/few-shot) → Chain of Thought (CoT) → Tree of Thought (ToT) → Skeleton of Thought (SoT), each step progressively strengthens the model’s ability to solve complex problems.

  • Zero-shot prompting: Directly issuing task instructions to the model without providing any examples, relying on the model’s pre-trained knowledge to generate answers
  • One-shot prompting: Providing one example to guide the model in learning task format and logic
  • Few-shot prompting: Providing 3-5 diverse examples to reinforce the model’s understanding of complex task patterns
  • CoT, Chain of Thought: Requiring the model to demonstrate the reasoning process step by step
  • ToT, Tree of Thought: Simulating human multi-path exploration and backtracking decision-making processes, decomposing problems into tree-like paths and selecting optimal solutions through search algorithms.
  • SoT, Skeleton of Thought: Mimicking the human cognitive process of “first outlining, then filling in details,” generating the main points of the answer first and then filling in details in parallel
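The prompt-based techniques above operate purely on prompt text. The sketch below builds few-shot and CoT prompts; the Q/A format and the “step by step” phrasing are common conventions, not fixed requirements.

```python
def few_shot_prompt(task, examples):
    """Build a few-shot prompt: worked Q/A examples followed by the new task."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {task}\nA:")
    return "\n\n".join(parts)

def cot_prompt(task):
    """Build a Chain-of-Thought prompt by asking for step-by-step reasoning."""
    return f"{task}\nLet's think step by step."

print(few_shot_prompt("What is 2+2?", [("What is 1+1?", "2")]))
```

ToT and SoT go further by issuing many such prompts (one per branch or per skeleton point) and combining the results, but the per-call prompt construction looks the same.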

📏 Evaluation Techniques

Self-consistency enhances reliability through group decision-making, while Prompt Chaining ensures task controllability through process decomposition.

  • Self-consistency: Generating multiple reasoning paths and voting to select the most consistent answer.
  • Prompt Chaining: Decomposing complex tasks into an ordered chain of subtasks, using previous outputs as subsequent inputs. Tracing back errors through intermediate results.
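Both evaluation techniques can be sketched in a few lines: self-consistency is a majority vote over sampled answers, and prompt chaining is an ordered pipeline that keeps intermediate results for tracing back errors. The step functions here stand in for individual LLM calls.

```python
from collections import Counter

def self_consistency(answers):
    """Vote over answers sampled from multiple independent reasoning paths."""
    return Counter(answers).most_common(1)[0][0]

def prompt_chain(task, steps):
    """Run an ordered chain of subtasks, feeding each output into the next.
    The full trace is kept so errors can be traced to a specific step."""
    trace = [task]
    for step in steps:
        trace.append(step(trace[-1]))
    return trace[-1], trace
```

For example, `self_consistency(["42", "41", "42"])` returns `"42"`, and a two-step chain yields both the final output and every intermediate result.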
Planning and Feedback

🤖 No Feedback Planning (Autonomous)

Agents independently execute tasks based on preset rules, without relying on external feedback for dynamic adjustments in planning.

  • Basic Planning: Static task decomposition strategies (e.g., Gantt charts, task lists), without complex reasoning or tool invocation
  • Automated Tool Invocation Reasoning: Automatically generating tool invocation programs through LLM, enabling code execution/API calls
  • Sequential Planning: Linearly executing subtasks (e.g., critical path method), without parallel or dynamic path optimization
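Feedback-free sequential planning can be reduced to executing a static list of subtasks strictly in order, with no dynamic re-planning. The subtasks below are illustrative closures standing in for real work.

```python
def run_sequential_plan(subtasks):
    """Execute subtasks linearly and collect (name, result) pairs."""
    log = []
    for name, fn in subtasks:   # strict order, no parallelism or path changes
        log.append((name, fn()))
    return log

plan = [
    ("load", lambda: "data loaded"),
    ("clean", lambda: "data cleaned"),
    ("report", lambda: "report written"),
]
print(run_sequential_plan(plan))
```

The absence of any feedback check between steps is exactly what distinguishes this from the feedback planning described next.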

🔄 Feedback Planning

An adaptive mechanism that dynamically corrects plans based on environmental/human/model feedback.

  • Environmental Feedback: Real-time status signals returned by sensors/APIs (e.g., changes in road conditions in autonomous driving)
  • Human Feedback: Corrective instructions or ratings provided by users (e.g., doctors adjusting AI diagnostic results)
  • LLM Feedback: The model generates optimization suggestions through self-consistency checks or reflection chains
  • Adaptive Constructive Feedback: Improvement mechanisms dynamically adjusted based on error types (e.g., in-plan refinement, skill discovery)
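The feedback mechanisms above share one loop shape: act, observe a feedback signal (environmental, human, or LLM-generated), and revise the remaining plan. The executor, feedback source, and revision rule below are illustrative plug-ins, not a fixed interface.

```python
def run_with_feedback(plan, execute, get_feedback, revise, max_steps=10):
    """Execute a plan, dynamically correcting it whenever feedback arrives."""
    history = []
    plan = list(plan)                     # copy so the caller's plan is untouched
    for _ in range(max_steps):            # cap steps to guarantee termination
        if not plan:
            break
        action = plan.pop(0)
        result = execute(action)
        history.append((action, result))
        feedback = get_feedback(result)   # environment / human / LLM signal
        if feedback is not None:
            plan = revise(plan, feedback) # dynamically correct the remaining plan
    return history
```

A toy usage: if driving fails (say, a road-condition change), the revision rule prepends a reroute action before continuing with the rest of the plan.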

The above is an introduction to the components of AI Agents. This article serves as a reading note for “AI Agents in Action.”

Feel free to follow me for more content related to AI Agents.