Book Giveaway at the End

AI Agents have recently become extremely popular.

Sam Altman once described the future of AI Agents: “Today’s AI models are the ‘dumbest’ they will ever be; they will only become smarter in the future!” Andrew Ng also praised at the AI Ascent 2024 conference: “This is the golden age of AI development, where generative AI and AI Agents will fundamentally change the way we work!” The words of these industry leaders make one feel that the deep integration of AI with human life and work is accelerating.

What makes an Agent an Agent is its ability to make judgments, which in turn lets it make decisions.

So where does this judgment come from? It is distilled from vast amounts of human language. The natural-language corpus records that in scenario A things are done one way, while in scenario B they are done another, so a large language model trained on that corpus can make similar judgments. This is the foundation that allows large language models to act as Agents.

A cute Agent

However, while the vision is appealing, the reality falls short. An article from the WeChat account “Product Sister” pointed out that Agent developers candidly admit that everyone is still feeling their way forward. Agent development currently lacks systematic theoretical guidance and strong examples to learn from.

So, in the era of large model application development, how should the Agents that everyone is discussing be designed and implemented? The author of this article, Huang Jia, best-selling author of “Hands-on AI Agent” and “GPT Illustrated”, walks through the ideas behind 7 major design patterns and how to implement them in practice.

4 Design Patterns of Agent Cognitive Frameworks

At Sequoia Capital's AI summit (AI Ascent), Professor Andrew Ng described four categories of design patterns for AI Agent cognitive frameworks: Reflection, Tool Use, Planning, and Multi-Agent Collaboration.

Professor Andrew Ng proposed four design patterns for the Agent cognitive framework.

The four basic thinking framework design patterns are:

· Reflection: The Agent reviews its own outputs and uses that feedback to improve its decisions.

· Tool Use: In this mode, the Agent can invoke multiple tools to complete tasks.

· Planning: In the planning mode, the Agent needs to plan a series of action steps to achieve its goals.

· Multi-Agent Collaboration: Involves collaboration among multiple Agents.

Technical Architecture of Agent Cognitive Framework

Lilian Weng, the safety system director at OpenAI, also proposed an architecture for an autonomous Agent system driven by large models, which includes four key elements: Planning, Memory, Tools, and Action.

AI Agent: Comprehensive Analysis of 7 Cognitive Frameworks and Code Implementation

In this architecture, the Agent is positioned at the center, coordinating various components to handle complex tasks and decision-making processes.

· Planning: The Agent needs planning (which also includes decision-making) capabilities to execute complex tasks effectively. This involves subgoal decomposition, chain of thought, self-criticism, and reflection on past actions.
· Memory: This includes both short-term and long-term memory. Short-term memory relates to in-context learning, which is part of prompt engineering, while long-term memory involves retaining and retrieving information over long periods, usually through an external vector store with fast retrieval.
· Tools: This includes the various tools that the Agent may invoke, such as calendars, calculators, code interpreters, and search functions. Since the internal capabilities and knowledge boundaries of large models are largely fixed once pre-training is completed and are difficult to expand, these tools become exceptionally important. They extend the Agent’s capabilities, allowing it to perform tasks beyond its core functions.
· Action: The Agent executes specific actions based on planning and memory. This may mean interacting with the external world or completing an action (task) by invoking tools.
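To make the four elements concrete, here is a minimal sketch in plain Python. It is not taken from Lilian Weng's article: the class name `MiniAgent`, the "tool: argument" task syntax, and the two toy tools are assumptions made up for illustration, and the hard-coded `plan` method stands in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MiniAgent:
    """Toy agent wiring together planning, memory, tools, and action."""
    tools: Dict[str, Callable[[str], str]]            # Tools: name -> callable
    memory: List[str] = field(default_factory=list)   # Memory: a simple running log

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Planning: a real agent would ask an LLM to decompose the task.
        # Here each "tool: argument" segment separated by ";" becomes one step.
        steps = []
        for segment in task.split(";"):
            name, _, arg = segment.partition(":")
            steps.append((name.strip(), arg.strip()))
        return steps

    def act(self, task: str) -> List[str]:
        # Action: run every planned step with the matching tool and remember the result.
        results = []
        for name, arg in self.plan(task):
            output = self.tools[name](arg)
            self.memory.append(f"{name}({arg}) -> {output}")
            results.append(output)
        return results


def calculator(expression: str) -> str:
    # Hypothetical tool: only handles "a+b" with integers.
    left, right = expression.split("+")
    return str(int(left) + int(right))


def echo(text: str) -> str:
    # Hypothetical tool: returns its input in upper case.
    return text.upper()


agent = MiniAgent(tools={"calc": calculator, "echo": echo})
print(agent.act("calc: 2+3; echo: done"))
print(agent.memory)
```

The Agent object sits in the middle, as in the architecture: it decides the steps, calls tools, and records what happened so that later steps (or later tasks) can refer back to it.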

A series of Agent cognitive frameworks have begun to take shape around this architecture. Next, we will focus on several representative Agent cognitive framework design patterns and their implementation ideas.

7 Specific Implementations of Agent Cognitive Frameworks

We now discuss the basic ideas behind the seven mainstream Agent cognitive frameworks and briefly explain how to implement them.

Framework 1: Chain of Thought

In the field of Agent cognitive frameworks, the pioneering work is the Chain-of-Thought Prompting paper, which appeared even before ChatGPT.

At a time when the reasoning capabilities of large models were generally weak, this paper used a coherent, step-by-step thought process to guide models toward deeper logical reasoning, greatly enhancing their ability to handle complex problems. The method improved not only the model’s reasoning but also the interpretability of its outputs, making the decision-making process more transparent and understandable to users.

In the paper, Chain of Thought (CoT) refers to a series of logical thinking steps formed during problem-solving. In the AI field, especially in natural language processing and machine understanding tasks, the CoT method enhances the model’s understanding and reasoning abilities by simulating human thought processes. By explicitly presenting the logical steps to solve a problem, CoT helps to enhance the model’s transparency and interpretability.

Example of Chain of Thought from the paper

When designing an Agent, what matters is grasping the core idea behind the method. Once we understand that idea, we can implement the CoT framework through carefully designed prompt templates.

For example, suppose we need to establish a model to assess an individual’s credit rating. This is a typical financial service scenario involving multiple variables and logical judgments.

Under the CoT framework, we can design the following prompt to help the AI model assess credit ratings through logical reasoning:

Considering the following information about the applicant:

– Age: 35 years

– Annual Income: $50,000

– Credit History: No defaults

– Debt: $10,000 credit card debt

– Assets: No real estate, one car worth $15,000

Step 1: Assess credit history. The applicant has no default records, which is a positive credit factor.

Step 2: Consider the debt-to-income ratio. The applicant’s annual income is $50,000, while the debt is $10,000, resulting in a debt-to-income ratio of 20%, indicating that the applicant has sufficient income to cover the debt.

Step 3: Consider asset situation. Although the applicant has no real estate, they have a car that can serve as collateral for a loan.

Step 4: Based on the above analysis, comprehensively assess the applicant’s credit rating. Final judgment: Based on the above logical reasoning, the applicant’s credit rating should be above average.

Here is a Python example that uses the OpenAI API to apply the CoT framework. The code sends the model a detailed question that spells out the reasoning steps, and gets back a reasoned credit assessment.

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4",  # Use the GPT-4 model
    messages=[
        {"role": "system", "content": "You are an intelligent assistant specializing in credit assessment, capable of analyzing the applicant's credit status through logical reasoning."},
        {"role": "user", "content": """
Consider the following information about the applicant:
- Age: 35 years
- Annual Income: 50,000 dollars
- Credit History: No defaults
- Debt: 10,000 dollars credit card debt
- Assets: No real estate

Analyze the steps as follows:
1. Assess the default risk based on age, income, and credit history.
2. Consider the debt-to-income ratio to determine if the debt level is reasonable.
3. Assess the risk factors of not having real estate and whether it will affect the applicant's ability to repay.

Based on the above analysis steps, please assess the applicant's credit rating.
"""},
    ],
)

# Print only the reply text rather than the whole message object.
print(completion.choices[0].message.content)

Here, we constructed a detailed prompt that guides the model along a predefined chain of thought. This approach helps produce more interpretable answers and can also improve the accuracy of decisions.

The CoT paper sparked a wave of research, and the body of work on reasoning in large models has grown steadily since.
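In practice a prompt like the one above is usually generated from structured data rather than typed by hand. The following sketch shows one way to do that. The function names `debt_to_income` and `build_cot_prompt` and the dictionary keys are hypothetical, and the ratio follows the simplified definition used in this article (total debt divided by annual income).

```python
def debt_to_income(annual_income: float, debt: float) -> float:
    """Simplified ratio used in the example: total debt divided by annual income."""
    return debt / annual_income


def build_cot_prompt(applicant: dict) -> str:
    """Assemble a Chain-of-Thought prompt: facts first, then explicit reasoning steps."""
    ratio = debt_to_income(applicant["annual_income"], applicant["debt"])
    facts = [
        f"- Age: {applicant['age']} years",
        f"- Annual Income: ${applicant['annual_income']:,}",
        f"- Credit History: {applicant['credit_history']}",
        f"- Debt: ${applicant['debt']:,} ({ratio:.0%} of annual income)",
        f"- Assets: {applicant['assets']}",
    ]
    steps = [
        "Step 1: Assess the credit history.",
        "Step 2: Consider the debt-to-income ratio.",
        "Step 3: Consider the asset situation.",
        "Step 4: Give an overall credit rating and explain why.",
    ]
    return (
        "Consider the following information about the applicant:\n"
        + "\n".join(facts)
        + "\n\nReason through these steps in order:\n"
        + "\n".join(steps)
    )


applicant = {
    "age": 35,
    "annual_income": 50_000,
    "credit_history": "No defaults",
    "debt": 10_000,
    "assets": "No real estate, one car worth $15,000",
}
print(build_cot_prompt(applicant))
```

The returned string can be passed as the user message in a chat completion call. Computing the 20% ratio in code rather than leaving it to the model removes one place where the reasoning chain could go wrong.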

Framework 2: Self-Ask

Self-Ask extends the Chain-of-Thought idea.

Self-Ask allows the model to self-generate questions, conduct self-querying to obtain more information, and then combine this information to generate the final answer. This method enables the model to explore various aspects of a problem more deeply, thereby improving the quality and accuracy of answers.

Example from the Self-Ask paper

The Self-Ask cognitive framework is very useful in applications that require in-depth analysis or creative solutions, such as creative writing or complex queries.

Suppose we are designing a new smartwatch and need to consider diverse user needs and technological possibilities. We can set up the following Self-Ask framework.

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4",  # Use the GPT-4 model
    messages=[
        {"role": "system", "content": "You are an intelligent assistant specializing in product design innovation, capable of self-generating questions to explore innovative design solutions."},
        {"role": "user", "content": "We are developing a new smartwatch. Please analyze the main functions of smartwatches currently on the market and propose possible innovations."},
        # Follow-up questions that steer the analysis, one aspect at a time.
        {"role": "system", "content": "First, consider what features are generally lacking in smartwatches on the current market?"},
        {"role": "system", "content": "Next, explore what new features may attract health-conscious consumers?"},
        {"role": "system", "content": "Finally, analyze the technically feasible innovative features and how they can be realized through wearable technology?"},
    ],
)

# Print only the reply text rather than the whole message object.
print(completion.choices[0].message.content)

In this example, the follow-up questions are written by hand and supplied as system messages. They guide an in-depth market and technical analysis and also prompt thinking about potential innovation points, which helps identify and integrate innovative elements at the early stages of product design.

Used as a few-shot template, this pattern lets the large model generate its own follow-up questions, often sparking ideas we might not have considered at first.
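The self-questioning loop itself can be sketched as follows. This is an illustration under assumptions, not the paper's reference implementation: the markers "Follow up:" and "So the final answer is:" are modeled on the prompt format in the Self-Ask paper, while the function `self_ask`, its parameters, and the scripted stand-ins for the model and the answer source are hypothetical.

```python
from typing import Callable

FOLLOW_UP = "Follow up:"
FINAL = "So the final answer is:"


def self_ask(question: str,
             generate: Callable[[str], str],
             answer_sub_question: Callable[[str], str],
             max_rounds: int = 5) -> str:
    """Run a Self-Ask style loop.

    generate(transcript) stands in for an LLM call that continues the transcript;
    answer_sub_question(q) stands in for a search engine or a second LLM call.
    """
    transcript = f"Question: {question}\nAre follow up questions needed here: Yes.\n"
    for _ in range(max_rounds):
        step = generate(transcript).strip()
        transcript += step + "\n"
        if step.startswith(FINAL):
            return step[len(FINAL):].strip()
        if step.startswith(FOLLOW_UP):
            sub_question = step[len(FOLLOW_UP):].strip()
            transcript += f"Intermediate answer: {answer_sub_question(sub_question)}\n"
    raise RuntimeError("No final answer within max_rounds")


# Scripted stand-ins so the sketch runs without any API key.
scripted_steps = iter([
    "Follow up: What features do current smartwatches commonly lack?",
    "Follow up: Which new features would attract health-conscious users?",
    "So the final answer is: Add multi-day battery life and continuous stress tracking.",
])
canned_answers = {
    "What features do current smartwatches commonly lack?": "Long battery life.",
    "Which new features would attract health-conscious users?": "Stress tracking.",
}

answer = self_ask(
    "What should our new smartwatch innovate on?",
    generate=lambda transcript: next(scripted_steps),
    answer_sub_question=lambda q: canned_answers[q],
)
print(answer)
```

Replacing the two lambdas with a real model call and a real search call turns this into a working Self-Ask agent. The loop structure stays the same.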

Framework 3: Critique Revise or Reflection

The Critique Revise cognitive framework, also known as Self-Reflection, is a framework applied in artificial intelligence and machine learning, primarily used to simulate and implement complex decision-making processes. This architecture is based on two core steps: “Critique” and “Revise,” which iteratively improve the system’s performance and decision quality.

· Critique: In this step, the system evaluates the current decisions or outputs and identifies problems or shortcomings. This process typically involves comparing with predefined goals or standards to determine the gap between current outputs and expected results.

· Revise: Based on the issues identified in the critique step, the system adjusts its decision-making process or behavior strategies to improve output quality. Revisions can involve adjusting existing algorithm parameters or adopting entirely new strategies or methods.

The goal of the Critique Revise cognitive architecture is to enable the system to learn and improve its decision-making processes through continuous self-evaluation and adjustment, thus making more effective decisions when facing complex problems.

Suppose a company is evaluating the effectiveness of its recent digital marketing campaign to formulate future marketing strategies. Using the Critique Revise framework, the decision-making process can be optimized through the following steps:

· Critique: The system first analyzes the data from the existing marketing campaign, including ad click-through rates, conversion rates, consumer interactions, etc., and compares them with established goals or industry standards. At this stage, the system identifies shortcomings in the current strategy, such as inaccurate target audience positioning, unappealing ad content, or unreasonable budget allocation.

· Revise: Based on the analysis results from the critique phase, the system proposes improvement measures. This may include adjusting target audiences, redesigning ad content, or optimizing budget allocation strategies. Additionally, the system may recommend testing new marketing channels or technologies to enhance overall marketing effectiveness.

The code implementation is as follows:

from openai import OpenAI

client = OpenAI()

# Critique phase: analyze the campaign and identify problems.
critique_completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a marketing analysis assistant."},
        {"role": "user", "content": "Analyze the effectiveness of the recent marketing campaign and identify existing problems."},
    ],
)
critique_text = critique_completion.choices[0].message.content

# Revise phase: each API call is stateless, so the critique must be passed in explicitly.
revise_completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a marketing strategy optimization assistant."},
        {"role": "user", "content": f"Here is a critique of our recent marketing campaign:\n{critique_text}\n\nBased on this critique, propose specific improvement measures."},
    ],
)

print("Critique Results:", critique_text)
print("Revise Suggestions:", revise_completion.choices[0].message.content)

In this process, the AI system first executes the Critique phase to analyze and identify problems; then in the Revise phase, it proposes specific improvement measures based on the identified issues. This method helps businesses gain a deeper understanding of market dynamics and accurately adjust marketing strategies to achieve more effective market responses.
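The framework is described as iterative, so critique and revise normally run in a loop until the critic has nothing left to flag. Below is a minimal sketch of that loop. The function `critique_revise`, the checklist, and the rule-based critic and reviser are hypothetical stand-ins. In a real system both would be LLM calls like the ones in the marketing example.

```python
from typing import Callable, List


def critique_revise(draft: str,
                    critique: Callable[[str], List[str]],
                    revise: Callable[[str, List[str]], str],
                    max_iterations: int = 3) -> str:
    """Alternate critique and revise until no issues remain or the budget runs out."""
    for _ in range(max_iterations):
        issues = critique(draft)
        if not issues:              # nothing left to fix
            break
        draft = revise(draft, issues)
    return draft


REQUIRED_TOPICS = ["target audience", "budget"]   # hypothetical checklist


def rule_based_critic(plan: str) -> List[str]:
    # Flags every required topic the plan does not mention yet.
    return [f"missing: {topic}" for topic in REQUIRED_TOPICS if topic not in plan.lower()]


def rule_based_reviser(plan: str, issues: List[str]) -> str:
    # Appends one sentence per issue found by the critic.
    additions = [f"Define the {issue.split(': ', 1)[1]}." for issue in issues]
    return plan + " " + " ".join(additions)


print(critique_revise("Run more ads.", rule_based_critic, rule_based_reviser))
```

The `max_iterations` cap matters once the critic is a language model, since a model-based critic can keep finding new objections indefinitely.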

Framework 4: Function Calling/Tool Calls

Function Calling is a mechanism introduced by OpenAI for AI application development. The large language model acts as the engine: based on the user's request, it decides which predefined function (tool) to invoke and with what arguments.

The tools supported in OpenAI's Assistants API include the code interpreter (data analysis), function calling, and file retrieval. This approach is particularly suitable for applications that need to integrate with existing systems or perform specific technical tasks, such as automation scripts or data analysis.

Various tools in OpenAI Assistant

Of course, the tool calling framework is not exclusive to the OpenAI API; the LangChain Agent also integrates a large number of available tools.

Tools and toolboxes in LangChain

For Agent tool invocation, today's common large model development frameworks already offer fairly complete solutions.
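Whatever framework is used, the application side of tool calling comes down to two pieces: a description of each tool that is sent to the model, and a dispatcher that runs whichever tool the model picks. The sketch below shows both without calling any API. The tool description follows the JSON-schema style of the `tools` parameter in OpenAI's Chat Completions API, while `get_weather`, its canned reply, and `dispatch` are hypothetical names for illustration.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool with a canned reply; a real one would query a weather service.
    return f"Sunny in {city}"


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}

# Tool description sent to the model so it knows what it may call and with which arguments.
TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the tool the model asked for; arguments arrive as a JSON string."""
    if tool_name not in TOOLS:
        return f"Unknown tool: {tool_name}"
    return TOOLS[tool_name](**json.loads(arguments_json))


# What a model's tool call might look like once extracted from the response.
print(dispatch("get_weather", '{"city": "Beijing"}'))
```

The result of `dispatch` is then sent back to the model as a tool message so it can compose the final answer.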

Framework 5: ReAct (Reasoning and Acting)

With the step-by-step reasoning of CoT, the self-correction of Reflection, and tool calls in place, we finally arrive at the culmination of Agent cognitive frameworks: the ReAct framework.

This framework integrates the earlier CoT and Reflection methods and adds tool calling, further enhancing the model’s interactivity and range of application. It represents a new milestone in the development of Agent cognitive frameworks.

The ReAct paper points out that it requires both reasoning and action.

The ReAct framework integrates reasoning and action. Its core idea is to cycle repeatedly through thinking, acting, and observing, refining the solution each round until the problem is resolved. This allows the Agent not only to perform complex internal reasoning but also to respond in real time and adjust its behavior to changing environments and needs.

Currently, the ReAct framework has been seamlessly integrated into LangChain, allowing developers to easily create ReAct Agents to accomplish specific tasks.
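For readers who want to see the loop without a framework, here is a minimal sketch of the thought, action, observation cycle. It is not LangChain's implementation: the `tool[input]` action syntax is loosely modeled on the ReAct paper, and `react`, the `finish` action name, the `add` tool, and the scripted model outputs are assumptions for illustration.

```python
import re
from typing import Callable, Dict

ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react(question: str,
          llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> str:
    """Thought -> Action -> Observation loop; stops on the special `finish` action."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)        # expected form: "Thought: ...\nAction: tool[input]"
        transcript += output + "\n"
        match = ACTION_PATTERN.search(output)
        if match is None:
            raise ValueError(f"No action found in: {output!r}")
        name, argument = match.groups()
        if name == "finish":
            return argument
        # Feed the tool result back so the next thought can build on it.
        transcript += f"Observation: {tools[name](argument)}\n"
    raise RuntimeError("No answer within max_steps")


def add(argument: str) -> str:
    # Hypothetical tool: adds two comma-separated integers.
    a, b = argument.split(",")
    return str(int(a) + int(b))


# Scripted stand-in for the model so the sketch runs offline.
scripted = iter([
    "Thought: I need the total of the two debts.\nAction: add[10000,2500]",
    "Thought: I now know the total.\nAction: finish[12500]",
])

print(react("What is 10000 + 2500?", lambda transcript: next(scripted), {"add": add}))
```

The key point is that every observation is appended to the transcript, so each new thought is conditioned on what the previous action actually returned.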

Framework 6: Plan-and-Execute

The Plan-and-Execute architecture focuses on planning a series of actions before executing any of them. It lets the LLM consider multiple aspects of a task up front and then act according to the plan. This is especially effective in complex project management or in scenarios requiring multi-step decision-making, such as automated workflow management.

Example from the Plan-and-Solve paper

Currently, the LangChain Experimental package (langchain_experimental) supports the Plan-and-Execute framework, allowing developers to create Plan-and-Execute Agents that plan tasks before execution.
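The control flow behind this pattern is simple enough to sketch in a few lines of plain Python. The code below does not use LangChain: `plan_and_execute`, `fixed_planner`, and `logging_executor` are hypothetical names, and both the planner and the executor are stand-ins for what would be LLM calls.

```python
from typing import Callable, List


def plan_and_execute(goal: str,
                     planner: Callable[[str], List[str]],
                     executor: Callable[[str, List[str]], str]) -> List[str]:
    """Plan all steps first, then execute them in order.

    The executor receives the results so far, so later steps can build on earlier ones.
    """
    plan = planner(goal)            # the whole plan is fixed before any step runs
    results: List[str] = []
    for step in plan:
        results.append(executor(step, results))
    return results


def fixed_planner(goal: str) -> List[str]:
    # Stand-in for an LLM planner.
    return [f"Collect data for: {goal}", "Summarize the data", "Write the report"]


def logging_executor(step: str, previous: List[str]) -> str:
    # Stand-in for an LLM or tool-using executor.
    return f"[{len(previous) + 1}] done: {step}"


for line in plan_and_execute("Q2 sales review", fixed_planner, logging_executor):
    print(line)
```

The contrast with ReAct is that the plan here is produced once, before any step runs, whereas ReAct decides the next action only after seeing the latest observation.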

Implementation example of Plan-and-Solve

Framework 7: Multi-Agent Collaboration

Multi-Agent systems are a new research hotspot. This line of research focuses on how to enable multiple Agents to work together on complex tasks and goals, and includes studies on cooperation, competition, and negotiation strategies.

Representative works of such multi-Agent collaboration frameworks are AutoGen and MetaGPT.

The Agent customization feature in the AutoGen framework allows developers to customize Agents to achieve different functionalities.

The MetaGPT framework combines standard operating procedures (SOPs) with a multi-agent system based on large models, using SOPs to encode prompts and ensure coordinated structured and modular outputs. This framework allows Agents to play diverse roles in a pipeline-like paradigm, handling complex tasks through structured Agent collaboration and enhancing solution coherence and accuracy in collaborative software engineering tasks.

In the MetaGPT demo, a multi-Agent software entity in the context of a software company is constructed, capable of handling complex tasks and mimicking different roles within the software company. The core idea is “Code equals the team’s standard operating procedure (Code = SOP(Team))”, which concretizes standard operating procedures and applies them to a team composed of large models.


Software company organizational role diagram

This organizational role diagram of the software company highlights the different roles and responsibilities within the company.

· Boss: Sets the overall requirements for the project.

· Product Manager: Responsible for writing and revising the Product Requirements Document (PRD).

· Architect: Writes and revises designs, reviews product requirements documents and code.

· Project Manager: Writes tasks, assigns tasks, and reviews product requirements documents, designs, and code.

· Engineer: Writes, reviews, and debugs code.

· Quality Assurance: Writes and runs tests to ensure software quality.

You only need to input a specific software development requirement. After several rounds of collaboration, MetaGPT's simulated software engineering team can produce a simple but genuinely usable app.

Of course, the capabilities of MetaGPT are not limited to this; it can also be used to build applications in other scenarios.
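The pipeline idea behind the role diagram can be sketched as follows. This is not MetaGPT's API: `Role`, `run_pipeline`, the artifact names, and the lambda behaviours are hypothetical, and each role's work would in practice be an LLM call with a role-specific prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Role:
    name: str
    produces: str                            # name of the artifact this role writes
    work: Callable[[Dict[str, str]], str]    # reads shared artifacts, returns a new one


def run_pipeline(requirement: str, roles: List[Role]) -> Dict[str, str]:
    """Pass a shared set of artifacts through each role in a fixed order (the SOP)."""
    artifacts = {"requirement": requirement}
    for role in roles:
        artifacts[role.produces] = role.work(artifacts)
    return artifacts


# Stand-in behaviours so the sketch runs without a model.
roles = [
    Role("Product Manager", "prd", lambda a: f"PRD for: {a['requirement']}"),
    Role("Architect", "design", lambda a: f"Design based on [{a['prd']}]"),
    Role("Engineer", "code", lambda a: f"Code implementing [{a['design']}]"),
    Role("Quality Assurance", "test_report", lambda a: f"Tests passed for [{a['code']}]"),
]

result = run_pipeline("a to-do list app", roles)
for key, value in result.items():
    print(f"{key}: {value}")
```

Because every role reads and writes named artifacts rather than free-form chat, the hand-offs stay structured, which is the point of encoding the standard operating procedure in code.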

Combination of Various Cognitive Frameworks

The cognitive frameworks above can be combined. For example, within the ReAct framework, Tool Calls serve as the actions that change the state of the environment; the Agent then observes the result before it continues thinking.

ReAct + Tool Calls

Of course, each Agent cognitive architecture has its unique advantages. The choice of which one to use and how to combine them effectively depends on specific needs, application scenarios, and expected user experiences. Choosing the right cognitive architecture for an application is a key step in large language model application development.

Well, that’s it for today’s practical sharing. I tried to systematically analyze the current development status of Agent technology from theory to practice in a relatively concise manner, hoping to provide reference and inspiration for your Agent application development. In the future, the further development of Agent technology will profoundly impact the application of artificial intelligence in various fields, pushing human-machine collaboration to a new level.

All examples above have code implementations and detailed explanations in my course! From beginners to experts, let’s learn about large models together.


References

1. https://36kr.com/p/2716201666246790 – Andrew Ng’s latest speech: The Future of AI Agent Workflows

2. https://lilianweng.github.io/posts/2023-06-23-agent/ – LLM Powered Autonomous Agents

3. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

4. Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. A., & Lewis, M. (2022). Measuring and Narrowing the Compositionality Gap in Language Models. arXiv preprint arXiv:2210.03350.

5. Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023). Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. arXiv.

6. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629.

7. https://github.com/geekan/MetaGPT – MetaGPT: The Multi-Agent Framework

—END—

Share your views on the future development of Agent technology

Join the discussion in the comment section and share this post with your circle of friends. We will select one reader to receive an e-book copy of the book. Deadline: June 30.
