Comprehensive Survey on the Development and Application of AI Agents in 2023: Concepts, Principles, Development, Applications, Challenges, and Prospects

The field of Artificial Intelligence (AI) is rapidly evolving. Today’s AI agents are capable of perceiving, making decisions, and taking actions independently. With the rise of AI agents driven by large language models (LLMs), we are on the brink of a new era: AI agents may form their own societies and coexist harmoniously with humans.

Newton once said, “If I have seen further, it is by standing on the shoulders of giants.” Today, these giants are AI agents, helping to bear the heavy workload.

Image source: Wang Knowledge

Author: Zhang Changwang, Wang Knowledge

In today’s article, we will introduce some of the best open-source AI agents and multi-agent frameworks that can be used in personal and enterprise settings, and discuss the following topics:

How AI agents create opportunities for innovation and efficiency.
Which multi-agent frameworks offer the best functionalities.
When it is best to implement AI agents to solve real-world problems.
What impact autonomous agents will have on AI-driven task management.

We will also delve into some opportunities, challenges, and trends in agent architecture.

1 Introduction to AI Agents

Tools like ChatGPT, DALL-E 3, or Midjourney use prompt-based interfaces for human-computer interaction. This means you need to write a set of instructions in natural language (often followed by repeated rounds of re-prompting) to get meaningful responses. Given the capabilities of AI models, this way of working is slow and unintuitive. We need better and more efficient ways to interact with AI.

1.1 Role of AI Agents

AI agents act as overseers of AI. They work in a self-directed cyclical manner, setting tasks for the AI, determining priorities, and re-prioritizing tasks until the overall goal is achieved.
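The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular agent: the function names (`run_agent`, `stub_execute`, `stub_create_tasks`, `stub_prioritize`) are hypothetical, and the stubs stand in for calls to a language model.

```python
from collections import deque
from typing import Callable, List, Tuple


def run_agent(
    goal: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], List[str]],
    prioritize: Callable[[str, List[str]], List[str]],
    max_steps: int = 10,
) -> List[Tuple[str, str]]:
    """Execute a task, derive follow-up tasks, re-prioritize, and repeat."""
    tasks = deque([f"Make a plan for: {goal}"])
    results: List[Tuple[str, str]] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        if not tasks:
            break  # nothing left to do: the goal is considered achieved
        task = tasks.popleft()
        result = execute(goal, task)
        results.append((task, result))
        tasks.extend(create_tasks(goal, task, result))
        tasks = deque(prioritize(goal, list(tasks)))
    return results


# Stubs standing in for LLM calls (assumptions for illustration only).
def stub_execute(goal: str, task: str) -> str:
    return f"done: {task}"


def stub_create_tasks(goal: str, task: str, result: str) -> List[str]:
    # Only the planning task spawns follow-up work in this toy example.
    return ["write summary", "collect data"] if task.startswith("Make a plan") else []


def stub_prioritize(goal: str, tasks: List[str]) -> List[str]:
    # Data collection must come before writing.
    return sorted(tasks, key=lambda t: 0 if t.startswith("collect") else 1)


if __name__ == "__main__":
    for task, result in run_agent("EV report", stub_execute, stub_create_tasks, stub_prioritize):
        print(task, "->", result)
```

In a real agent, each of the three callables would prompt a model, and the `max_steps` cap is what keeps a self-directed loop from running indefinitely.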

Image source: Wang Knowledge

1.2 Principles of AI Agents

Image source: https://arxiv.org/pdf/2309.07864.pdf

The overall framework of AI agents consists of three key components: brain, perception, and action:

Brain: The brain mainly consists of a large language model that not only stores knowledge and memory but also performs functions such as information processing and decision-making, and can present reasoning and planning processes, effectively handling unknown tasks.
Perception: The core purpose of the perception module is to extend the agent’s perceptual space from the pure text domain to a multimodal domain that includes text, auditory, and visual patterns.
Action: In the construction of the agent, the action module receives action sequences sent from the brain module and executes actions that interact with the environment.

After perceiving the environment, humans integrate, analyze, and reason about the perceived information in the brain to make decisions. They then use the nervous system to control their bodies, taking adaptive or creative actions such as conversing, avoiding obstacles, or starting a fire. When an agent possesses a brain-like structure, with knowledge, memory, reasoning, planning, and generalization capabilities, as well as multimodal perception abilities, it is also expected to have a human-like range of actions for responding to its surroundings.
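To make the brain, perception, and action split concrete, here is a minimal sketch in Python. All class and method names are our own assumptions, and the rule-based `Brain.decide` is only a placeholder for the reasoning a large language model would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    modality: str  # e.g. "text", "audio", "image"
    content: str


class Perception:
    """Turns raw multimodal input into observations the brain can consume."""

    def perceive(self, raw: Dict[str, str]) -> List[Observation]:
        return [Observation(modality, content) for modality, content in raw.items()]


@dataclass
class Brain:
    """Stores memory and decides on an action sequence (placeholder for an LLM)."""

    memory: List[str] = field(default_factory=list)

    def decide(self, observations: List[Observation]) -> List[str]:
        actions: List[str] = []
        for obs in observations:
            self.memory.append(f"{obs.modality}: {obs.content}")
            if "obstacle" in obs.content:
                actions.append("avoid_obstacle")
            elif obs.modality == "text":
                actions.append("reply")
        return actions


class Action:
    """Executes the action sequence against the environment."""

    def execute(self, actions: List[str], environment: List[str]) -> None:
        environment.extend(actions)  # the environment here is just an action log


def step(raw: Dict[str, str], perception: Perception, brain: Brain,
         action: Action, environment: List[str]) -> None:
    """One perceive -> decide -> act cycle."""
    action.execute(brain.decide(perception.perceive(raw)), environment)
```

A call such as `step({"text": "hello", "image": "obstacle ahead"}, Perception(), Brain(), Action(), log)` runs one cycle and appends the chosen actions to `log`.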

1.3 Advantages of AI Agents

AI agents driven by large language models have the following advantages:

Language Interaction: Their inherent ability to understand and generate language ensures seamless user interaction.
Decision-Making Ability: Large language models have the capacity for reasoning and decision-making, making them adept at solving complex problems.
Flexible Adaptation: The adaptability of agents ensures they can be molded for different applications.
Collaborative Interaction: Agents can collaborate with humans or other agents, paving the way for multifaceted interactions.

1.4 Applications of AI Agents

Image source: https://arxiv.org/pdf/2309.07864.pdf

AI agents have a wide and diverse range of use cases. These agents, powered by large language models (LLMs), can be used in various scenarios, including:

Single-Agent Applications: Agents can serve as personal assistants, helping users offload daily tasks and repetitive labor. They can independently analyze, plan, and solve problems, alleviating personal workload and enhancing task resolution efficiency.
Multi-Agent Systems: Agents can interact with each other in collaborative or competitive ways. This allows them to make progress through teamwork or adversarial interactions. In these systems, agents can jointly complete complex tasks or compete with each other to improve their performance.
Human-Machine Collaboration: Agents can interact with humans, assisting them in executing tasks more efficiently and safely. They can understand human intentions and adjust their behavior to provide better service. Human feedback can also help agents improve their performance.
Specialized Domains: Agents can be trained and specialized for specific fields, such as software development, scientific research, or other industry-specific tasks. They can leverage pre-training on large-scale corpora and generalize to new tasks, providing expertise and support in these areas.

These are just a few examples of how AI agents can be used. The versatility and functionality of these agents make them suitable for a wide range of applications and industries.

1.5 Agent Society

Agent Society is a concept where AI agents created using language models interact with each other in a simulated environment. These agents can act, make decisions, and engage in social activities like humans.

Image source: https://arxiv.org/pdf/2309.07864.pdf

It helps us understand how AI agents can work and behave collaboratively in a society-like environment. This simulation can provide insights into collaboration, policy-making, and ethical considerations. Overall, Agent Society helps us explore the social aspects of AI agents and their interactions in real and controlled environments.

2 Best Development Frameworks for AI Agents

There are many frameworks available to help create AI agents. Here are some of the best frameworks:

2.1 🦜️🔗 LangChain

Framework URL: https://github.com/langchain-ai/langchain

LangChain is a framework for developing applications powered by language models. It enables applications to:

Perceive Context: Connect language models to context sources (prompt instructions, few-shot examples, content to ground responses in, etc.)
Reason: Rely on language models for reasoning (about how to respond based on provided context, what actions to take, etc.)
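As a rough illustration of what "connecting a model to context sources" means, the sketch below assembles instructions, few-shot examples, and grounding content into a single prompt. It deliberately does not use the LangChain API; `build_prompt` and its parameters are hypothetical names chosen for this example.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    instructions: str,
    examples: Sequence[Tuple[str, str]],
    context_docs: List[str],
    question: str,
) -> str:
    """Combine the three context sources with the user's question."""
    parts = [instructions.strip(), ""]
    for example_question, example_answer in examples:  # few-shot examples
        parts += [f"Q: {example_question}", f"A: {example_answer}", ""]
    if context_docs:  # content the answer should be grounded in
        parts.append("Context:")
        parts += [f"- {doc}" for doc in context_docs]
        parts.append("")
    parts += [f"Q: {question}", "A:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "Answer using only the context.",
        [("What is 2 + 2?", "4")],
        ["AutoGPT was launched in March 2023."],
        "When was AutoGPT launched?",
    ))
```

The resulting string is what would be sent to the model, which then does the reasoning step on top of the supplied context.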

Image source: https://github.com/langchain-ai/langchain

The LangChain framework has several core components:

LangChain Library: Python and JavaScript libraries. Contains interfaces and integrations for numerous components, a basic runtime for combining these components into chains and agents, and ready-made implementations of chains and agents.
LangChain Templates: A series of easily deployable reference architectures for various tasks.
LangServe: A library for deploying LangChain chains as REST APIs.
LangSmith: A developer platform that allows you to debug, test, evaluate, and monitor chains built on any LLM framework, seamlessly integrating with LangChain.

2.2 AutoGen

Framework URL: https://github.com/microsoft/autogen

AutoGen is a framework that supports the development of LLM applications using multiple agents that can converse with each other to solve tasks. The agents in AutoGen are customizable, conversational, and seamlessly allow human involvement. AutoGen applications can operate in various modes combining large language models, human input, and the use of tools.

Image source: https://github.com/microsoft/autogen

AutoGen makes it easy to build next-generation LLM applications based on multi-agent conversations. It simplifies the orchestration, automation, and optimization of complex LLM workflows, maximizing the performance of LLMs and compensating for their weaknesses.
It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can build conversation patterns that vary in conversation autonomy, the number of agents, and agent conversation topology.
It provides a collection of working systems of varying complexity. These systems span a wide range of applications across domains and demonstrate how AutoGen can support diverse conversation patterns.
It offers enhanced LLM inference, with utilities such as API unification and caching, as well as advanced usage patterns like error handling, multi-config inference, and context programming.
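The idea of agents that converse with each other, with a human optionally stepping in, can be sketched without any framework. The code below is not the AutoGen API: `ChatAgent`, `converse`, the `human_input` hook, and the `DONE` stop word are all assumptions made for this illustration, and the reply functions are stubs where a real system would call an LLM.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (speaker name, message)


class ChatAgent:
    def __init__(self, name: str, reply_fn: Callable[[History], str],
                 human_input: Optional[Callable[[History], str]] = None) -> None:
        self.name = name
        self.reply_fn = reply_fn
        self.human_input = human_input  # optional hook for a human to take over

    def reply(self, history: History) -> str:
        if self.human_input is not None:
            override = self.human_input(history)
            if override:  # a non-empty human message replaces the automatic reply
                return override
        return self.reply_fn(history)


def converse(a: ChatAgent, b: ChatAgent, opening: str,
             max_turns: int = 6, stop_word: str = "DONE") -> History:
    """Agent `a` opens; `b` and `a` then alternate until the stop word or turn cap."""
    history: History = [(a.name, opening)]
    speakers = [b, a]
    for turn in range(max_turns):
        speaker = speakers[turn % 2]
        message = speaker.reply(history)
        history.append((speaker.name, message))
        if stop_word in message:
            break
    return history


# Stub reply functions standing in for LLM calls.
def assistant_reply(history: History) -> str:
    drafts = sum(1 for name, _ in history if name == "assistant")
    return f"draft v{drafts + 1}"


def reviewer_reply(history: History) -> str:
    _, last_message = history[-1]
    return "DONE: approved" if "v2" in last_message else "please revise"
```

Passing a `human_input` callable to either agent is one simple way to model human involvement: when it returns text, that text is used instead of the automatic reply.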

2.3 💡 PromptAppGPT

Framework URL: https://github.com/mleoking/PromptAppGPT

PromptAppGPT describes itself as the first LLM-based natural language application development framework. It supports fully automated compilation, execution, and interface generation, no-code configuration of process scheduling, and AutoGPT-like fully autonomous agents in a few dozen lines of low-code. PromptAppGPT significantly lowers the barrier to agent development: no software download is required; you can start developing by simply opening the URL (https://promptappgpt.wangzhishi.net/).

Image source: https://github.com/mleoking/PromptAppGPT

PromptAppGPT also includes the following built-in agent examples:

All Executors: An application using all executors.
My ChatGPT: A chatbot application.
Imaginative Image Creator: An agent that creates imaginative images from any language using GPT and DALL·E.
Pizza Order Bot: An automated agent for collecting pizza restaurant orders.
Universal Translator: An agent that translates text from any language into English/Chinese/French/Spanish.
English Improver: An agent for English translation and refinement.
Web & Image Searcher: An agent that searches for web pages and images using Bing.
My AutoGPT: An agent similar to AutoGPT that can operate fully autonomously using GPT and executors (plugins) to achieve any goal.

3 Best Application Projects for AI Agents

3.1 AutoGPT

Project URL: https://github.com/Significant-Gravitas/AutoGPT

AutoGPT was developed by Toran Bruce Richards, the founder of the video game company Significant Gravitas Ltd., and is one of the early agents, launched in March 2023. It is also the most popular agent project on GitHub today.

How AutoGPT works. Image source: lesswrong.com

The idea behind AutoGPT is simple—it is a complete toolkit for building and running custom AI agents for various projects. The tool uses OpenAI’s GPT-4 and GPT-3.5 large language models (LLMs), allowing agents to be built for various personal and commercial projects.

3.2 BabyAGI

Project URL: https://github.com/yoheinakajima/babyagi

BabyAGI is a streamlined version of task-driven autonomous agents. The Python script consists of only 140 lines of code, and according to the official GitHub repository, it “uses OpenAI and vector databases (such as Chroma or Weaviate) to create, prioritize, and execute tasks.”

Since its launch, BabyAGI has expanded into several interesting projects. Some, like twitter-agent🐣 or BabyAGI on Slack, bring the power of agents to existing platforms. Others add plugins and additional features or port BabyAGI to other languages (e.g., Babyagi-perl).
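The role of the vector database in such a loop is to store the results of finished tasks and retrieve the most relevant ones as context for the next task. The sketch below imitates that with a toy in-memory store and bag-of-words cosine similarity. It is a stand-in for illustration only, not BabyAGI's code and not the Chroma or Weaviate API; `ResultMemory`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ResultMemory:
    """Stores task results and returns those most similar to a query."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, task: str, result: str) -> None:
        self._items.append((embed(f"{task} {result}"), result))

    def query(self, text: str, top_k: int = 2) -> List[str]:
        query_vector = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(query_vector, item[0]), reverse=True)
        return [result for _, result in ranked[:top_k]]


if __name__ == "__main__":
    memory = ResultMemory()
    memory.add("research ev sales", "ev sales grew in 2022")
    memory.add("book flights", "flight booked to Paris")
    print(memory.query("summarize ev sales", top_k=1))
```

The retrieved results would then be added to the prompt for the next task, so the agent builds on what it has already done.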

BabyAGI Agent loop. Image source: github.com/yoheinakajima/babyagi

3.3 SuperAGI

Project URL: https://github.com/TransformerOptimus/SuperAGI

SuperAGI is a more flexible and user-friendly alternative to AutoGPT. Think of it as an integrator of open-source AI agents, containing everything needed to build, maintain, and run your own agents. It also includes plugins and a cloud version for testing.

The framework features multiple AI models, a graphical user interface, integration with vector databases (for data storage/retrieval), and performance insights. There is also a marketplace with toolkits that allow you to connect to popular applications and services like Google Analytics.

SuperAGI includes the following features:

Configure, generate, and deploy autonomous AI agents – Create production-ready and scalable autonomous agents.
Extend agent capabilities using toolkits – Add toolkits from our marketplace to the agent’s workflow.
Graphical User Interface – Access agents through a graphical user interface.
Operational Console – Interact with agents by providing input and permissions.
Multiple Vector Databases – Connect to multiple vector databases to enhance agent performance.
Performance Telemetry – Gain insights into agent performance and optimize accordingly.
Optimize Token Usage – Control token usage to manage costs effectively.
Agent Memory Storage – Enable agents to learn and adapt by storing memories.
Models – Custom fine-tuned models for specific business use cases.
Workflows – Easily automate tasks using ReAct LLM’s predefined steps.

3.4 🚀🎬 ShortGPT

Project URL: https://github.com/RayVentura/ShortGPT

ShortGPT is a powerful framework for automating content creation. It simplifies video creation, sourcing materials, voice synthesis, and editing tasks.

ShortGPT can handle most typical video-related tasks, such as writing video scripts, generating voiceovers, selecting background music, writing titles and descriptions, and even editing videos. The tool is suitable for cross-platform short video content and long video content-related tasks.

ShortGPT loop and features. Image source: github.com/RayVentura/ShortGPT

The ShortGPT framework includes the following main features:

🎞️ Automated Editing Framework: Simplifies the video creation process using LLM-oriented video editing language.
📃 Scripts and Prompts: Provides ready-to-use scripts and prompts for various LLM automated editing workflows.
🗣️ Voiceover/Content Creation: Supports multiple languages, including English🇺🇸, Spanish🇪🇸, Arabic🇦🇪, French🇫🇷, Polish🇵🇱, German🇩🇪, Italian🇮🇹, Portuguese🇵🇹, Russian🇷🇺, Mandarin🇨🇳, Japanese🇯🇵, Hindi🇮🇳, Korean🇰🇷, and over 30 other languages (using EdgeTTS).
🔗 Subtitle Generation: Automatically generates video subtitles.
🌐🎥 Resource Sourcing: Sources images and video clips from the internet, connecting to the web and Pexels API as needed.
🧠 Memory and Persistence: Ensures long-term persistence of automatically edited variables using TinyDB.

3.5 ChatDev

Project URL: https://github.com/OpenBMB/ChatDev

ChatDev is a virtual software company that operates through various agents taking on different roles, including CEO, Chief Product Officer, Chief Technology Officer, programmers, reviewers, testers, and art designers. These agents form a multi-agent organizational structure and unite under the mission of “transforming the digital world through programming.”

Agents in ChatDev collaborate by participating in specialized functional workshops, including design, coding, testing, and documentation tasks.

The main goal of ChatDev is to provide an easy-to-use, highly customizable, and scalable framework based on large language models (LLMs), making it an ideal scenario for researching collective intelligence.

Image source: https://github.com/OpenBMB/ChatDev

Copilot, Bard, ChatGPT, and many other tools are powerful coding assistants, but projects like ChatDev may soon compete with them. Rather than relying on a single assistant, ChatDev uses multiple agents, each assigned a unique role from a traditional development organization, that collaborate on tasks ranging from designing software to writing code and documentation.

3.6 MetaGPT

Project URL: https://github.com/geekan/MetaGPT

MetaGPT is another open-source AI agent framework that attempts to mimic the structure of traditional software companies. Similar to ChatDev, agents are assigned roles of product managers, project managers, and engineers, and they collaborate on user-defined coding tasks.

Illustration of a multi-role software company. Image source: https://github.com/geekan/MetaGPT

So far, MetaGPT can only tackle moderately challenging tasks—such as writing a snake game or building simple utility applications—but it is a promising tool that may rapidly evolve in the future. Generating a complete project will cost about $2 in OpenAI API call fees.

3.7 CAMEL

Project URL: https://github.com/camel-ai/camel

In short, CAMEL is one of the early multi-agent frameworks that uses a unique role-playing design to enable multiple agents to communicate and collaborate with each other.

Dialogue between two ChatGPT agents in the CAMEL framework. Image source: https://github.com/camel-ai/camel

It all starts with a task defined by humans. The framework leverages the power of LLMs to dynamically assign roles to agents, specify and develop complex tasks, and arrange role-playing scenarios for collaboration between agents.

3.8 JARVIS

Project URL: https://github.com/microsoft/JARVIS

JARVIS handles task planning, model selection, task execution, and content generation. By accessing dozens of dedicated models in the Hugging Face hub, JARVIS utilizes the reasoning capabilities of ChatGPT to apply the best model to a given task. This makes it quite flexible for various tasks (from simple summarization to object detection).

Overall framework of JARVIS. Image source: https://github.com/microsoft/JARVIS

JARVIS introduces a collaborative system consisting of a large language model acting as a controller and numerous expert models (from the Hugging Face Hub) acting as collaborative executors. The workflow of the system consists of four stages:

Task Planning: Using ChatGPT to analyze user requests, understand their intentions, and break them down into potentially solvable tasks.
Model Selection: To solve the planned tasks, ChatGPT selects expert models hosted on Hugging Face based on descriptions.
Task Execution: Calls and executes each selected model and returns the results to ChatGPT.
Response Generation: Finally, ChatGPT integrates the predictions from all models and generates a response.
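The four stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not JARVIS code: the registry, the model names (`toy-summarizer`, `toy-sentiment`), and all function names are hypothetical, keyword matching stands in for the controller LLM, and simple lambdas stand in for expert models hosted on Hugging Face.

```python
from typing import Callable, Dict, List

# Hypothetical registry: model name -> description and a callable "expert model".
MODEL_REGISTRY: Dict[str, Dict[str, object]] = {
    "toy-summarizer": {
        "description": "summarization of text",
        "run": lambda text: text.split(".")[0] + ".",
    },
    "toy-sentiment": {
        "description": "sentiment classification of text",
        "run": lambda text: "positive" if "good" in text else "neutral",
    },
}


def plan_tasks(request: str) -> List[str]:
    """Stage 1, task planning: keyword matching stands in for the controller LLM."""
    lowered = request.lower()
    tasks = []
    if "summar" in lowered:
        tasks.append("summarization")
    if "sentiment" in lowered:
        tasks.append("sentiment")
    return tasks


def select_model(task: str) -> str:
    """Stage 2, model selection: pick the first model whose description mentions the task."""
    for name, entry in MODEL_REGISTRY.items():
        if task in str(entry["description"]):
            return name
    raise LookupError(f"no model found for task: {task}")


def execute(model_name: str, text: str) -> str:
    """Stage 3, task execution: run the selected expert model."""
    run: Callable[[str], str] = MODEL_REGISTRY[model_name]["run"]  # type: ignore[assignment]
    return run(text)


def generate_response(results: Dict[str, str]) -> str:
    """Stage 4, response generation: merge all model outputs into one answer."""
    return "; ".join(f"{task}: {output}" for task, output in results.items())


def handle(request: str, text: str) -> str:
    results = {task: execute(select_model(task), text) for task in plan_tasks(request)}
    return generate_response(results)
```

For example, `handle("Summarize this and give the sentiment", "The product is good. It ships fast.")` plans two tasks, routes each to a matching toy model, and merges both outputs into a single response string.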

Example of JARVIS workflow. Image source: https://github.com/microsoft/JARVIS

3.9 OpenAGI

Project URL: https://github.com/agiresearch/OpenAGI

OpenAGI is an open-source AGI (Artificial General Intelligence) research platform that combines small expert models (tailored for tasks like sentiment analysis or image deblurring) with Reinforcement Learning from Task Feedback (RLTF) to enhance its output. It brings together popular platforms like ChatGPT, large language models like Llama 2, and other specialized models, dynamically selecting the right tools based on task context.

OpenAGI framework. Image source: https://github.com/agiresearch/OpenAGI

The platform is designed to provide complex multi-step tasks, accompanied by task-specific datasets, evaluation metrics, and various scalable models. OpenAGI formulates complex tasks as natural language queries, which serve as input for the LLM. The LLM then selects, synthesizes, and executes the models provided by OpenAGI to solve the tasks. Additionally, the project proposes a Reinforcement Learning from Task Feedback (RLTF) mechanism that uses task results as feedback to improve the LLM’s task-solving capabilities. Thus, the LLM is responsible for synthesizing various external models to solve complex tasks, while RLTF closes the loop, creating a feedback cycle for self-improving AI. The project presents an LLM operating various expert models to solve complex tasks as a promising approach toward AGI.

Guiding OpenAGI to create a traditional Chinese painting themed “High Mountains and Flowing Water,” accompanied by a generated ancient Chinese poem and music consistent with the painting. OpenAGI first searches online for the ancient story of “Understanding” and “High Mountains and Flowing Water,” then gradually generates the painting, poem, and music based on the collaboration of large language models and domain expert models. The created painting, poem, and music all align with the ancient story. Image source: https://github.com/agiresearch/OpenAGI

Instructing OpenAGI to create a travel report for a journey in China, including recommendations for attractions, activities, and local cuisine, as well as practical information for travelers, such as how to stay safe and healthy and how to travel in the country. Image source: https://github.com/agiresearch/OpenAGI

3.10 XAgent

Framework URL: https://github.com/OpenBMB/XAgent

XAgent is an open-source, experimental autonomous agent driven by a large language model (LLM) that can automatically solve various tasks. It is designed as a general-purpose agent that can be applied to a wide range of tasks. XAgent is still in its early stages, and its developers are working to improve it. 🏆 Its stated aim is a super-intelligent agent capable of solving any given task.

XAgent components. Image source: https://github.com/OpenBMB/XAgent

XAgent is designed with the following features:

Autonomy: XAgent can automatically solve various tasks without human intervention.
Safety: XAgent is designed to operate safely. Regardless of how it runs, all operations are confined within a Docker container.
Scalability: XAgent is designed to be scalable, allowing for easy addition of new tools to enhance agent capabilities or even new features!
GUI: XAgent provides a user-friendly GUI for interaction with the agent. It can also be interacted with using a command-line interface.
Collaboration with Humans: XAgent can collaborate with humans to handle tasks. It can follow human guidance while solving complex tasks and can also seek human assistance when it runs into difficulties.

XAgent consists of three parts:

🤖 Dispatcher: Responsible for dynamically instantiating tasks and assigning them to different agents. It allows for the addition of new agents and enhances agent capabilities.
🧐 Planner: Responsible for generating and refining task plans. It breaks tasks down into subtasks and generates milestones for them, allowing agents to solve tasks step by step.
🦾 Actor: Responsible for executing actions to achieve goals and complete subtasks. The actor utilizes various tools to solve subtasks and can also collaborate with humans to solve tasks.
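The division of labor between the three parts can be sketched as follows. This is our own simplified illustration, not XAgent's code: the class interfaces, the fixed two-step plan, and the toy tools are all assumptions, and a real planner would ask an LLM to decompose the task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    name: str
    tool: str  # which kind of actor should handle this subtask
    done: bool = False


class Planner:
    """Breaks a task into subtasks (a fixed plan stands in for LLM planning)."""

    def plan(self, task: str) -> List[Subtask]:
        return [Subtask(f"search: {task}", "search"), Subtask(f"write: {task}", "writer")]


class Actor:
    """Executes a subtask with the tools it has been given."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools

    def act(self, subtask: Subtask) -> str:
        output = self.tools[subtask.tool](subtask.name)
        subtask.done = True
        return output


class Dispatcher:
    """Assigns each subtask to a registered actor; new actors can be added at any time."""

    def __init__(self) -> None:
        self.actors: Dict[str, Actor] = {}

    def register(self, tool: str, actor: Actor) -> None:
        self.actors[tool] = actor

    def dispatch(self, subtask: Subtask) -> str:
        return self.actors[subtask.tool].act(subtask)


def solve(task: str, planner: Planner, dispatcher: Dispatcher) -> List[str]:
    """Plan the task, then solve the subtasks step by step."""
    return [dispatcher.dispatch(subtask) for subtask in planner.plan(task)]


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.register("search", Actor({"search": lambda name: f"results for [{name}]"}))
    dispatcher.register("writer", Actor({"writer": lambda name: f"text for [{name}]"}))
    print(solve("solar power", Planner(), dispatcher))
```

Keeping dispatch separate from planning is what makes it cheap to add new agents: registering another actor does not require touching the planner.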

XAgent workflow. Image source: https://github.com/OpenBMB/XAgent

4 Role and Challenges of AI Agents

4.1 Role of AI Agents

“So, what can I do with agents?” This is a great question, and we would love to say “everything,” but considering the current state of technology, that is far from the truth. Nevertheless, even in their initial stages, AI agents can make life and work easier in the following ways:

🔎 Simplifying research and data collection.
✏️ Generating content in various styles and tones.
🌐 Crawling the web and extracting key insights.
💭 Summarizing documents and spreadsheets.
🔀 Translating content between languages.
🤝 Acting as virtual assistants for creative tasks.
⚡️ Automating management tasks, such as scheduling and tracking.

Agents will continue to evolve from prompt-based tools requiring human interaction to fully autonomous systems operating in self-guided loops. After all, this is what AI tools should be—automated, trustworthy, and reliable, without the need for lengthy prompts or reviewing every step.

Suppose you want to analyze market trends in the electric vehicle (EV) industry over the past decade. You could delegate these tasks to agents while doing other things, rather than manually collecting data, reading countless articles, and parsing financial reports.

Even when using tools like ChatGPT, a human needs to stay in the loop. Agents can help find the right information, take notes, and organize everything. If some data is already available, agents can extract key insights in seconds.

Sometimes, a project may be too complex for a single agent to manage. Through a multi-agent setup, each agent can be responsible for handling a part of the project. One agent can collect data, while another can create a report outline. Then, a third agent can compile the information and generate the actual content.

4.2 Challenges of AI Agents

Fully autonomous agents are still in the wild west of AI tools; they are largely experimental and require a certain level of technical knowledge to set up, deploy, and maintain. This is great for DIY projects, but if you just want to get the job done, it is not a plug-and-play experience. Technically, open-source agents can be integrated with existing workflows. But this requires time, expertise, and resources.

Of course, there is also the issue of hallucinations. Since agents rely on large language models to generate information, they are equally prone to producing bizarre narratives with no factual basis. The longer an agent runs, the more likely it is to fabricate and distort reality. From a productivity standpoint, this creates a dilemma. There are some simple mitigations: limiting the runtime of agents, narrowing the scope of tasks, and keeping a human in the loop to review outputs.
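These three mitigations are straightforward to combine in code. The sketch below is a minimal illustration with hypothetical names (`run_guarded`, `step`, `approve`, `allowed_actions`): it caps steps and wall-clock time, drops any proposed action outside an allow-list, and asks a reviewer callback before accepting an output. Proposals are assumed to be strings of the form `action:detail`.

```python
import time
from typing import Callable, List, Optional, Set


def run_guarded(
    step: Callable[[List[str]], Optional[str]],
    approve: Callable[[str], bool],
    allowed_actions: Set[str],
    max_steps: int = 5,
    max_seconds: float = 30.0,
) -> List[str]:
    """Run an agent with runtime limits, a narrow scope, and human review."""
    started = time.monotonic()
    accepted: List[str] = []
    for _ in range(max_steps):  # limit the number of steps
        if time.monotonic() - started > max_seconds:  # limit the runtime
            break
        proposal = step(accepted)
        if proposal is None:  # the agent has nothing more to propose
            break
        action, _, _ = proposal.partition(":")
        if action not in allowed_actions:  # narrow scope: skip out-of-scope actions
            continue
        if approve(proposal):  # human in the loop reviews each output
            accepted.append(proposal)
    return accepted


if __name__ == "__main__":
    proposals = iter([
        "search:ev sales 2015-2023",
        "delete:all files",
        "summarize:invented statistic",
        "summarize:search results",
    ])
    result = run_guarded(
        step=lambda accepted: next(proposals, None),
        approve=lambda proposal: "invented" not in proposal,  # stand-in for a human reviewer
        allowed_actions={"search", "summarize"},
    )
    print(result)
```

In practice, `approve` would pause and ask a person; the point of the sketch is that every output passes through the same three checks before it is kept.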

By deploying multiple agents with expertise and unique skills, better results can be achieved—thus, multi-agent frameworks may become more popular.

5 Conclusion and Outlook

With faster, more accurate, and larger-scale iterations of AI models like GPT-4, Bard, and Llama 2 on the horizon, we may see more exciting breakthroughs in the coming months. In particular, the rise of AI agents marks a significant shift in the digital realm. These agents can understand, create, and interact; they are not just tools but potential collaborators across various fields. As we stand on the brink of this revolution, we must harness their capabilities responsibly.

The tools and platforms available today enable us to customize agents for different tasks, but we must also remain vigilant and consider the ethical implications of these advancements. The bridge between humans and AI has never been shorter, and as we move forward, harmonious coexistence seems not only possible but imminent.

In the foreseeable future, agents will redefine our views on work, planning, and collaboration. They will revolutionize productivity and enhance traditional workflows. So, are you ready to join this revolution?

References:

Taskade, Top 11 Open-Source Autonomous Agents & Frameworks: The Future of Self-Running AI, https://www.taskade.com/blog/top-autonomous-agents.
Bilal Mansouri, What Are LLM Agents? An Overview of Their Capabilities, https://gptpluginz.com/llm-agents/.
Zhiheng Xi et al., The Rise and Potential of Large Language Model Based Agents: A Survey, https://arxiv.org/pdf/2309.07864.pdf.
