Deep Dive into AI Agents with ERNIE SDK and Multi-tool Orchestration

In the past year, the rapid development of large language models (LLMs) has attracted global attention. Tech giants like Baidu have launched their own large models, continuously pushing the performance limits of language models. However, the industry’s goals for LLMs are no longer limited to basic Q&A functions but are seeking to utilize large models to perform more complex and diverse tasks. This is the backdrop for the birth of the concept of Agents.

An Agent can be understood as a system that can autonomously plan decisions and integrate various tools to complete complex tasks. In this system, the large language model acts as the “core scheduler.” This scheduler is responsible for interpreting the user’s natural language input, planning a series of executable actions, and gradually completing these tasks with the help of memory modules and other components and external tools.

In 2024, the focus of the artificial intelligence industry is shifting from general large models to AI-native applications. This technological transformation is inseparable from the deep involvement of AI Agents. The core value of AI Agents lies in their ability to adapt to changing environments and demands, as well as to make effective decisions and reliable operations, signaling that we are entering the era of AGI (Artificial General Intelligence). As Bill Gates predicted: “In the next five years, everything will change drastically. You will no longer need to switch applications for different tasks; you just need to communicate with your device in everyday language, and the software will provide personalized feedback based on the information you share because it has a deeper understanding of your life.”

ERNIE SDK

Recently, ERNIE SDK has added a powerful feature—Agent development, marking a new stage in LLM development. Based on the powerful Wenxin large model and its Function Calling capabilities, it provides a new perspective for LLM application development. This framework not only addresses the core challenges faced in LLM application development but also showcases its superior performance through Wenxin large model 4.0. ERNIE SDK provides effective solutions to several key issues:

1. Token Input Limitations: Traditional large models are limited by token input limits when analyzing and summarizing large documents. ERNIE SDK offers a way to retrieve from a local knowledge base, making it easier to handle large document Q&A tasks.

2. Integration of Business API Tools: ERNIE SDK makes it possible to integrate existing business API tools, broadening the functionality and adaptability of LLM applications.

3. Data Source Connections: ERNIE SDK can query SQL databases through custom tools and connect various data sources to provide more information for the large model. As an efficient development framework, it significantly enhances developers’ work efficiency. Leveraging the rich pre-built components of the PaddlePaddle community, developers can directly utilize existing resources or customize them according to specific business needs, providing comprehensive support for the entire development lifecycle of LLM applications.

Deep Dive into AI Agents with ERNIE SDK and Multi-tool Orchestration

Analysis of Agent Architecture Based on ERNIE SDK

Agent

In some complex scenarios, we need to flexibly call the LLM and a series of required tools based on user input. Agents provide the possibility for the implementation of such applications. ERNIE SDK offers Agent development driven by the Function Calling capabilities of the Wenxin large model, allowing developers to directly use pre-built Agents and instantiate them through Chat Model, Tool, and Memory, or customize their own Agents by inheriting from the erniebot_agent.agents.Agent base class.

Chat Model (The Brain of the Agent)

The Chat Model module in ERNIE SDK is the core scheduler for decision-making, which is the knowledge-enhanced large language model developed by Baidu: the Wenxin large model.

Message (Encapsulation of Agent Input and Output Information)

Developers interact with the Chat Model through the encapsulated Message, enabling the large language model to understand the source of the input information.

This module standardizes user input and the feedback messages from the Wenxin large model to facilitate storage in the subsequent Memory module.

Memory (The Agent’s Memory)

The large language model itself does not have memory, so an important aspect of building large model applications is to give the Agent memory functionality. ERNIE SDK provides fast memory capabilities, allowing information from multiple rounds of dialogue to be stored in a List and then transferred to the context window of the Chat Model. However, this memory mode is also limited by the input tokens of the Wenxin large model. Meanwhile, ERNIE SDK also allows developers to build more complex memory modules, with reference processing methods including:

1. Vector store-backed memory; each Message from a dialogue round will be stored in a vector database after embedding processing, allowing for semantic vector approximate retrieval to find the most relevant memory fragments based on the user’s natural language input in subsequent dialogue contexts. This approach enables long-term memory, no longer limited by the context window of the Wenxin large model.

2. Conversation summary memory; this processing method summarizes the dialogue information after each round of dialogue using the Chat Model and stores the summarized brief content to relieve the pressure of storage content.

3. LangChain/LlamaIndex; enabling custom memory modules, ERNIE SDK allows developers to freely integrate frameworks like LlamaIndex to implement more complex memory modules, leveraging LlamaIndex’s excellent document retrieval capabilities for long-term memory.

Tools (The Agent’s Tools)

Allowing Agents to autonomously combine and use complex external tools to solve more complex problems is key to the widespread adoption of future AI applications; ERNIE SDK allows developers to quickly build complex applications using over 30 tools already launched by the PaddlePaddle community and also enables customization of local tools based on their business needs.

Retrieval (The Agent’s Knowledge Base)

Although general large models absorb extensive knowledge during training, they have limited understanding of domain-specific or proprietary business knowledge. The cost of fine-tuning large models with specific domain data is prohibitively high, hence the introduction of RAG (Retrieval Augmented Generation) technology, which aims to rapidly integrate external knowledge bases into large models, thus gaining a deeper understanding of specialized knowledge in specific fields. Key functions of the Retrieval module include:

Loading data sources, covering various data types:

Structured data, such as SQL and Excel

Unstructured data, such as PDF and PPT documents

Semi-structured data, such as Notion documents
Chunk transformation of data.
Vector embedding processing of data.
Storing processed data in a vector database.
Quickly locating relevant information through approximate vector retrieval. The Retrieval module of ERNIE SDK not only supports Baidu’s Wenxin Baizhong search but is also compatible with the Retrieval components of LangChain and LlamaIndex, significantly improving the efficiency and accuracy of data processing.

Quick Development Experience of Agents Based on ERNIE SDK

Now, let’s quickly understand how to develop an Agent— a manuscript review assistant. The main function of this Agent is to help us review whether the manuscripts published on various platforms comply with regulations.

Step one, log in to the PaddlePaddle community and create a new personal project. The free computing resources provided by the community are sufficient.

Step two, after logging into the PaddlePaddle community, click on your avatar to obtain your access token in the console. PaddlePaddle provides each newly registered user with a free token quota of 1 million.

Deep Dive into AI Agents with ERNIE SDK and Multi-tool Orchestration

To securely manage your sensitive token information, we recommend using Dotenv. First, install Dotenv, then save your token in a newly created .env file. Note that this file is not visible by default in the file directory; if you need to view it, you must change the settings.

Example .env file content:

Step three, verify whether your access token can be used normally:

If everything is normal, it will print out your access token. Create a new text file named manuscript.txt, which should contain the text you want to audit for compliance.

Step four, build the basic Agent (using the pre-built tools provided by the PaddlePaddle community tool center).

Run this code, and you will see the Agent using the [text-moderation/v1.2/text_moderation] tool to review the manuscript content and output the review results. Thus, the development of a simple manuscript review assistant Agent is completed. Together, we experienced the rapid development process and practicality of Agents based on ERNIE SDK.

Multi-tool Intelligent Orchestration

After deeply exploring ERNIE SDK, let’s take a look at the multi-tool intelligent orchestration feature of the PaddlePaddle community. The PaddlePaddle community not only provides a fine-grained SDK to support the detailed needs of technical developers but also introduces the multi-tool intelligent orchestration feature. This means developers can easily integrate various external tools based on the powerful Wenxin large model to create personalized AI applications. Compared to using ERNIE SDK alone, this method is faster and more convenient, greatly simplifying the development process. We will use multi-tool intelligent orchestration to reproduce the manuscript review assistant.

First, create the application using low-code development and select intelligent orchestration.

Second, click to mount the “Text Review Tool” in the sidebar tool mounting section, which is one of the more than 30 pre-built tools provided by the PaddlePaddle community tool center. You can also create your own tools.

Then, set the role identity for the manuscript assistant in the basic settings. After that, click to apply all settings, and you can experience it in the sidebar.

It is worth mentioning that the multi-tool intelligent orchestration of the PaddlePaddle community is extremely friendly to team members without a technical background. Even without in-depth programming knowledge, team members can quickly get started and easily build their own AI applications. For example, the creation of the above manuscript assistant only takes a few minutes, which not only accelerates product iteration speed but also promotes collaboration and innovation within the team.

Currently, Baidu PaddlePaddle has opened applications. Visit the PaddlePaddle community for invitation testing registration to learn more details and apply for use.

With the development of general large language models and the rise of intelligent Agent technology, we are ushering in a new era of AI application development. From the in-depth exploration of ERNIE SDK to the application of multi-tool intelligent orchestration in the PaddlePaddle community, we see how AI technology frameworks like Baidu PaddlePaddle ERNIE SDK break traditional boundaries, providing developers with unprecedented convenience and enormous developmental potential. Whether developers with a strong technical background or non-technical personnel, everyone can find their space in this new era, jointly promoting the advancement of AI technology and the popularization of AI applications. The future of AI is full of infinite potential. The vast world of AI applications awaits us to explore and create.

Related Links

ERNIE-SDK:https://github.com/PaddlePaddle/ERNIE-SDK

Multi-tool intelligent orchestration invitation testing registration:https://aistudio.baidu.com/activitydetail/1503017298

Follow 【PaddlePaddle】 WeChat Official Account

Get more technical content~

Related posts

Leave a Comment Cancel reply