Source: PaperWeekly
This article is approximately 4300 words long and is recommended for a reading time of over 10 minutes.
AutoAgent is an open-source AI assistant operating system designed to lower the technical barriers for creating AI assistants.
The University of Hong Kong recently launched the open-source project AutoAgent, developed by Professor Huang Chao’s laboratory. Its most notable feature is that it allows users to create AI assistants using natural language, making AI application development particularly simple.
In just three weeks since its open-source release, it has garnered 2.1k stars on GitHub. Based on the AutoAgent engine, Professor Huang’s team has also developed a powerful AI research assistant—Auto-Deep-Research. This general AI entity currently ranks third globally and first among open-source solutions. It is also the top performer among open-source products.
Its main features include:
- Intelligent Search: Automatically searches for information online and integrates and analyzes content.
- Automated Programming: Capable of handling various complex programming tasks.
- Data Analysis: Conducts in-depth data mining and analysis.
-
Intelligent Reporting: Generates visual reports.
1. Introduction
2025 is the Year of AI Agents
This is not a random prediction but a consensus within the tech community. From NVIDIA founder Jensen Huang to OpenAI’s Sam Altman, from DeepMind’s genius scientist Demis Hassabis to top Silicon Valley investment firm a16z, they all point to the same future: AI Agents are about to experience explosive growth.
Just as 2022 was the year of generative AI, with the emergence of ChatGPT fundamentally changing how we interact with AI, 2025 will see the widespread adoption of AI Agents, ushering in a more revolutionary transformation—one that not only understands and responds but also actively thinks, plans, interacts with the environment, and takes action, truly becoming a reliable assistant to humanity.
However, an awkward reality is that only 0.03% of people worldwide possess programming skills. This means that in the technological revolution of AI Agents, 99.97% of people may be excluded. What we truly need is not to let a few elites enjoy the dividends brought by AI Agents but to enable everyone to create and harness their own AI assistants.
At this critical historical juncture, we are launching the AutoAgent framework. This is not just another development tool but a revolutionary attempt to lower the barrier for creating AI Agents from “professional programming” to “everyday conversation.” It comes with a ready-to-use Auto-Deep-Research multi-agent system, which is a top-tier research assistant that ranks third overall and first among open-source solutions in the General AI Assistant benchmark GAIA.
Built on Claude-3.5-sonnet and supporting various models such as Deepseek and Huggingface, it is not only the most cost-effective solution among the top three but also allows everyone to easily embark on a deep research journey. Thanks to its groundbreaking self-evolving architecture and intelligent vector database, AutoAgent enables users to effortlessly create various tools and workflows through natural language, achieving true zero-code development to build your own AI assistant.
We also welcome all interested developers to join our community to explore how AutoAgent will redefine the future of human-machine collaboration on the eve of the explosive growth of AI Agents…
Self-developed framework AutoAgent:https://github.com/HKUDS/AutoAgentAuto-Deep-Research:https://github.com/HKUDS/Auto-Deep-ResearchPaper link:https://arxiv.org/abs/2502.05957
Now, let’s take a closer look at how Auto-Deep-Research is implemented!
2. A Stunning Glimpse: Let AI Be Your Financial Analyst
Help me analyze the 10-K reports of Apple and Microsoft, combining the latest market dynamics to create a quantitative analysis report, preferably with data visualization.
Through this simple command, we can see the practical application capabilities of Auto-Deep-Research. Faced with two PDF documents totaling over 200 pages, this multi-agent system demonstrated its efficient processing capabilities.
In the demonstration video, we can see three windows running simultaneously: the terminal interface (left) displays the thinking and planning process of Auto-Deep-Research; the file directory (top right) shows the generated analysis document; and the browser window (bottom right) collects the latest market information.
From document parsing, web searching, to code writing and data visualization, the entire process is completed automatically without human intervention. Within about 10 minutes, the system generated a complete analysis report—this efficiency greatly enhances the productivity of financial analysis work.
This demonstration showcases how AI can handle complex tasks, freeing humans from tedious data processing to focus on more creative work.
The report and figures generated by the agent are as follows:
3. In-Depth Analysis of AutoAgent
As shown in the figure, the design inspiration for AutoAgent comes from modern operating systems, aiming to create a fully automated AI assistant operating system. Just as Windows or MacOS provides a complete operating environment for computers, AutoAgent offers a powerful and elegant platform for AI assistants.
This platform consists of four core modules that work seamlessly together, allowing users to create and manage various AI assistants using only natural language:
- Ready-to-use open-source strongest Deep Research mode (Agentic System Utilities), providing users with top-notch complex task analysis and resolution capabilities.
- The LLM-powered Actionable Engine serves as the ‘brain’ of the entire system, responsible for understanding user needs and coordinating the collaboration of multiple AI assistants.
- The Self-Managing File System intelligently processes and organizes various multimodal data, enabling AI assistants to handle different types of information such as text and images with ease.
-
The zero-code Agent customization feature allows everyone to easily create their own AI assistants and workflows, as simple as conversing with AI.
The perfect synergy of these modules makes AutoAgent a truly versatile AI assistant platform, capable of adapting to various scenarios from academic research to business analysis.
The Strongest Open Source Auto-Deep-Research (Agentic System Utilities)
AutoAgent employs a structured multi-agent architecture, enabling it to systematically handle various complex tasks. From web browsing and information retrieval to data analysis and code execution, each functional area has dedicated intelligent agents responsible for it.
The core of this intelligent agent system is the Orchestrator Agent. It acts as the central coordinator, receiving user requests, analyzing task points, breaking them down into subtasks, and assigning them to the corresponding specialized intelligent agents. Through an efficient handoff mechanism, the intelligent agents collaborate until the entire task is completed.
Web Agent provides a comprehensive toolkit for handling web tasks. It can perform various web tasks from general web searches to file downloads, achieving precise web interactions through 10 advanced operation tools (such as click, web_search, visit_url, etc.). The system is built on BrowserGym, creating a professional browsing environment that abstracts low-level code-driven behaviors into high-level tools, significantly enhancing the extensibility of tool definitions.
Coding Agent is a comprehensive code execution solution, specifically designed to handle various code-driven tasks from data analysis and computation to machine learning, automation, and system management. It includes 11 core tools covering key functionalities such as code script creation, Python code execution, instruction implementation, and directory structure management.
The Coding Agent operates in an interactive terminal environment, with all code-related tool execution results returned via terminal output. When the output exceeds display capacity, the terminal presents it in a paginated format, allowing the agent to browse content freely using commands like terminal_page_up, terminal_page_down, and terminal_page_to, effectively addressing the context length limitations of large language models.
Local File Agent focuses on the unified processing and analysis of multimodal data. It supports the conversion and processing of various file formats, including text documents (.doc, .pdf, .txt, .ppt), video files (.mp4, .mov), audio files (.wav, .mp3), and spreadsheets (.csv, .xlsx), among others.
Through a unified toolkit, it can convert various files into Markdown format and utilize an interactive Markdown browser for efficient analysis, effectively overcoming context length limitations.
This meticulously designed architecture has demonstrated outstanding performance in the GAIA benchmark evaluation: ranking third overall and first among open-source solutions, competing with closed-source solutions from commercial giants like OpenAI.
Notably, we are the only solution among the top three based on Claude-3.5-sonnet, achieving top performance while also realizing optimal cost-effectiveness. Additionally, the system’s openness allows seamless integration with various models such as Deepseek-R1 and even supports local open-source model deployment, bringing high-performance Deep Research into the public eye.
LLM-powered Actionable Engine
The LLM-powered Actionable Engine is the core processing unit of AutoAgent, responsible for understanding natural language, generating execution plans, and coordinating tasks among various intelligent agents. The system employs LiteLLM to implement a standardized LLM calling interface, supporting over 100 models from different vendors, ensuring collaborative operation of the system.
In generating executable actions, the system has designed two complementary paradigms: the direct tool usage paradigm targets commercial language models that support tool calls, allowing for direct generation of the next executable tool; the transformation tool usage paradigm converts tool usage into structured XML code generation tasks (e.g., <function=function_name> <parameter=parameter_1>value_1 …), enhancing the performance of commercial models while providing flexibility for integrating open-source models.
Self-Managing File System
The file system of AutoAgent is essentially a vector database, specifically designed to support the retrieval and understanding of large language models. The system allows users to upload text files of any format (such as .pdf, .doc, .txt) or compressed packages and folders containing text files.
Using tools like save_raw_docs_to_vector_db, the system can automatically convert these files into a unified text format and store them in user-defined vector database collections. With tools like query_db and answer_query, intelligent agents can autonomously manage database memory, achieving efficient and accurate information retrieval and generation.
In the MultiHop-RAG benchmark test, the Agentic-RAG built on this native self-managing file system demonstrated outstanding performance: achieving an accuracy of 73.51%, significantly surpassing other baseline methods, including the well-known LangChain framework.
This achievement fully demonstrates our system’s greater flexibility and adaptability in handling complex multi-hop retrieval and generation tasks, without relying on predefined workflows, allowing for dynamic orchestration of optimal paths during retrieval.
Zero-Code Agent Customization Feature
AutoAgent has designed a code-driven self-programming intelligent agent framework that implements constraint mechanisms, error handling, and customizable workflows, enabling controllable code generation, allowing users to easily customize tools and agents or build multi-agent systems. The system supports two main modes: no-workflow agent creation and workflow-based agent creation.
No-Workflow Agent Creation
Building efficient multi-agent systems often requires specialized domain knowledge, such as financial regulations or medical protocols. To enable ordinary users to easily construct complex systems, AutoAgent provides powerful agent generation capabilities. Users only need to provide the agent’s name and a simple functional description, and the system can automatically complete the creation process.
The system first assesses existing tools and resources through a specialized analysis agent, deeply analyzing user needs. Subsequently, the tool editing agent comes into play: it can seamlessly integrate third-party APIs such as LangChain, RapidAPI, and Hugging Face, currently supporting 8 categories of 145 RapidAPI interfaces and 9 categories of Hugging Face models.
More importantly, it can automatically generate tool code, design test cases, and validate functionality, automatically debugging until successful when issues arise.
During the agent creation phase, the system automatically identifies whether multiple agents need to collaborate. If so, it will generate an orchestrator agent through the create_orchestrator_agent tool, following the Orchestrator-Workers design pattern to ensure effective coordination among multiple agents.
Workflow-Based Agent Creation
When users have specific requirements for the workflow of a multi-agent system, the system adopts an innovative event-driven approach, breaking through the strict reliance on graph theory principles in traditional graph methods for workflow generation. By modeling agent tasks as events and utilizing event listening and triggering mechanisms, it achieves more flexible agent collaboration.
The workflow construction process itself is a carefully designed multi-agent collaboration: the workflow form agent is responsible for analyzing requirements and designing event logic, generating structured XML code; a robust error detection mechanism ensures that the generated workflow strictly adheres to system constraints; finally, the workflow editing agent is responsible for creating the required new agents, building workflows, and executing tasks.
This design not only achieves true zero-code development but also ensures the system’s reliability and scalability through rigorous framework design and flexible event mechanisms.
4. Conclusion
AutoAgent is an open-source AI assistant operating system designed to lower the technical barriers for creating AI assistants. We welcome you to visit our GitHub repository, star the project, and join the open-source community. Your participation will help the project continue to improve while promoting the widespread application of AI technology, enabling more users to leverage this technology. We look forward to advancing AI assistant technology together with developers and users.
GitHub link:https://github.com/HKUDS/AutoAgent
5. Research Team
This research is brought to you by the team members of the Data Intelligence Laboratory at the University of Hong Kong, led by Professor Huang Chao.(https://sites.google.com/view/chaoh/group-join-us)The Data Intelligence Laboratory at HKU has long been engaged in data science and large language model research, with many high-star open-source projects such as LightRAG and GraphGPT. We welcome everyone to further explore on GitHub:https://github.com/HKUDS
Editor: Huang Jiyan
About Us
Data Pie THU is a data science public account backed by the Tsinghua University Big Data Research Center, sharing cutting-edge data science and big data technology innovation research dynamics, continuously disseminating data science knowledge, and striving to build a platform for gathering data talent, creating the strongest group of big data in China.
Sina Weibo: @Data Pie THU
WeChat Video Account: Data Pie THU
Today’s Headlines: Data Pie THU