Ideas are my own.
image credit: Kevin Hawkins
A popular saying in the industry holds that “2025 is the year of AI Agents”, and the first half of 2025 has indeed produced a wave of new “Agents”. The industry has not yet agreed on a standard definition of an Agent; for example, it is unsettled whether process-oriented workflows or SOPs qualify. I personally believe that a true Agent must possess the following capabilities:
- Autonomous Planning: The ability to autonomously devise a complete plan to achieve goals based on requirements.
- Autonomous Decision-Making: The ability to make independent decisions for each action based on the plan and real-time environmental changes.
- Tool Utilization: The ability to autonomously select and invoke various tools to execute decisions and complete tasks.
- Autonomous Reflection: Tasks in complex scenarios rarely follow a scripted plan perfectly. An effective Agent reflects continuously during execution, perceiving the real-time environment and validating the results of tool usage, and iterates through “perception, decision-making, execution” to explore new paths. Multi-agent systems are hard to manage in complex scenarios partly because weak reflection capabilities let errors accumulate.
- Continuous Iteration: Business processes are not static, and the capabilities of LLMs are constantly improving. The ability to keep iterating based on the Agent’s execution results, processes, and human feedback is also crucial.
Since Sequoia proposed Service-as-a-Software last year, traditional SaaS products have rapidly evolved toward AI Agent forms, and many new startups have adopted Agent-Native product delivery by default. For users like me, the evaluation criteria for Agent products are purely results-oriented.
Recently, Agents seem to have become synonymous with “universal software”, but the current reality is:
Even with the most intelligent large models in the industry (such as GPT-4.1/o4, Gemini 2.5 Pro, or Claude 4), it is very challenging to achieve end-to-end process automation through a single Agent in complex tasks within vertical industry scenarios.
The fundamental reason is that real business problems in vertical industry scenarios cannot be solved by a single round of Q&A or a few simple tasks; each one is a complex system composed of multiple roles, stages, tools, and processes.
In practice, Multi-Agent Systems (MAS) are an inevitable engineering choice for deploying large models in complex vertical industry scenarios. The focus of large model applications also needs to shift from “how to make a model smarter” to “how to build a reliable, efficient, and manageable intelligent system”.
Designing a production-level Multi-Agent System (MAS) is a very complex engineering problem. This article explores how to build a simple entry-level multi-agent system. As the Chinese saying goes, a sparrow may be small, but it has all its vital organs: even a small system needs every essential part.
The Core Value of MAS: From “Single Point” to “Full Process”
In knowledge-intensive vertical industries with complex processes, such as law, healthcare, finance, and high-end manufacturing, a single Agent system often struggles to cope. The advantages of MAS become apparent when dealing with complex business processes that involve multiple steps and interactions with numerous legacy systems.
Let’s look at some typical vertical industry application scenarios:
- Legal Services: Automating large-scale contract reviews, legal research, evidence collection and organization, compliance checks, litigation support, etc. MAS can simulate the work of a legal team, with different Agents responsible for document pre-screening, key clause extraction, risk identification, case retrieval, and drafting legal opinions.
- Healthcare: Assisting in clinical diagnosis and treatment plan formulation, personalized drug development, clinical trial management, medical image analysis, and intelligent retrieval and synthesis of medical literature.
- Financial Services: Automating credit assessments and loan approvals, fraud detection and risk management, personalized investment advisory services, high-frequency trading strategy execution, and automatic generation of financial regulatory reports.
- Customer Support: Collaborating through multiple Agents to handle complex customer inquiries from various channels, achieving intelligent routing, problem-solving, executing operations, and seamlessly transferring to human agents when necessary.
- Enterprise Knowledge Management and R&D: Helping employees conduct in-depth research, information discovery, and insight extraction within a vast corporate knowledge base, supporting complex knowledge work such as product development and market analysis.
From these application scenarios, the core value of MAS is primarily reflected in several aspects:
- Task Decomposition and Specialized Processing: Complex vertical industry business processes can be decomposed into a series of smaller, more manageable sub-tasks. MAS can assign each sub-task to a specialized Agent with specific knowledge, skills, and tool access rights. For example, in the legal tech field, Harvey AI utilizes specialized Agents to handle different tasks such as contract analysis and legal research. This specialization can improve the quality and efficiency of each step and spares a “universal” Agent from taking on tasks it is not suited for.
- Reasoning and Decision-Making Capabilities: Collaboration among multiple Agents enables deeper reasoning. For instance, one Agent extracts preliminary evidence from a vast array of documents, another Agent conducts logical analysis and cross-validation of this evidence, and yet another Agent generates decision recommendations based on the analysis results. This multi-Agent collaboration or pipeline reasoning is more likely to yield robust and comprehensive conclusions than a single Agent working in isolation.
- Parallel Processing and Efficiency Improvement: Many complex business processes contain steps that can be processed in parallel. MAS naturally supports distributing these steps to different Agents for simultaneous execution, significantly reducing the overall task completion time.
- Flexible Integration with Existing Systems: Vertical industries often rely on numerous specialized tools, databases, and legacy systems. Each Agent in MAS can be designed to interact specifically with certain tools or APIs. For example, Salesforce Agentforce can call MuleSoft APIs to integrate with external systems, while Google Vertex AI Agents can connect to various databases and public APIs. This modular approach to tool integration is more efficient and maintainable than requiring a single Agent to master all interfaces.
- Scalability and Robustness: If the business scenario can be accurately broken down into atomic single-task Agents and their multi-task combinations, the MAS architecture becomes easier to scale. When business demands increase or new task types emerge, new specialized Agents can be added or existing Agents’ capabilities can be expanded without large-scale restructuring of the entire system.
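To make task decomposition and parallel processing concrete, here is a minimal sketch using Python's `asyncio`. The agent names (`extract_clauses`, `retrieve_cases`, `review_contract`) are hypothetical, and each coroutine is a stub standing in for a real LLM-backed worker; this is an illustration under those assumptions, not a reference design.

```python
import asyncio

# Hypothetical specialized agents; each stub stands in for an LLM-backed worker.
async def extract_clauses(doc: str) -> dict:
    await asyncio.sleep(0)  # placeholder for real model or tool latency
    clauses = [s.strip() for s in doc.split(".") if s.strip()]
    return {"agent": "clause_extractor", "clauses": clauses}

async def retrieve_cases(doc: str) -> dict:
    await asyncio.sleep(0)
    return {"agent": "case_retriever", "query": doc[:20]}

async def review_contract(doc: str) -> dict:
    # The two sub-tasks are independent, so they are dispatched concurrently.
    clauses, cases = await asyncio.gather(extract_clauses(doc), retrieve_cases(doc))
    # A downstream step that depends on both results runs after the join.
    return {"clause_count": len(clauses["clauses"]), "case_query": cases["query"]}
```

Calling `asyncio.run(review_contract("Party A pays. Party B delivers."))` runs both stubs concurrently and joins their results before the final step.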
Key Challenges in Designing Efficient MAS
Designing a production-level MAS presents significant engineering challenges, and even entry-level MAS must consider the following key challenges:
1. Orchestration: Complexity of State Management and Process Control
Moving from simple linear, serial processes to complex dependency graphs (typically DAGs, plus controlled loops and retries) with parallel and branching paths significantly increases the complexity of the entire system.
How can the execution state of the entire graph be reliably managed? If a sub-task fails, should it be retried, abandoned, or trigger a new branch? Who maintains this “state machine”? If the system crashes, how can it recover from the last checkpoint?
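One possible answer to these questions is a small persisted state tracker. The sketch below is an assumption-laden illustration: the class name `TaskGraphState`, the JSON file format, and the retry policy are all hypothetical, and a production system would more likely use a database or a workflow engine.

```python
import json
import os
import tempfile
from enum import Enum

class Status(str, Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"

class TaskGraphState:
    """Tracks per-task status and persists it so a run can resume after a crash."""

    def __init__(self, path: str, task_ids: list[str], max_retries: int = 2):
        self.path = path
        self.max_retries = max_retries
        if os.path.exists(path):
            # Recover from the last checkpoint instead of starting over.
            with open(path) as f:
                saved = json.load(f)
            self.status = {k: Status(v) for k, v in saved["status"].items()}
            self.attempts = saved["attempts"]
        else:
            self.status = {t: Status.PENDING for t in task_ids}
            self.attempts = {t: 0 for t in task_ids}

    def checkpoint(self) -> None:
        # Write to a temp file, then rename, so a crash never leaves a half-written file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"status": {k: v.value for k, v in self.status.items()},
                       "attempts": self.attempts}, f)
        os.replace(tmp, self.path)

    def record(self, task_id: str, ok: bool) -> None:
        self.attempts[task_id] += 1
        if ok:
            self.status[task_id] = Status.DONE
        elif self.attempts[task_id] > self.max_retries:
            self.status[task_id] = Status.FAILED  # retries exhausted: escalate or replan
        self.checkpoint()

    def pending(self) -> list[str]:
        return [t for t, s in self.status.items() if s is Status.PENDING]

# Usage: simulate a crash by building a fresh object from the same checkpoint file.
path = os.path.join(tempfile.mkdtemp(), "mas_state.json")
state = TaskGraphState(path, ["extract", "analyze"])
state.record("extract", ok=True)
state.record("analyze", ok=False)  # first failure: still pending, eligible for retry
resumed = TaskGraphState(path, ["extract", "analyze"])
```

After the simulated restart, `resumed.pending()` contains only `"analyze"`, so the coordinator can retry it without redoing finished work.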
In terms of collaboration, every communication and handoff between Agents incurs overhead for data serialization, network transmission, and context loading. As the number of Agents increases, this collaborative overhead can quickly negate the benefits of parallelism. The key to design lies in finding the optimal balance between “task decomposition granularity” and “Agent collaboration overhead”.
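A toy cost model can illustrate this trade-off. The formula below is my own simplifying assumption (work divides evenly, and each additional Agent adds one fixed-cost handoff); real systems will not behave this cleanly, but the shape of the curve is the point.

```python
def completion_time(total_work: float, n_agents: int, handoff_cost: float) -> float:
    """Toy model: work divides evenly, but each extra agent adds one handoff."""
    return total_work / n_agents + handoff_cost * (n_agents - 1)

def best_agent_count(total_work: float, handoff_cost: float, max_agents: int = 32) -> int:
    """Brute-force search for the agent count with the lowest modeled time."""
    return min(range(1, max_agents + 1),
               key=lambda n: completion_time(total_work, n, handoff_cost))
```

Under this model, with 100 units of work and a handoff cost of 4, five Agents finish in 36 units, while 32 Agents take longer than a single Agent would.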
2. Context: Balancing Long Contexts and Accuracy
Context is the “fuel” for LLMs, but in MAS, longer and more complex contexts cause information interference, and model accuracy becomes the bottleneck.
When multiple Agents operate in parallel sharing a knowledge base or external systems, how can data consistency be ensured? If one Agent is updating customer information while another Agent makes decisions based on outdated information, this is unacceptable in fields like finance and healthcare.
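One common way to prevent decisions based on stale data is optimistic concurrency control: every write must name the version it was based on. The sketch below uses an in-memory store as a stand-in for a real database; `VersionedStore` and `StaleWriteError` are hypothetical names for illustration.

```python
import threading

class StaleWriteError(Exception):
    """The record changed after the caller read it."""

class VersionedStore:
    """In-memory stand-in for a shared record store with optimistic locking."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                # Another agent wrote first; the caller must re-read and decide again.
                raise StaleWriteError(f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (value, current + 1)
            return current + 1
```

If two Agents read the same version and both try to write, the second write raises `StaleWriteError` instead of silently overwriting the first.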
3. Execution Challenges: Interface Issues with the Non-Ideal World
Agents interact with the external world through tool invocation, but existing APIs and systems can encounter a variety of issues.
How should the system deal with the “fragility” of APIs? How should it handle ambiguity when parsing results from different systems and APIs? This interface is one of the least stable parts of the Agent execution layer.
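One defensive habit that helps here is to never pass a raw tool response straight to the next Agent. The sketch below, which assumes JSON responses and uses hypothetical names (`ToolResult`, `parse_tool_response`), turns every response into an explicit success or failure.

```python
import json
from dataclasses import dataclass

@dataclass
class ToolResult:
    ok: bool
    data: dict
    error: str = ""

def parse_tool_response(raw: str, required_fields: tuple[str, ...]) -> ToolResult:
    """Turn a raw API payload into an explicit success or failure, never a guess."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ToolResult(ok=False, data={}, error=f"invalid JSON: {exc.msg}")
    if not isinstance(payload, dict):
        return ToolResult(ok=False, data={}, error="expected a JSON object")
    missing = [f for f in required_fields if f not in payload]
    if missing:
        return ToolResult(ok=False, data=payload, error=f"missing fields: {missing}")
    return ToolResult(ok=True, data=payload)
```

A failed `ToolResult` gives the caller a concrete error string to retry on or report upward, rather than an ambiguous half-parsed answer.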
Key Technical Solutions for Building Vertical Industry MAS

Based on the aforementioned challenges, let’s explore a relatively pragmatic entry-level technical architecture and implementation plan.
1. Laying the Foundation: Overall Architecture and Core Roles of MAS
A fully functional MAS should have a modular and role-based architecture. Separating different responsibilities into different Agents or services is key to ensuring system maintainability and scalability.

Table 3.1: Core Agent Roles in MAS and Their Key Responsibilities
2. Intelligent Navigation: Planning, Decision-Making, and Dynamic Adjustment Mechanisms
Planning, reasoning, and dynamic adjustment are the core responsibilities of the coordinator and reflect the intelligence of MAS.
- Planning Representation: A common representation is a DAG, which clearly expresses complex dependencies, parallel paths, and conditional branches between tasks. Other planning representations include simple structured task chains, natural language processes based on LLM understanding, behavior trees, and even pseudocode-style scripts.
- Reasoning and Decision-Making: A core principle is that reasoning strategies are primarily applied to the decision-making process of the coordinator/planner Agent, or executed by specialized sub-Agents within a strictly controlled scope. Common reasoning strategies include backtracking search, branch and bound, beam search, sampling, and the much-debated Monte Carlo Tree Search (MCTS), all aiming to optimize the collaborative planning and execution efficiency of the entire MAS. Good design principles include “centralized decision-making, decentralized execution”, “clear evaluation criteria”, “limiting exploration scope”, and “dynamic feedback injection”.
- Dynamic Adjustment Implementation: This requires a closed “perception-planning-action-evaluation-replanning” loop that covers execution monitoring, deviation detection, and replanning, where replanning can mean local repairs, plan adjustments, or even human intervention.

Table 3.2: Comparison of Different Planning Representation Methods in MAS
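As an illustration of the DAG representation, the sketch below uses the standard library's `graphlib.TopologicalSorter` to group a plan into waves of tasks that can run in parallel. The task names in `plan` are hypothetical, borrowed from the legal example earlier in this article.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "pre_screen": set(),
    "extract_clauses": {"pre_screen"},
    "retrieve_cases": {"pre_screen"},
    "assess_risk": {"extract_clauses", "retrieve_cases"},
    "draft_opinion": {"assess_risk"},
}

def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the plan is not actually a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the plan above, `execution_waves(plan)` puts `extract_clauses` and `retrieve_cases` in the same wave, which is exactly the parallel path a coordinator could exploit.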
3. Information Transmission: Communication and Context Sharing Among Agents
How smoothly information flows between Agents directly affects the efficiency and reliability of the whole system. For production use, I personally strongly recommend implementing a standalone Context Management module.

Table 3.3: Key Context Information in MAS
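A standalone Context Management module can start very small. The sketch below is one hypothetical shape for it (the `ContextStore` class and its method names are my own assumptions): Agents publish shared facts to a central store, and each Agent receives only the keys it declares it needs, which keeps individual prompts short.

```python
from dataclasses import dataclass, field

@dataclass
class ContextStore:
    """Central store; agents get a filtered view instead of the full history."""
    shared: dict = field(default_factory=dict)     # task-wide facts
    per_agent: dict = field(default_factory=dict)  # agent_id -> private notes

    def publish(self, key: str, value) -> None:
        self.shared[key] = value

    def note(self, agent_id: str, key: str, value) -> None:
        self.per_agent.setdefault(agent_id, {})[key] = value

    def view_for(self, agent_id: str, needs: list[str]) -> dict:
        # Only the declared shared keys are passed on, plus the agent's own notes.
        view = {k: self.shared[k] for k in needs if k in self.shared}
        view.update(self.per_agent.get(agent_id, {}))
        return view
```

The design choice here is that context is pulled by declared need rather than pushed wholesale, so adding a new Agent does not lengthen every other Agent's context.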
4. Reliable Execution: Tool Invocation, Monitoring, and Human-Machine Collaboration
This is the last mile of realizing the value of MAS; no matter how good the decision is, if it cannot be converted into final results through action, it is all in vain. Since enterprises may have various systems, reliable internal and external tool invocation is not an easy task. Key considerations in design may include:
- Safe Tool Invocation:
  - Standardization of Tool Descriptions: Use OpenAPI specifications to define API tools, and describe other tools with clear function signatures and docstrings. This is a prerequisite for reliable Agent invocation.
  - Security Considerations: Strict authentication and authorization must be implemented at the API gateway or tool invocation layer. For high-risk operations (such as transactions and data deletions), a human approval process must be introduced.
- Refined Error Handling:
  - Local Error Handling: Each tool call should support automatic retries (e.g., exponential backoff), graceful degradation, etc.
  - Global Error Handling: When local handling fails, detailed error information must be reported to the coordinator, which will decide the next course of action, such as replanning or requesting human intervention.
- Effective Human-Machine Collaboration (Human-in-the-loop, HITL):
  - Setting “Breakpoints” at Key Decision Points: The system should pause at these points, clearly present the context and suggested decision to humans, and wait for confirmation or correction.
  - Providing Explainable “Thought Processes”: Agents should provide not only results but also the basis and reasoning chain behind them. This transparency builds trust and lets humans quickly judge how to step in when problems arise.
  - Feedback Mechanisms Must Be Closed Loop: Every human intervention and correction should be collected as high-quality labeled data for future model fine-tuning and optimization.
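The points above on approval gates, local retries, and escalation can be combined in one small wrapper. This is a minimal sketch under stated assumptions: the tool names in `HIGH_RISK_TOOLS`, the exception classes, and the `call_tool` signature are all hypothetical, and real code should catch only error types known to be retryable.

```python
import random
import time

HIGH_RISK_TOOLS = {"execute_transaction", "delete_records"}  # hypothetical names

class NeedsHumanApproval(Exception):
    """Raised so the coordinator can pause at a breakpoint and ask a person."""

class ToolFailed(Exception):
    """Raised after local retries are exhausted; the coordinator decides what next."""

def call_tool(name, func, *args, approved=False, max_attempts=3,
              base_delay=0.5, sleep=time.sleep):
    # High-risk operations never run without an explicit human approval flag.
    if name in HIGH_RISK_TOOLS and not approved:
        raise NeedsHumanApproval(name)
    last_error = None
    for attempt in range(max_attempts):
        try:
            return func(*args)
        except Exception as exc:  # in real code, catch only retryable error types
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff with up to 10% jitter: base, 2x base, 4x base, ...
                sleep(base_delay * (2 ** attempt) * (1 + random.random() * 0.1))
    # Local handling failed: report upward with details instead of swallowing the error.
    raise ToolFailed(f"{name} failed after {max_attempts} attempts: {last_error}")
```

The `sleep` parameter is injectable so the backoff can be tested without waiting; a coordinator would catch `NeedsHumanApproval` to open a HITL breakpoint and `ToolFailed` to trigger replanning.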
5. Continuous Evolution: Feedback Loops and Learning Mechanisms
A MAS that cannot learn from experience is lifeless; both internal and external environments and demands are constantly changing, and a continuously iterating and evolving MAS is a “living” system. Key points to consider in design and implementation include:
- Multi-Dimensional Evaluation: A multi-faceted evaluation system that includes automated metrics and human assessments needs to be established.
  - Automated Evaluation: Monitor quantitative metrics such as task success rates, end-to-end latency, and tool invocation error rates. “LLM-as-a-Judge” can be used to evaluate the quality of output content.
  - Human Evaluation: For complex and subjective tasks, human expert evaluation remains the gold standard.
- Feedback-Based Learning:
  - Planning Strategy Optimization: Analyzing historical task data reveals bottlenecks in planning patterns, which in turn guides optimization of MAS planning strategies.
  - Agent Model Fine-Tuning: The high-quality human-machine interaction data collected (especially human corrections) is valuable domain-specific data for model fine-tuning or reinforcement learning (e.g., RLVR).
  - Dynamic Knowledge Base Updates: An online process must be established to add new knowledge generated during tasks back into the domain knowledge base after verification, and to regularly clean out outdated or invalid knowledge, keeping the knowledge base “fresh”.
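The automated metrics mentioned above are straightforward to compute once each run is logged in a structured form. The sketch below assumes a hypothetical `RunRecord` shape; the field names are illustrative, not a standard.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class RunRecord:
    succeeded: bool
    latency_s: float   # end-to-end latency of the whole task
    tool_calls: int
    tool_errors: int

def summarize(runs: list[RunRecord]) -> dict:
    """Aggregate per-run logs into the three metrics; expects a non-empty list."""
    total_calls = sum(r.tool_calls for r in runs)
    return {
        "task_success_rate": sum(r.succeeded for r in runs) / len(runs),
        "median_latency_s": median(r.latency_s for r in runs),
        "tool_error_rate": sum(r.tool_errors for r in runs) / total_calls if total_calls else 0.0,
    }
```

Median latency is used here rather than the mean because a few very slow runs would otherwise dominate the number; tracking a high percentile alongside it would be a natural extension.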
Key Takeaways
Finally, I would like to share four simple insights from my practice.
- The core of MAS is “Systems Engineering”, not “Algorithm Engineering”. This is a typical complex distributed system design problem, where the engineering challenges lie more in designing the overall system topology, data flow, state management, fault tolerance mechanisms, authentication mechanisms, context management mechanisms, and Agent communication mechanisms, etc.
- The essence of Agent orchestration is a “persistent state machine”. The soul of the main planning/orchestration Agent is not the LLM used for “reflection”, but the reliable “state machine” responsible for tracking, managing, and recovering the entire task process. The LLM provides intelligence, but the state machine provides the skeleton.
- The highest value of human-machine collaboration is as a “high-quality data generator”. Every effective human intervention generates valuable, high-quality labeled data. This data is the “fuel” for future model fine-tuning and achieving system autonomous evolution. Therefore, human-machine collaboration is not just a cost; over time, it becomes an invisible moat.
- Non-functional requirements should not be overlooked. In vertical industries, whether a MAS can go live and create value is often not determined by its level of “intelligence”, but by its “non-functional” metrics: Does its end-to-end latency meet business requirements? Can its security and compliance pass audits? Is its observability sufficient to support rapid fault diagnosis? Are its operational costs within acceptable limits? These seemingly “mundane” engineering issues are the real “lifelines”.
Multi-Agent Systems (MAS) are an inevitable choice for deploying large model applications in vertical industry scenarios. There are no shortcuts on this path; it requires rigorous systems engineering thinking to address the series of business and technical challenges mentioned above. MAS is also the foundation of “silicon-based employees” in future intelligent organizations, serving not only as the main force for business value delivery but also as a driving force for future organizational transformation.