Chapter 6: Multi-Agent Systems

1. Why Do We Need Multiple Agents? Division of Labor, Fault Tolerance, and Coverage of Specialized Fields

A single agent can accomplish a “task,” but when faced with complex projects, multi-role collaboration, and high fault tolerance requirements, it becomes inadequate.

Multi-Agent Systems (MAS) provide core value by enabling multiple specialized agents to collaborate, working in parallel with clear division of labor, mutual verification, and dynamic adjustment, much like a company.

🧠 Academic Background Supplement

The concept of multi-agent systems is not new; its theoretical roots can be traced back to:

  • Distributed Artificial Intelligence (DAI): In the 1980s, research focused on how multiple AI entities collaborate to solve problems.
  • Game Theory: Nash equilibrium and incentive mechanism design for competition and cooperation among agents.
  • Organizational Modeling: Mimicking the hierarchical structure and role division of enterprises/military.
  • Swarm Intelligence: Inspired by ant colonies and bird flocks, self-organized collaboration without central control.

🎯 Limitations of a Single Agent

| Issue | Example | Multi-Agent Solution |
|---|---|---|
| Single Capability | One agent must both write reports and create graphics, compromising quality. | Split into “Researcher + Writer + Designer” agents. |
| No Fault Tolerance | If the report-writing agent makes an error, the entire process fails. | Introduce a “Reviewer Agent” to verify results. |
| Inefficiency | Everything runs sequentially: data retrieval → report writing → PPT creation. | Parallel execution: independent subtasks run simultaneously. |
| Lack of Specialized Depth | A generalist agent struggles to master law, finance, and design. | Each agent focuses on one domain. |

🔄 Core Advantages of Multi-Agent Systems

  • ✅ Specialized Division of Labor: Each agent only does what it excels at.
  • ✅ Parallel Efficiency: Multiple tasks executed simultaneously, reducing total time.
  • ✅ Enhanced Fault Tolerance: If one agent fails, others can take over or retry.
  • ✅ Scalability: New functionalities can be added by introducing new agents without affecting the existing system.
  • ✅ Competitive Optimization: Multiple agents provide different solutions, allowing for optimal selection (e.g., bidding, voting).

💡 Summary in One Sentence: A single agent is like a “full-stack engineer,” while multiple agents resemble a “product company”: project manager + development + testing + design, each performing their role.

2. Multi-Agent Architecture Patterns

A multi-agent system is not just “a bunch of agents running around”; it has a clear collaborative architecture. There are five mainstream patterns:

🖼️ [Suggested Image Location] Overview of Multi-Agent Collaboration Patterns

[Role-based]              [Pipeline]             [Auction-based]          [Hierarchical]       [Swarm Intelligence]
     👥                       🏭                       💰                      🏢                    🐜
Collaborative Dialogue    Sequential Transfer    Competitive Selection    Top-down Command     Local Interaction

🧑‍💻 1. Role-based Collaboration

  • Each agent has a fixed role and responsibility.
  • Collaboration occurs through “dialogue” or “shared blackboard.”
  • Applicable Scenarios: Content creation, code review, customer service teams.

📖 Example: Academic Paper Writing Team

- Researcher Agent: Literature review, data organization
- Writer Agent: Drafting initial version
- Reviewer Agent: Checking logic, grammar, format
- Editor Agent: Polishing language, adjusting structure
- Publisher Agent: Generating PDF, submitting to system

✅ Advantages: Clear responsibilities, easy to debug.
❌ Challenges: Handover protocols between roles must be designed.

๐Ÿญ 2. Pipeline

  • Agents execute in a fixed order, with the output of one serving as the input for the next.
  • Similar to a factory assembly line.
  • Applicable Scenarios: Data processing, report generation, approval processes.

📖 Example: Market Report Generation Pipeline

[Market Research Agent] → [Data Analysis Agent] → [Report Writing Agent] → [PPT Generation Agent] → [Email Sending Agent]

✅ Advantages: Simple structure, easy to implement.
❌ Challenges: A slow or blocked upstream agent delays the entire chain.

💰 3. Auction-based / Competitive

  • Multiple agents compete for the same task, with the lowest bidder (or highest quality) winning.
  • Theoretical Basis: Game theory + auction mechanism design.
  • Key Mechanisms:
    • Incentive Compatibility: Rewards are designed so that bidding truthfully is each agent's best strategy.
    • Strategic Equilibrium: Rules discourage malicious underbidding and inflated quotes.
  • Applicable Scenarios: Resource allocation, solution selection, creative generation.

📖 Example: Generating Advertising Copy

User: Generate 3 promotional copies, select the best one.

→ 3 Copywriter Agents generate copies simultaneously.
→ Reviewer Agent scores (creativity, compliance, conversion).
→ Select the highest-scoring copy, reward the corresponding agent (Token/points).
→ Other copies stored in the knowledge base for future use.

✅ Advantages: Stimulates diversity, avoids rigid thinking.
❌ Challenges: Requires fair scoring and incentive mechanisms.
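
The selection step in the advertising example can be sketched as a small scoring routine. This is a minimal sketch, not a full auction mechanism: the `Candidate` class, the `select_winner` helper, the weights, and the sample scores are all hypothetical, and the per-criterion scores are assumed to come from a reviewer agent.

```python
from dataclasses import dataclass

# Hypothetical weights for the three review criteria named in the example.
WEIGHTS = {"creativity": 0.4, "compliance": 0.3, "conversion": 0.3}

@dataclass
class Candidate:
    agent: str
    text: str
    scores: dict  # criterion -> score in [0, 10], assigned by the reviewer

def weighted_score(candidate: Candidate) -> float:
    """Combine the reviewer's per-criterion scores into one number."""
    return sum(WEIGHTS[name] * candidate.scores[name] for name in WEIGHTS)

def select_winner(candidates: list) -> tuple:
    """Return (winner, runners_up), ranked by weighted score, highest first."""
    ranked = sorted(candidates, key=weighted_score, reverse=True)
    return ranked[0], ranked[1:]

# Illustrative data only; real scores would come from the Reviewer Agent.
candidates = [
    Candidate("copywriter_1", "Drive the future.", {"creativity": 8, "compliance": 9, "conversion": 6}),
    Candidate("copywriter_2", "Charge once, go far.", {"creativity": 7, "compliance": 10, "conversion": 9}),
    Candidate("copywriter_3", "Quiet power.", {"creativity": 9, "compliance": 6, "conversion": 5}),
]
winner, runners_up = select_winner(candidates)
print(winner.agent)                    # the copy to deliver
print([c.agent for c in runners_up])   # copies to archive in the knowledge base
```

Keeping the weights in one place makes the scoring rule auditable, which matters for the fairness concern noted above.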

๐Ÿข 4. Hierarchical

  • Mimics corporate organization: Manager Agent โ†’ Sub-agent.
  • Manager is responsible for task breakdown, resource allocation, and result acceptance.
  • Sub-agents are responsible for specific execution.
  • Applicable Scenarios: Large project management, military command, enterprise ERP.

📖 Example: Software Development Team

[CTO Agent] → Break down requirements → Assign to [PM Agent]
[PM Agent] → Develop plan → Assign to [Dev Agent], [Test Agent]
[Dev Agent] → Coding → Deliver to [Test Agent]
[Test Agent] → Testing → Feedback to [PM Agent]
[PM Agent] → Acceptance → Report to [CTO Agent]

✅ Advantages: Suitable for extremely complex tasks, clear responsibility chain.
❌ Challenges: Manager Agent requires strong planning capabilities.

๐Ÿœ 5. Swarm Intelligence

  • No central control; agents achieve global goals through local interactions.
  • Inspired by ant colonies, bird flocks, and fish schools.
  • Core Mechanisms: Pheromones, local rules, positive feedback.
  • Applicable Scenarios: Robot swarms, traffic scheduling, high-frequency trading.

📖 Example: Logistics Path Optimization

100 logistics agents simultaneously explore delivery paths.
→ Each agent records "path time" and broadcasts.
→ Other agents prioritize "low time paths."
→ The swarm gradually converges on a near-optimal path (a global optimum is not guaranteed).

✅ Advantages: High fault tolerance, adaptive, scalable.
❌ Challenges: Slow convergence, requires a large number of agents.
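
The pheromone and positive-feedback mechanism can be illustrated with a toy simulation. This is a simplified, deterministic sketch (an expected-value update with no random sampling), not a full ant colony optimization algorithm. The routes, travel times, and the `simulate` function are hypothetical.

```python
# Hypothetical delivery routes and their travel times in minutes.
travel_time = {"A": 10.0, "B": 5.0, "C": 8.0}

def simulate(rounds: int = 20, n_agents: int = 100, evaporation: float = 0.5) -> dict:
    """Expected-value pheromone update: no randomness, so runs are repeatable."""
    pheromone = {route: 1.0 for route in travel_time}
    for _ in range(rounds):
        total = sum(pheromone.values())
        for route, minutes in travel_time.items():
            # Agents choose routes in proportion to the current pheromone level...
            agents_on_route = n_agents * pheromone[route] / total
            # ...and deposit more pheromone on faster routes (positive feedback).
            deposit = agents_on_route / minutes
            # Evaporation lets the swarm forget routes that stop being reinforced.
            pheromone[route] = (1 - evaporation) * pheromone[route] + deposit
    return pheromone

levels = simulate()
best = max(levels, key=levels.get)
share = levels[best] / sum(levels.values())
print(best, f"{share:.0%} of pheromone")
```

No agent knows the full picture; the fastest route wins only because each visit reinforces it slightly more than the alternatives.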

🆚 Comparison Table of Five Patterns

| Pattern | Collaboration Method | Advantages | Disadvantages | Applicable Scenarios |
|---|---|---|---|---|
| Role-based | Dialogue / shared blackboard | Strong specialization, high fault tolerance | Complex handover protocols | Content creation, code review |
| Pipeline | Sequential transfer | Simple structure, easy to implement | Blocking risk, low flexibility | Report generation, approval flow |
| Auction-based | Competition + scoring | High diversity, encourages innovation | Complex mechanisms, high costs | Creative generation, resource allocation |
| Hierarchical | Top-down command | Suitable for extremely complex projects | Heavy burden on Manager | Enterprise management, military command |
| Swarm Intelligence | Local interaction | High fault tolerance, adaptive | Slow convergence, difficult to control | Robotics, traffic scheduling |

🧭 Selection Recommendations:

  • Beginners → Pipeline
  • Professional Scenarios → Role-based / Hierarchical
  • Creative/Selection Scenarios → Auction-based
  • Physical/Financial Scenarios → Swarm Intelligence

3. Communication Protocols Between Agents: Message Format, Dialogue Management, State Synchronization

The core of multi-agent collaboration is communication: how do they “talk”? How do they “synchronize states”? How do they “avoid conflicts”?

🖼️ [Suggested Image Location] Multi-Agent Communication Architecture Diagram

[Agent A] ↔ [Message Broker] ↔ [Agent B]
                   ↕
          [Shared Blackboard]
                   ↕
             [Event Bus]

📨 1. Message Format

Communication between agents must be structured; a standard format is recommended:

{
  "from":"researcher_agent",
  "to":"writer_agent",
  "content":{
      "task_id":"TASK20250601",
      "data":{ ... },
      "metadata":{
        "priority":"high",
        "deadline":"2025-06-02T10:00:00Z"
      }
  },
  "timestamp":"2025-06-01T09:30:00Z"
}

✅ Key Fields: from/to, content, metadata, timestamp.
❌ Avoid: pure natural language communication (prone to ambiguity, hard to parse).
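
One way to enforce the structured format is to build and validate messages in code rather than by hand. This is a minimal sketch using only the standard library; the `AgentMessage` class and the `parse_message` helper are hypothetical names, not part of any framework.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

REQUIRED_FIELDS = {"from", "to", "content", "timestamp"}

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    content: dict
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # "from" is a Python keyword, so attribute names are mapped to wire names here.
        return json.dumps({
            "from": self.sender,
            "to": self.recipient,
            "content": self.content,
            "timestamp": self.timestamp,
        })

def parse_message(raw: str) -> AgentMessage:
    """Parse an incoming message and reject it if a key field is missing."""
    payload = json.loads(raw)
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        raise ValueError(f"message missing fields: {sorted(missing)}")
    return AgentMessage(payload["from"], payload["to"],
                        payload["content"], payload["timestamp"])

msg = AgentMessage("researcher_agent", "writer_agent",
                   {"task_id": "TASK20250601", "metadata": {"priority": "high"}})
print(parse_message(msg.to_json()).recipient)
```

Rejecting malformed messages at the boundary keeps parsing errors from surfacing later inside an agent's prompt.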

💬 2. Dialogue Management

  • Dialogue ID: Each task has a unique ID for tracking.
  • Dialogue Status: pending / running / completed / failed.
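  • Timeout Mechanism: If an agent does not respond → automatically retry or escalate.

import time

class ConversationManager:
    def __init__(self, timeout_seconds=30.0):  # default timeout is an illustrative assumption
        self.timeout_seconds = timeout_seconds
        self.conversations = {}  # task_id → {status, messages, deadline}

    def send_message(self, task_id, from_agent, to_agent, content):
        # Record message + update status and response deadline
        conv = self.conversations.setdefault(task_id, {"status": "pending", "messages": []})
        conv["messages"].append({"from": from_agent, "to": to_agent, "content": content})
        conv["status"] = "running"
        conv["deadline"] = time.monotonic() + self.timeout_seconds

    def check_timeout(self, task_id):
        # Return True on timeout so the caller can trigger a retry or alert
        conv = self.conversations[task_id]
        timed_out = conv["status"] == "running" and time.monotonic() > conv["deadline"]
        if timed_out:
            conv["status"] = "failed"
        return timed_out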

🔄 3. State Synchronization

  • Shared Blackboard: A central storage that all agents can read and write.
  • Event-driven: Agents publish events, others subscribe.
  • Database Synchronization: Use Redis / PostgreSQL to store shared states.
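
A shared blackboard can be sketched in memory before moving to Redis or PostgreSQL. The sketch below is an assumption-laden illustration: the `Blackboard` class is hypothetical, and it adds per-key version numbers (optimistic locking) so that two agents cannot silently overwrite each other's updates.

```python
import threading

class Blackboard:
    """In-memory shared state with per-key version numbers (optimistic locking)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unknown keys read as version 0, value None."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if nobody else updated the key since it was read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # stale write: caller should re-read and retry
            self._data[key] = (current_version + 1, value)
            return True

board = Blackboard()
version, _ = board.read("report")
print(board.write("report", "draft by writer_agent", version))    # first writer succeeds
print(board.write("report", "draft by editor_agent", version))    # stale version is rejected
```

The same read-compare-write pattern maps onto a version column in PostgreSQL or a `WATCH`-based transaction in Redis.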

🛡️ Security and Robustness Enhancement (New):

| Risk | Defense Strategy |
|---|---|
| Malicious Agent Injection | Role whitelist + digital signature + behavior auditing. |
| Data Pollution | Input validation + output verification + version rollback. |
| Role Conflict | Priority arbitration + locking mechanism + timeout preemption. |
| Communication Hijacking | TLS encryption + message authentication + access control. |
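
The role whitelist and message authentication defenses can be combined in a short check. Note the swap: this sketch uses an HMAC with a shared secret, which authenticates messages but is not a true digital signature (that would need asymmetric keys). The `sign` and `verify` helpers and the demo key are hypothetical.

```python
import hashlib
import hmac
import json

def sign(message: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the message."""
    body = json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def verify(message: dict, signature: str, key: bytes, allowed_senders: set) -> bool:
    """Accept a message only from a whitelisted sender with a valid tag."""
    if message.get("from") not in allowed_senders:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(message, key), signature)

key = b"demo-shared-secret"  # illustrative only; load real keys from a secret store
message = {"from": "researcher_agent", "to": "writer_agent",
           "content": {"task_id": "TASK20250601"}}
tag = sign(message, key)

print(verify(message, tag, key, {"researcher_agent"}))           # untouched message
tampered = dict(message, to="attacker_agent")
print(verify(tampered, tag, key, {"researcher_agent"}))          # modified in transit
```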

⚠️ Common Communication Issues and Responses

| Issue | Manifestation | Solution |
|---|---|---|
| Message Loss | Agent does not receive instructions. | ACK confirmation + retransmission mechanism. |
| Deadlock | A waits for B, B waits for A. | Set timeout + priority arbitration. |
| State Inconsistency | Two agents use different data. | Central state management + versioning. |
| High Communication Costs | Frequent message passing consumes tokens. | Batch sending + local caching. |
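
The ACK + retransmission response to message loss can be sketched as a bounded resend loop. This is a simplified illustration: `send_with_ack` and `FlakyChannel` are hypothetical, the channel is simulated in memory, and a real system would also add delays between attempts and deduplicate messages whose ACK (rather than the message itself) was lost.

```python
def send_with_ack(send, message, max_attempts=3):
    """Call send(message) until it returns True (an ACK) or attempts run out.

    Returns the number of attempts used; raises TimeoutError if none succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise TimeoutError(f"no ACK after {max_attempts} attempts")

class FlakyChannel:
    """Simulated transport that silently drops the first `drops` messages."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def __call__(self, message):
        if self.drops > 0:
            self.drops -= 1
            return False  # message lost, no ACK
        self.delivered.append(message)
        return True       # receiver acknowledged

channel = FlakyChannel(drops=2)
attempts_used = send_with_ack(channel, {"task_id": "TASK20250601"})
print(attempts_used, channel.delivered)
```

Capping the attempts matters: unbounded retransmission turns a lost message into the "high communication costs" problem listed in the same table.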

4. Practical Example: Building a Multi-Agent Pipeline for “Market Research → Report Writing → PPT Generation”

Next, we will implement a complete multi-agent pipeline using LangChain and an LLM.

🎯 Goal: User inputs industry → Automatically generate market analysis report + PPT.

🌐 Extended Cases: Cross-Domain Multi-Agent Applications (New)

🧪 Case 1: Research Collaboration Agents (Papers + Experiments + Graphics)

[Literature Review Agent] → [Experiment Design Agent] → [Data Simulation Agent] → [Graphics Agent] → [Paper Writing Agent]

📈 Case 2: Financial Trading Agents (Analysis + Decision + Execution)

[Macro Analysis Agent] → [Industry Research Agent] → [Quantitative Strategy Agent] → [Trading Execution Agent] → [Risk Control Agent]

🤖 Case 3: Robot Swarm Agents (Exploration + Collaboration + Obstacle Avoidance)

[Leader Robot] → Assign Areas → [Scout Robot] Explores → [Mapper Robot] Maps → [Carrier Robot] Transports

🧩 System Architecture (Market Report Case)

[User]
   ↓
[Coordinator Agent] → Assign tasks, monitor progress
   ↓
[Market Research Agent] → Call search engines/RAG, output data
   ↓
[Report Writing Agent] → Generate Markdown report based on data
   ↓
[PPT Generation Agent] → Convert report to PPT (using python-pptx)
   ↓
[Delivery Agent] → Save file + notify user

🛠️ Code Implementation (Simplified Version, Core Logic)

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import asyncio

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

# 1. Market Research Agent
async def market_research_agent(industry: str) -> str:
    prompt = f"Please research the {industry} industry, output: market size, growth rate, Top 3 companies, trend analysis."
    response = await llm.ainvoke([HumanMessage(content=prompt)])
    return response.content

# 2. Report Writing Agent
async def report_writer_agent(data: str) -> str:
    prompt = f"Write a report based on the following data:\n{data}\nRequirements: clear structure, including title, abstract, body, conclusion."
    response = await llm.ainvoke([HumanMessage(content=prompt)])
    return response.content

# 3. PPT Generation Agent (Simplified: Generate PPT Outline)
async def ppt_agent(report: str) -> str:
    prompt = f"Convert the following report into a PPT outline, one title + 3 key points per slide:\n{report}"
    response = await llm.ainvoke([HumanMessage(content=prompt)])
    return response.content

# 4. Main Process (Pipeline)
async def multi_agent_pipeline(industry: str):
    print("🔍 Step 1: Market Research...")
    data = await market_research_agent(industry)

    print("📝 Step 2: Writing Report...")
    report = await report_writer_agent(data)

    print("📊 Step 3: Generating PPT...")
    ppt = await ppt_agent(report)

    print("✅ Done!")
    return {"data": data, "report": report, "ppt": ppt}

# Run
result = asyncio.run(multi_agent_pipeline("New Energy Vehicles"))
print(result["ppt"][:500])  # Print the first 500 characters of the PPT outline

🖨️ Output Example:

PPT Outline:
Slide 1: Analysis of the New Energy Vehicle Industry
- Market Size: ¥50 billion
- Growth Rate: 12%
- Trends: Intelligence, Globalization

Slide 2: Top 3 Competitors
- Company A: Market Share 30%
- Company B: Market Share 25%
- Company C: Market Share 20%
...

⚙️ Engineering Optimization Suggestions

  • Parallelization: Research + Data Analysis can be parallelized.
  • Cache: Cache results of the same industry research for 1 day.
  • Fault Tolerance: If any agent fails → retry 2 times → escalate to human.
  • Cost Control: Use gpt-3.5 to generate drafts, gpt-4 for polishing.
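
The parallelization and fault-tolerance suggestions can be sketched with `asyncio` alone. This is a minimal sketch under stated assumptions: `with_retry`, `research_many`, and `flaky_agent` are hypothetical helpers, the agent is a stub that fails once per topic instead of calling an LLM, and "escalate to human" is represented by raising an error for the caller to handle.

```python
import asyncio

async def with_retry(agent, arg, retries=2):
    """Run agent(arg); on failure retry up to `retries` more times, then escalate."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return await agent(arg)
        except Exception as exc:  # broad catch is for the sketch only
            last_error = exc
    raise RuntimeError("all retries failed: escalate to human review") from last_error

async def research_many(agent, topics):
    # Independent research calls run concurrently instead of one after another.
    return await asyncio.gather(*(with_retry(agent, topic) for topic in topics))

# Stub agent for demonstration: the first call for each topic fails.
attempts = {}

async def flaky_agent(topic):
    attempts[topic] = attempts.get(topic, 0) + 1
    if attempts[topic] < 2:
        raise ConnectionError("transient failure")
    return f"findings on {topic}"

results = asyncio.run(research_many(flaky_agent, ["market size", "competitors"]))
print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, so downstream agents can rely on positional matching with the topic list.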

5. Evaluation and Challenges of Multi-Agent Systems

How to measure whether a multi-agent system is “usable”? It needs to be evaluated from four dimensions:

📊 Evaluation Index System

| Dimension | Index | Description | Measurement Method |
|---|---|---|---|
| Effectiveness | Task Success Rate | Whether the final output meets user needs. | Manual evaluation / Automated testing. |
| | Output Accuracy | Whether data/conclusions are accurate. | Comparison with standard answers. |
| | Consistency | Whether results are stable across multiple runs. | Variance calculation. |
| Efficiency | Latency | Total time from input to output. | Timer. |
| | Throughput | Number of tasks processed per unit time. | Stress testing. |
| | Parallelism | Number of concurrently active agents / Total number of agents. | Monitoring logs. |
| Cost | Total Token Consumption | Total tokens used by all agents calling LLM. | API billing statistics. |
| | Computational Resource Consumption | CPU/GPU/Memory usage. | System monitoring. |
| Robustness | Fault Tolerance Rate | Proportion of successful system recovery after agent failure. | Fault injection testing. |
| | Recovery Time from Anomaly | Time from failure to normal recovery. | Log analysis. |
| | Success Rate of Defense Against Adversarial Attacks | Success rate of resisting malicious inputs/role injections. | Red team testing. |
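
Several of these indexes can be computed directly from run logs. The sketch below assumes a hypothetical log format (one dictionary per pipeline run, with a reviewer-assigned `quality` score) and covers task success rate, latency, token consumption, and consistency as variance; the numbers are made up for illustration.

```python
from statistics import mean, pvariance

# Hypothetical run logs: one record per end-to-end pipeline run.
runs = [
    {"success": True,  "latency_s": 40.0, "tokens": 5000, "quality": 8.0},
    {"success": True,  "latency_s": 50.0, "tokens": 6000, "quality": 7.0},
    {"success": False, "latency_s": 60.0, "tokens": 7000, "quality": 3.0},
    {"success": True,  "latency_s": 30.0, "tokens": 4000, "quality": 8.0},
]

def summarize(runs: list) -> dict:
    """Aggregate per-run logs into the evaluation indexes."""
    return {
        "success_rate": mean([1.0 if r["success"] else 0.0 for r in runs]),
        "avg_latency_s": mean([r["latency_s"] for r in runs]),
        "total_tokens": sum(r["tokens"] for r in runs),
        # Lower variance across repeated runs means more consistent output.
        "quality_variance": pvariance([r["quality"] for r in runs]),
    }

print(summarize(runs))
```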

📌 Recommendation: Conduct “fault injection testing” and “stress testing” before going live to ensure stability in the production environment.

✅ Summary of This Chapter

  • Multi-agent systems originate from distributed AI, game theory, and organizational modeling, and have gained traction in recent years due to the explosion of large models.
  • Five architectural patterns: role-based, pipeline, auction-based (including game theory), hierarchical, and swarm intelligence.
  • Communication protocols are core: structured messages, dialogue state management, synchronization of shared data, and defense against malicious attacks are essential.
  • Leading frameworks: AutoGen, ChatDev, MetaGPT, CAMEL, AgentScope each have applicable scenarios.
  • Evaluation requires four dimensions: effectiveness (success rate/accuracy), efficiency (latency/throughput), cost (tokens/resources), and robustness (fault tolerance/attack resistance).
  • Applications extend beyond content generation, empowering research, finance, robotics, and industrial sectors.
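  • More is not always better: every added agent brings coordination and communication cost, so the number of agents should balance complexity against benefit; as a rule of thumb, 3-5 agents is a reasonable starting point.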
