AGDebugger: A Powerful Tool for Developing, Debugging, and Guiding Multi-Agent Systems

“With AI technology developing rapidly, multi-agent AI systems are becoming a popular choice for solving complex tasks. They are also hard to debug. AGDebugger was created to address this, giving developers a new, interactive way to debug that makes the behavior of multi-agent AI systems more transparent and controllable.”

Hello everyone, I am Si Ling Qi. I came across an interesting project called AGDebugger, an open-source Microsoft project for debugging multi-agent AI systems. Imagine a group of agents collaborating like a small team to complete a task: if something goes wrong, how do you find and fix the problem? This is where AGDebugger comes in. It lets developers see step by step how the agents work and adjust their behavior at any point.

The Rise of Multi-Agent AI Systems and Debugging Challenges

Imagine you have a group of smart friends, each with their own specialty: some are good at searching the internet for information, some can write code, and others can handle files. Organized together, they can accomplish complex tasks such as finding the latest data, analyzing file contents, or carrying out a series of actions. Multi-agent AI systems work the same way: the agents divide the labor, rely on large language models (LLMs) to plan and make decisions, and complete the task step by step.

Debugging multi-agent AI systems requires reasoning about long, multi-turn conversations in which specialized agents, powered by large language models, use tools such as web browsing and code writing. AGDebugger lets users interactively debug and steer multi-agent teams: they can reset agents to earlier points in the workflow and edit messages to test hypotheses about the agents' behavior.

The trouble starts once these agents begin working and exchange many rounds of messages. If something goes wrong midway, for example an agent finds the wrong information or executes a flawed plan, how can developers trace the problem to its root? Traditional debugging methods, such as inspecting model training or correcting datasets, fall short here, because what matters in a multi-agent system is how the agents interact and how they complete tasks through many calls to language models. Developers need a new kind of debugging tool that helps them understand the conversations between agents, locate problems, and adjust agent behavior in real time.

AGDebugger: A New Tool for Debugging Multi-Agent AI

AGDebugger is designed to address these issues. Three features in particular make debugging multi-agent systems much simpler.

• AGDebugger lets users send new messages to agents at any time and view every message the agents have exchanged. It is like being able to interject whenever you want, or to review the earlier conversation to see how the agents reached their current state. For example, you can pause the agents, send them new instructions, or look back at what they said earlier, which makes it easier to pinpoint where things went wrong.

AGDebugger helps users debug and steer their agent teams interactively: users can send new messages, control the flow of messages, and view the history of agent messages.

• The most impressive feature of AGDebugger is that it lets users “rewind” the agents’ conversation to an earlier point and modify the messages there. It is like going back in time, changing an agent’s decision, and seeing what happens next. For example, if you find that an agent’s plan is not detailed enough, you can go back to the point where it made the plan, add more specific instructions, and rerun from there to see the result.

Users debug agent workflows by directly editing earlier agent messages and restarting the workflow from that point, for example by adding more specific instructions to a message to steer the agents toward the correct outcome.

• AGDebugger also provides an intuitive visual overview that shows the entire conversation history together with the edit history. It works like a map of how the agents progressed and where you intervened. For instance, you can see when each message was sent, who sent it, and where you made changes, making it easy to follow the whole conversation.

The interactive overview visually summarizes the agent conversation. Each reset branches the current conversation into a new session, shown in a new column. Users can color messages by type, sender, or recipient; hovering over a message shows its details, and clicking it jumps to the full message in the message history view.
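To make the reset-and-branch idea more concrete, here is a minimal Python sketch of the bookkeeping it implies. It is not AGDebugger's implementation: the Message, Session, and DebugLog classes, the color_by helper, and the message types "Plan" and "Result" are hypothetical names chosen for illustration. Each reset keeps the history up to the chosen message, swaps in the edited version, and records the result as a new session (one column in the overview). Actually re-running the agents from that point is left to the agent framework and is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """One message exchanged between agents (hypothetical schema)."""
    sender: str
    recipient: str
    msg_type: str
    content: str


@dataclass
class Session:
    """One column in the overview: a linear message history."""
    session_id: int
    messages: list[Message] = field(default_factory=list)
    parent_id: int | None = None  # session this branch was created from


class DebugLog:
    """Tracks every session; each reset-and-edit adds a new branch."""

    def __init__(self) -> None:
        self.sessions: list[Session] = [Session(session_id=0)]

    @property
    def current(self) -> Session:
        # The most recently created session is the one being run.
        return self.sessions[-1]

    def send(self, message: Message) -> None:
        # Interjecting simply appends to the current session's history.
        self.current.messages.append(message)

    def reset_and_edit(self, index: int, edited: Message) -> Session:
        # Keep messages before `index`, replace message `index` with the
        # edited version, and store the result as a new session.
        base = self.current
        if not 0 <= index < len(base.messages):
            raise IndexError(f"no message at position {index}")
        branch = Session(
            session_id=len(self.sessions),
            messages=base.messages[:index] + [edited],
            parent_id=base.session_id,
        )
        self.sessions.append(branch)
        return branch


def color_by(messages: list[Message], key: str) -> dict[str, int]:
    """Map each distinct value of `key` ("sender", "recipient" or "msg_type") to a color slot."""
    slots: dict[str, int] = {}
    for m in messages:
        slots.setdefault(getattr(m, key), len(slots))
    return slots


if __name__ == "__main__":
    log = DebugLog()
    log.send(Message("user", "Orchestrator", "GroupChatStart", "Find the newest release tag"))
    log.send(Message("Orchestrator", "WebSurfer", "Plan", "Search the web"))
    log.send(Message("WebSurfer", "Orchestrator", "Result", "Found an outdated page"))

    # Rewind to the vague plan (position 1) and make it more specific.
    specific = Message("Orchestrator", "WebSurfer", "Plan",
                       "Open the project's GitHub releases page and report the newest tag")
    branch = log.reset_and_edit(1, specific)
    print(branch.session_id, len(branch.messages))  # 1 2
    print(color_by(log.sessions[0].messages, "sender"))  # {'user': 0, 'Orchestrator': 1, 'WebSurfer': 2}
```

Copying the prefix instead of mutating the original session is what lets the earlier run stay visible as its own column, which mirrors how the overview keeps every branch side by side.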

How to Use AGDebugger

AGDebugger is not only powerful but also easy to use. It is open source, and you can install and run it locally with the following steps.

First, clone the AGDebugger repository from GitHub, build the web frontend, and install the Python package (building the frontend requires Node.js and npm). The steps are as follows:

# Clone the repository
git clone https://github.com/microsoft/agdebugger.git
cd agdebugger

# Install frontend dependencies and build the web frontend
cd frontend
npm install
npm run build

# Return to the repository root and install the Python package (provides the agdebugger command)
cd ..
pip install .

Once installation is complete, you can use AGDebugger to debug your multi-agent system. AGDebugger is built on AutoGen: you provide a Python file that exposes a function returning the AutoGen AgentChat team you want to debug. For example, the following script creates a MagenticOneGroupChat team whose only specialist agent is a WebSurfer:

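# scenario.py
# Assumes the AutoGen packages are installed (e.g. pip install "autogen-agentchat" "autogen-ext[openai,web-surfer]",
# followed by `playwright install` for the browser) and that OPENAI_API_KEY is set in the environment.
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def get_agent_team():
    # Model client shared by the orchestrator and the web-surfing agent.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    # A single specialist agent that browses the web.
    surfer = MultimodalWebSurfer(
        "WebSurfer",
        model_client=model_client,
    )

    # MagenticOneGroupChat adds an orchestrator that plans and delegates work to the surfer.
    team = MagenticOneGroupChat([surfer], model_client=model_client)

    # AGDebugger calls this function to obtain the team it will debug.
    return team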

Then start the AGDebugger interface, pointing it at the module and function in module:function form:

agdebugger scenario:get_agent_team
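The argument follows the common module:function convention. The sketch below shows one way such a spec could be resolved and an async factory like get_agent_team awaited. It is an illustrative assumption, not AGDebugger's actual loader, and load_factory and build_team are hypothetical names.

```python
import asyncio
import importlib
import inspect
from typing import Any


def load_factory(spec: str) -> Any:
    """Resolve a 'module:function' spec such as 'scenario:get_agent_team'."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)  # the module must be importable (on sys.path)
    return getattr(module, attr)


async def build_team(spec: str) -> Any:
    """Call the factory and await its result if it returns a coroutine (async def)."""
    result = load_factory(spec)()
    if inspect.isawaitable(result):
        result = await result
    return result


if __name__ == "__main__":
    # Demo with a standard-library function so the sketch runs without AutoGen.
    print(asyncio.run(build_team("platform:python_version")))
```

With scenario.py importable, asyncio.run(build_team("scenario:get_agent_team")) would return the MagenticOneGroupChat team. build_team is itself async because the factory in scenario.py is declared with async def; awaiting it inside the caller's own event loop avoids creating the team in a separate, short-lived loop.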

Once the interface is open, send a GroupChatStart message to start the agents’ conversation and begin debugging!

User Study: How AGDebugger Performs in Practice

To evaluate AGDebugger, the researchers ran a two-part user study. In the first part, 6 participants used AGDebugger to locate two bugs in failed agent runs. The results showed that AGDebugger was genuinely useful: participants found the problems faster, and most felt it was much easier to use than traditional debugging methods.

In the second part of the study, 8 participants used AGDebugger to try to steer the agents toward correct answers. Although getting the agents to produce a fully correct answer outright was sometimes difficult, AGDebugger helped participants better understand the agents’ behavior patterns. For example, some participants found that the agents’ plans were not detailed enough, so they went back to the planning stage and added more specific instructions. Others found that the agents’ tasks were too complex, so they simplified them and had the agents complete a simpler sub-task first. With these strategies, participants succeeded in steering the agents to correct results.

In the second part of the study, every participant used the message-editing feature while debugging, and some edited messages five times. The most common edit was adding more specific instructions to a message, followed in frequency by simplifying instructions and modifying the plan’s goals.

Areas for Improvement

Although AGDebugger is already quite useful, there is still room for improvement. Some agent actions are irreversible, such as sending an email that cannot be recalled, which limits what the “rewind” feature can do. In addition, to steer agents effectively, developers need a deep understanding of how the agents are implemented; otherwise, they may give instructions the agents cannot carry out. Finally, after a message is modified, the resulting change in agent behavior is not always obvious, so developers may need several attempts to tell whether the modification worked.
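As an illustration of the first limitation, here is a hypothetical sketch of a check a debugger could run before a reset: list the irreversible tool calls that already happened at or after the reset point, since rewinding the conversation cannot undo them and re-running may repeat them. The tool names, the ToolCall record, and side_effects_after are invented for this example; this is not a description of how AGDebugger works.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tool names: which actions are irreversible depends on your agents.
IRREVERSIBLE_TOOLS = frozenset({"send_email", "delete_file", "make_payment"})


@dataclass(frozen=True)
class ToolCall:
    step: int   # position of the call in the conversation
    tool: str   # name of the tool the agent invoked


def side_effects_after(calls: list[ToolCall], reset_step: int) -> list[ToolCall]:
    """Return irreversible calls made at or after `reset_step`.

    Resetting the conversation to `reset_step` does not undo these calls,
    and re-running from there may repeat them (for example, a second email).
    """
    return [c for c in calls if c.step >= reset_step and c.tool in IRREVERSIBLE_TOOLS]


if __name__ == "__main__":
    calls = [ToolCall(2, "web_search"), ToolCall(5, "send_email"), ToolCall(7, "read_file")]
    print(side_effects_after(calls, 4))  # [ToolCall(step=5, tool='send_email')]
    print(side_effects_after(calls, 6))  # []
```

A check like this cannot make the action reversible; at most it warns the developer before a reset, which is why irreversible actions remain a real limit on rewinding.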

Conclusion

AGDebugger provides an innovative solution for debugging and guiding multi-agent AI systems. By allowing users to edit messages exchanged between agents and explore counterfactual scenarios, AGDebugger helps developers gain deep insights into the collective behavior of agents and effectively adjust their interactions. As AI technology continues to evolve, interactive debugging tools like AGDebugger will play an increasingly important role in building more powerful and reliable multi-agent AI systems.

If you already use the AutoGen framework to build agents, you can use AGDebugger directly. If you build agent applications with other frameworks and iterative development and operation of agents is part of your work, AGDebugger’s open-source code (see references at the end) is still worth studying and can greatly improve how you debug agent applications.