Exploration of AI Agent Technology

1. Background

Recently, I have been studying AI Agents. I am writing this article to organize what I have learned and to make deeper study easier later on.

2. What is LLM

LLM (Large Language Model) technology has developed rapidly in recent years, and many large models are now available, a good number of them open-source.

2.1 Principle of LLM Implementation

An LLM is a deep-learning neural network built on the Transformer architecture. It learns the statistical regularities of language by training on massive amounts of text, and when generating output it predicts the next token based on the learned probabilities.

2.2 Core Capabilities of LLM

Today’s large models are like experts with knowledge of many industries: people ask a question, and the model provides an answer. Under the hood, the core function of a large model is to add one word at a time, generating a “reasonable continuation” of any input text. Here “reasonable” means what one might expect someone to write, given what people have written across billions of web pages. From the perspective of developing applications on top of large models, this can be summarized in one sentence: the core capability of a large model is to add one token at a time.

3. How to Use LLM

Imagine that we want a knowledgeable professor to answer a question. What should we do? We need to describe the question clearly so that the professor can understand it and give the correct answer. This may take several rounds of dialogue before the professor fully understands the question.

3.1 What is PE

How can we clearly describe a specific question? PE (Prompt Engineering) addresses this problem.
PE breaks the process of asking a question into several parts: defining a role, describing the background of the question, identifying the problem to be solved, and specifying output requirements. PE organizes the relevant information for each specific question and then asks it systematically. PE is static and one-time: it obtains high-quality outputs by optimizing questioning techniques, clarifying instructions, and providing examples.

3.2 What is ICL

ICL (In-Context Learning) is a core capability of LLMs: the model uses the information provided in the prompt (i.e., the context) to respond, relying on its own capabilities and without any update to its parameters. The quality of the response depends heavily on the context provided. PE obtains high-quality outputs by giving the large model well-organized, high-quality context. There are two common methods: zero-shot learning, which provides no task examples, and few-shot learning, which provides a small number of task examples in the prompt to guide the model in performing a new task.

3.3 What is CE

CE (Context Engineering) is about constructing context dynamically. PE is static and one-time, but we often want to give the large model different context in different situations, which makes the PE process dynamic. Sometimes achieving high-quality output takes multiple rounds of dialogue driven by the model’s feedback. Implementing dynamic context relies on the following key technologies:

3.3.1 Retrieval-Augmented Generation (RAG)

When a user asks the large model a question, the application retrieves the most relevant pieces of information from external knowledge bases (vector databases, internal document libraries, etc.) in real time.
The retrieved information is then dynamically injected into the context, giving the model accurate and clear background knowledge so that it can produce higher-quality outputs.

3.3.2 Tool Calling / Function Calling

The user provides tools, with clearly defined inputs and outputs, that give the large model access to external information. When the model determines that answering the user’s question requires a tool’s result, it dynamically calls the tool, and the call’s input and returned result are injected into the context, greatly extending what the model can do.

3.3.3 Agent Memory

Historical interactions between the user and the large model are stored, and when a new question is asked, the relevant history is injected into the model’s context to keep responses coherent. Agent memory takes two forms: short-term memory, which manages the state, intermediate results, and user intentions in the current session or task chain; and long-term memory, which persistently stores key information, user preferences, and task knowledge from past sessions.

3.4 Comparison of PE and CE

Dimension	PE	CE
Goal	Optimize a single instruction to guide model output	Dynamically construct the task environment by injecting multi-source information
Key Technologies	Instruction design, example optimization	RAG retrieval, memory management, tool calling, structured output
Applicable Scenarios	Simple Q&A, creative generation	Long-term tasks
Failure Attribution	Adjust questioning methods	Check data freshness, tool availability, memory coverage
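
The contrast in the table can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a reference implementation: the function names (build_static_prompt, build_dynamic_context), the in-memory KNOWLEDGE_BASE, the keyword-overlap retrieval (a stand-in for real vector search), and the get_current_year tool are all hypothetical and do not come from any library. No model is called; the code only shows how the text sent to a model would be assembled.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory knowledge base standing in for a vector database.
KNOWLEDGE_BASE = [
    "MCP standardizes how models call external tools.",
    "A2A lets AI agents from different vendors collaborate.",
    "RAG injects retrieved documents into the model context.",
]


def build_static_prompt(role, background, problem, output_format):
    """PE: a fixed, one-time prompt built from role, background, problem, output."""
    return (
        f"Role: {role}\n"
        f"Background: {background}\n"
        f"Problem: {problem}\n"
        f"Output requirements: {output_format}"
    )


def _words(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question, top_k=1):
    """Toy retrieval: rank documents by shared words (not real vector search)."""
    q = _words(question)
    scored = [(len(q & _words(doc)), doc) for doc in KNOWLEDGE_BASE]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def get_current_year():
    """Hypothetical tool; returns a fixed value so the example is deterministic."""
    return "2025"


TOOLS = {"get_current_year": get_current_year}


@dataclass
class Memory:
    """Short-term memory: the questions asked so far in the current session."""
    turns: list = field(default_factory=list)


def build_dynamic_context(question, memory, tool_names=()):
    """CE: assemble context per request from retrieval, tool results, and memory."""
    parts = [f"[retrieved] {doc}" for doc in retrieve(question)]
    parts += [f"[tool:{name}] {TOOLS[name]()}" for name in tool_names]
    parts += [f"[history] {turn}" for turn in memory.turns]
    parts.append(f"[question] {question}")
    memory.turns.append(question)  # remember this turn for the next request
    return "\n".join(parts)


# Usage: the static prompt never changes; the dynamic context differs per request.
print(build_static_prompt("Professor", "AI agents", "What is RAG?", "One paragraph"))
session = Memory()
print(build_dynamic_context("What does RAG do with retrieved documents?", session))
print(build_dynamic_context("Which year is it?", session, tool_names=["get_current_year"]))
```

In a real system the tool would be selected by the model rather than passed in by the caller, and retrieval would use embeddings; both are simplified here to keep the sketch self-contained.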

4. What is MCP

MCP (Model Context Protocol) is a standardized interface that lets large models use external resources (mainly tools), aiming to break down the barriers between models and external data and tools.

You can think of it as a USB-C port designed specifically for AI applications: plug-and-play. All the cumbersome work is handled by the MCP server according to the protocol. MCP standardizes Tool Calling and Function Calling, with the aim of enhancing the capabilities of an individual AI Agent.

5. What is A2A

A2A (Agent-to-Agent Protocol) is a protocol launched by Google in April 2025 that enables direct collaboration, dialogue, and joint decision-making between different AI agents. Its core goal is to solve collaboration among heterogeneous AI agents, allowing them to perform task allocation, progress synchronization, and even multimodal interaction across platforms, frameworks, and vendors.

The A2A protocol complements MCP: MCP primarily addresses how AI models access external tools and data, while A2A focuses on direct dialogue and cooperation between AI agents.

6. References

https://mp.weixin.qq.com/s/nS095lfGEjXmiEUibBzcHw

Author

Xiong Ping, Apache Dubbo Committer, AWS Community Builder, Open Atom School Source Open Source Ambassador & Open Source Lecturer, GitHub account: pinxiong, WeChat public account: Technical Exchange Cabin, a learner of AI Agent technology.
