3.3 LlamaIndex
Compared with LangChain's comprehensive approach, LlamaIndex takes a more focused and compact path. It concentrates on developing advanced RAG-based AI applications and on building multi-tenant RAG systems. Enterprise solutions built on LlamaIndex aim to remove technical and security barriers and to strengthen enterprises' ability to use their data and deliver services. Work around LlamaIndex is not limited to technical development; it also applies these techniques to real-world scenarios to improve business efficiency and customer experience.
3.3.1 LlamaIndex and AI Development Based on RAG
RAG is an important direction in AI application development, representing a machine learning method that combines retrieval and generation. It first retrieves information from relevant data sources, then incorporates this information as context into the user’s query, and finally requests the large model to generate answers based on this enriched prompt.
Approaches to applying large models range from hardest to easiest roughly as follows: building a model from scratch, fine-tuning an existing model, dynamic prompting (RAG is one form of this), and simple prompt engineering.
RAG sits in the middle of this difficulty chain: it is neither as hard as building a model from scratch or fine-tuning one, nor as easy as directly querying a large model with simple prompt engineering. It overcomes three drawbacks of fine-tuning: high cost, difficulty in keeping information up to date, and lack of observability. By contrast, RAG is cheaper because it requires no training, always works with current data because it retrieves information in real time, and produces more credible results because it can show the documents it retrieved.
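To make the retrieve-augment-generate flow concrete, here is a minimal sketch in plain Python using the OpenAI client that also appears later in this section. The search_documents function is a hypothetical stand-in for a real retriever (for example, a vector database lookup), and the model name is only illustrative:
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))  # assumes the key is set in the environment

def search_documents(query: str) -> list[str]:
    # Hypothetical retrieval step: look up passages relevant to the query,
    # e.g. from a vector database. Here it just returns placeholders.
    return ["(retrieved passage 1)", "(retrieved passage 2)"]

def rag_answer(query: str) -> str:
    # 1. Retrieve: fetch context related to the user's question
    context = "\n".join(search_documents(query))
    # 2. Augment: embed the retrieved context into the prompt
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # 3. Generate: ask the large model to answer from the enriched prompt
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(rag_answer("What products does the company sell?"))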
3.3.2 LlamaIndex
Returning to LlamaIndex: it provides frameworks, tools, and patterns for ingesting, structuring, and accessing private or domain-specific data, which makes the RAG development process smoother. For example:
1. Data connectors: These ingest existing data from their original sources and formats. Data can come in many forms, such as APIs, PDFs, or SQL databases, and LlamaIndex provides corresponding reader interfaces.
2. Data indexing: This structures data into intermediate representations, such as vector embeddings, that are easier for large models to consume.
3. Engines: These provide natural-language access to data. For example, the query engine is a powerful retrieval interface for knowledge-augmented output; the chat engine is a conversational interface for multi-turn interaction with data (a brief chat-engine sketch follows this list); and the data agent is a knowledge worker driven by large models, capable of anything from simple assistance to API integration.
4. Application integration: This ties LlamaIndex back into the rest of your application ecosystem.
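The query engine is demonstrated in the example of Section 3.3.4; for the chat engine, a minimal sketch might look like the following. It assumes an OpenAI API key is available in the environment (LlamaIndex falls back to OpenAI models by default) and that the documents live in a local data directory:
# Data connector: read raw files from the data directory
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
documents = SimpleDirectoryReader("data").load_data()
# Data indexing: build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)
# Chat engine: a conversational interface that keeps multi-turn context
chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What topics does this document cover?"))
print(chat_engine.chat("Summarize the first one in a single sentence."))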
The two major AI application hotspots, RAG and Agents, are closely related. LlamaIndex can provide more functionality for Agents through RAG pipelines, frameworks, and tools, addressing the pain points of lack of control and transparency in Agents. The Agent API in LlamaIndex allows for step-by-step execution, enabling Agents to handle more complex tasks. Additionally, LlamaIndex supports user feedback during the RAG loop, which is particularly suitable for executing long-term tasks, achieving a dual-loop setup for user-Agent interaction and intermediate execution control.
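As a hedged sketch of how a RAG pipeline can back an Agent, the snippet below wraps a query engine as a tool and hands it to a ReAct-style agent, then drives the task step by step. The tool name and description are illustrative, and the step-wise API (create_task / run_step / finalize_response) follows recent llama_index releases, so names may differ slightly across versions; an OpenAI API key is again assumed for the default LLM:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
# Expose the RAG query engine to the agent as a tool
rag_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="company_docs",
    description="Answers questions about the documents in the data directory.",
)
agent = ReActAgent.from_tools([rag_tool], verbose=True)
# Step-by-step execution: the caller can inspect or stop between intermediate steps
task = agent.create_task("Which roles are mentioned in the documents, and what do they do?")
step_output = agent.run_step(task.task_id)
while not step_output.is_last:
    step_output = agent.run_step(task.task_id)
print(agent.finalize_response(task.task_id))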
3.3.3 Main Process of RAG
It can be roughly divided into six steps:
1. User submits a query: The user poses a query to the system, such as a question or request.
2. Agent searches for relevant information: The Agent searches for relevant information based on the user’s query, possibly through the internet or specific databases, to find related documents or data. Typically, we would place internal corporate information into a vector database.
3. Retrieval of information: Specific information is retrieved from the search results, which will be used to generate context for responding to the user’s query.
4. Relevant information is provided to the large model: The Agent supplies the retrieved information along with the user’s original query to the large model.
5. Large model generates a response: The large model uses this information to generate a rich, informative answer.
6. Responding to the user’s request: Finally, the Agent returns the answer generated by the large model to the user. This answer is based on the user’s original query and the information retrieved from the relevant data sources.
The above process can form a loop, where each user query is used to improve subsequent interactions.
3.3.4 Simple LlamaIndex Example
First, install the related library with pip install llama-index.
Additionally, note that LlamaIndex integrates well with OpenAI’s interface, but some adjustments are needed when calling other vendors’ products.
Here, we will use a PDF file stored in the data directory as the corpus document.
# Import environment variables; here we still use KIMI's API
import os
from openai import OpenAI

os.environ['KIMI_API_KEY'] = 'your_kimi_api_key'
client = OpenAI(
    api_key=os.getenv('KIMI_API_KEY'),  # use the environment variable to get the API key
    base_url="https://api.moonshot.cn/v1")
# Import documents, load local data
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
It should be noted that if you are using OpenAI’s API key, you can directly use the following code to create an index for the data:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
If you are using an API key from another vendor, you need to check whether LlamaIndex supports that vendor’s products and whether the key gives access to an embedding model. If it does not, you need to specify a local or online embedding model; otherwise, LlamaIndex will default to calling OpenAI’s embedding model. Then, create the index with the specified embedding model.
# Configure a HuggingFace embedding model, here BAAI/bge-small-zh-v1.5
# (requires the llama-index-embeddings-huggingface package)
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
index = VectorStoreIndex.from_documents(documents)
# Query local data: create the query engine
query_engine = index.as_query_engine()
# Two query examples
response = query_engine.query("How many roles are there in the Huayu Secret Realm?")
print("How many roles are there in the Huayu Secret Realm?", response)
response = query_engine.query("What is the name of the Agent in the Huayu Secret Realm?")
print("What is the name of the Agent in the Huayu Secret Realm?", response)
Summary
The above is a simple example to experience the practicality of LlamaIndex.
Of course, I used KIMI’s API key here, which does not include an embedding model, so the above code cannot run directly and will produce various errors.
In principle, the problem can be solved by choosing an embedding model and either calling it through a Hugging Face API key or downloading it locally. Because of the performance limits of my local machine, I won’t demonstrate the full procedure, but a sketch of one possible setup follows.
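For completeness, here is a hedged sketch of one way the example could be made runnable with a non-OpenAI vendor: point LlamaIndex’s LLM at the Moonshot (KIMI) OpenAI-compatible endpoint via OpenAILike and use a local Hugging Face embedding model. The extra packages (llama-index-llms-openai-like, llama-index-embeddings-huggingface) and the model names are my assumptions, not something demonstrated in the book:
import os
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike  # pip install llama-index-llms-openai-like
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # pip install llama-index-embeddings-huggingface

# Use KIMI (Moonshot) through its OpenAI-compatible endpoint as the chat model
Settings.llm = OpenAILike(
    model="moonshot-v1-8k",
    api_base="https://api.moonshot.cn/v1",
    api_key=os.getenv("KIMI_API_KEY"),
    is_chat_model=True,
)
# Use a local embedding model instead of OpenAI's embedding API
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("How many roles are there in the Huayu Secret Realm?"))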
Declaration:
This series of articles mainly references the book “Developing AI Agents with Large Models” published by People’s Posts and Telecommunications Press.
The images, code, and other content mentioned in the text are mainly sourced from that book.