Dexto is Here! The Secret Weapon for Making AI Agents Speak

Have you ever imagined that one day your AI assistant could not only answer questions but also actively help you write code, edit images, send emails, and even browse the internet for information and make decisions? Not a mechanical exchange where you say one thing and it replies with another, but something like a real “digital colleague”: one with memory, judgment, the ability to act, and the ability to discuss next steps with you.

Today, let’s take a deep look at Dexto, a tool that describes itself as an intelligent layer for AI, and see how it turns a collection of cold, hard models, tools, and data into a living AI agent system that can “think,” “act,” and “speak.”

1. From “Chatbot” to “Digital Life”: The Evolution of AI Agents

Let’s start with some background.

In recent years, large language models (LLMs) have become incredibly popular. You may have heard names like ChatGPT, Claude, and Gemini. What makes them powerful? It’s their ability to “understand language” and “generate content.” You can ask them, “Help me write a resignation letter,” and they can produce a beautifully crafted document in seconds.

But here’s the problem: they can only talk, not act.

Ask one to “optimize the Python script I wrote last week and push it to GitHub,” and at best it will tell you how to do it; you still have to type the commands, edit the code, and submit the PR yourself. Your efficiency barely improves; you have simply gained an “AI coach.”

It’s like hiring a super-smart consultant who can talk strategy but won’t even touch the keyboard.

Thus, people began to wonder: can we make AI not just a “talking head” but a true “executor” that can “get things done”?

This is the origin of the AI Agent.

What is an AI Agent?

In simple terms, an AI Agent is an entity that can:

Perceive the environment (e.g., read files, browse web pages)
Make decisions (e.g., decide whether to research or write code first)
Execute actions (e.g., call APIs, modify files, send messages)
Learn and remember continuously (e.g., remember what you said or did last time)

It is no longer a passive answering machine but an active task executor.
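To make these four capabilities concrete, here is a minimal TypeScript sketch of the loop an agent runs. It is not Dexto’s code: `decide` is a hypothetical stand-in for the LLM call, and `read_file` is a toy tool that returns a hard-coded string.

```typescript
// Toy perceive → decide → act → remember loop. `decide` stands in for the LLM;
// here it is a hard-coded rule so the sketch runs without any model.
type Action = { tool: string; input: string } | { done: string };

interface Memory {
  history: string[]; // everything the agent has seen and done so far
}

type ToolFn = (input: string) => string;

function runAgent(
  goal: string,
  tools: Record<string, ToolFn>,
  decide: (goal: string, memory: Memory) => Action,
  maxSteps = 5,
): string {
  const memory: Memory = { history: [`goal: ${goal}`] };
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(goal, memory); // decide based on the goal and memory
    if ("done" in action) return action.done;
    const tool = tools[action.tool];
    if (!tool) throw new Error(`unknown tool: ${action.tool}`);
    const observation = tool(action.input); // act, then perceive the result
    memory.history.push(`${action.tool}(${action.input}) -> ${observation}`); // remember
  }
  return "gave up after maxSteps";
}

// Usage: a fake file tool and a rule that reads one file, then summarizes it.
const tools: Record<string, ToolFn> = {
  read_file: (path) => (path === "notes.txt" ? "buy milk" : ""),
};
const decide = (_goal: string, memory: Memory): Action =>
  memory.history.length === 1
    ? { tool: "read_file", input: "notes.txt" }
    : { done: `summary: ${memory.history[1].split("-> ")[1]}` };

console.log(runAgent("summarize my notes", tools, decide)); // summary: buy milk
```

In a real agent, `decide` is where the model earns its keep; everything else is plumbing that a framework can provide.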

Dexto is specifically designed as an “operating system-level” framework for building such AI agents.

2. What Exactly is Dexto? In a Nutshell

Dexto is a universal intelligent layer that allows you to seamlessly connect large models, tools, and data to quickly build AI applications that can “think,” “act,” and “chat” with you.

Sounds a bit abstract? Don’t worry, let me give you an analogy.

Imagine you want to create an “all-in-one AI assistant” that can:

Write code
Create images
Query databases
Play music
Search for information online
Generate podcasts for you

The traditional approach would require you to integrate APIs one by one, write a bunch of glue code, set up a backend service, and then build a frontend interface… by the time you finish, the opportunity has passed.

With Dexto, you only need to:

Tell it: “I want an AI that can draw”
It automatically loads the image generation tool and the corresponding large model (like Gemini Flash)
Then you input: “Draw a futuristic flying car,” and it can directly generate and display the image for you

The entire process requires no code at all; configuration files handle everything.

Isn’t it a bit like “LEGO blocks for the AI world”?

3. Core Highlights: What Makes Dexto So Powerful?

Let’s take a look at what its official slogan says:

“An all-in-one toolkit to build agentic applications that turn natural language into real-world actions.”

In other words: you describe what you want in plain language, and the toolkit turns it into actions that actually get carried out.

This statement is packed with information, so let’s break down the technical support behind it.

✅ 1. Configuration-Driven Development: YAML Defines Everything, Say Goodbye to Hardcoding

One of Dexto’s most impressive design choices is defining agent behavior with YAML files.

What does this mean? It means you don’t have to write JavaScript/Python to control logic; instead, you use a structured configuration file to tell the agent:

Which large model to use?
What tools can be accessed?
How to manage memory?
How to handle user dialogues?

For example, here is a simplified configuration snippet for an agent:

agent:
  name: coding-agent
  model: openai/gpt-4o
  tools:
    - filesystem
    - browser
    - terminal
  memory:
    type: redis
    ttl: 86400
  prompt: |
    You are a professional front-end engineer,
    skilled in using HTML/CSS/JS to build responsive pages.
    When users present requirements, please plan and execute step by step.

See? Not a single line of code! It’s all declarative configuration.

What does this mean? It means you can:

Quickly switch models (changing from GPT to Claude requires just one line)
Dynamically add or remove tools (want to add a PDF reader? Just add it)
Reuse agent templates (team-shared configurations, unified standards)
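As a rough illustration of why this matters, the sketch below treats the YAML above as a typed object and derives a new agent from a shared team template. The `AgentConfig` shape mirrors the fields of the simplified snippet, not necessarily Dexto’s real schema, and `deriveAgent` and the `pdf-reader` tool name are hypothetical.

```typescript
// Typed view of the simplified YAML above (illustrative, not Dexto's real schema).
interface AgentConfig {
  name: string;
  model: string; // "provider/model", e.g. "openai/gpt-4o"
  tools: string[];
  memory: { type: string; ttl?: number };
  prompt: string;
}

type AgentChanges = Partial<Omit<AgentConfig, "tools">> & { addTools?: string[]; removeTools?: string[] };

// Derive a new agent from a shared template: override fields, add or remove
// tools, and leave the template itself untouched.
function deriveAgent(template: AgentConfig, changes: AgentChanges): AgentConfig {
  const { addTools = [], removeTools = [], ...overrides } = changes;
  const tools = template.tools
    .filter((t) => !removeTools.includes(t))
    .concat(addTools.filter((t) => !template.tools.includes(t)));
  return { ...template, ...overrides, tools };
}

const teamTemplate: AgentConfig = {
  name: "coding-agent",
  model: "openai/gpt-4o",
  tools: ["filesystem", "browser", "terminal"],
  memory: { type: "redis", ttl: 86400 },
  prompt: "You are a professional front-end engineer.",
};

// Switching to Claude is one field; swapping a tool is one line each.
const pdfAgent = deriveAgent(teamTemplate, {
  name: "pdf-coding-agent",
  model: "anthropic/claude-3-opus",
  addTools: ["pdf-reader"],
  removeTools: ["browser"],
});
console.log(pdfAgent.tools.join(", ")); // filesystem, terminal, pdf-reader
```

Because the template is never mutated, a whole team can share it and each person only records what differs.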

This is simply the “microservices architecture of the AI era.”

✅ 2. Runtime Engine: Making Agents Truly “Live”

Having configuration alone is not enough; a powerful runtime is needed to execute these instructions.

Dexto provides a complete runtime that includes the following key capabilities:

Function	Description
Session Management	Supports multi-user, multi-session concurrency, with independent context for each session
Memory System	Automatically saves conversation history, intermediate states, and user preferences
Tool Orchestration	Multiple tools can be called sequentially or in parallel, with error retry support
Multi-Modal Support	Can handle text, images, files, and voice
Human-Machine Collaboration	Key operations can be set for manual approval to prevent AI from going rogue
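The “error retry” and “parallel” parts of tool orchestration can be sketched in a few lines. This is a generic pattern that assumes each tool call is an async function; `withRetry` and `callInParallel` are illustrative names, not Dexto APIs.

```typescript
// Retry a failing tool call up to `attempts` times, then rethrow the last error.
async function withRetry<T>(call: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Independent tool calls run in parallel; each gets its own retries.
// Promise.all keeps results in the same order as the calls.
function callInParallel<T>(calls: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(calls.map((c) => withRetry(c)));
}

// Usage: a flaky search tool that times out twice before succeeding.
let failures = 0;
const flakySearch = async (): Promise<string> => {
  if (failures < 2) {
    failures++;
    throw new Error("timeout");
  }
  return "search ok";
};
const readFile = async (): Promise<string> => "file ok";

callInParallel([flakySearch, readFile]).then((results) => console.log(results.join(" | ")));
// search ok | file ok
```

There is no delay between attempts here, to keep the sketch short; a production runtime would usually add backoff.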

For example, in a practical scenario:

You say: “Help me create a Snake game, and open the browser for preview once done.”

What will Dexto do?

Understand your intent → Plan task steps
Call the code generation model → Output HTML+CSS+JS code
Use filesystem tool → Write the code into a local file
Call browser tool → Automatically launch the browser to open the page
Reply to you: “Done! The game is open in the browser”

The entire process is automated, and every step is logged, so if something goes wrong you can trace where it happened.

This kind of “end-to-end automation” is the true meaning of an “intelligent agent.”

✅ 3. Rich Interfaces: CLI, Web, API at Your Choice

Dexto isn’t only for engineers; it offers several ways to interact with it, to suit different habits.

It provides three main interaction modes:

🖥️ Web UI: Visual operation, easy for beginners

Start command:

dexto

This will automatically open a web interface where you can:

View historical conversations
Switch between different agents
Monitor token consumption in real-time
Test new prompt effects

Especially suitable for debugging and demonstrations.

💻 CLI Mode: A Programmer’s Favorite, Chat Directly in the Terminal

dexto --mode cli

This enters an interactive command-line mode, suitable for embedding in workflows and automation scripts.

For example, if you suddenly want to refactor while coding, just type:

dexto "Refactor this module into three functions and add TypeScript types"

It will help you complete it.

🔌 API Interface: Easily Integrate into Existing Systems

Dexto also provides a RESTful API and a TypeScript SDK, making it easy to integrate into your app, website, or customer service system.

For example, if you want to create a “document Q&A bot,” you can:

Upload a PDF
Call the Dexto API to analyze the content
Automatically retrieve and answer when users ask questions

Seamlessly integrate into your product.
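Here is a hedged sketch of the question-answering step. The route `/api/message`, the `sessionId`/`message` request fields, and the `answer` response field are hypothetical placeholders, not Dexto’s documented REST API; check the official docs for the real endpoints. The HTTP call is injected so the sketch can run offline.

```typescript
// Minimal fetch-like signature, injected so the sketch can run without a server.
type HttpPost = (url: string, body: unknown) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

// Ask a question about a PDF that has already been uploaded in this session.
// The route and the field names below are HYPOTHETICAL placeholders.
async function askAboutPdf(post: HttpPost, baseUrl: string, sessionId: string, question: string): Promise<string> {
  const res = await post(`${baseUrl}/api/message`, { sessionId, message: question });
  if (!res.ok) throw new Error("Dexto API request failed");
  const data = (await res.json()) as { answer: string };
  return data.answer;
}

// With a real server, you could wrap the global fetch:
// const post: HttpPost = (url, body) =>
//   fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) });

// Offline usage: a fake transport that echoes the question back.
const echoPost: HttpPost = async (_url, body) => ({
  ok: true,
  json: async () => ({ answer: `You asked: ${(body as { message: string }).message}` }),
});
askAboutPdf(echoPost, "http://localhost:3000", "session-1", "What is the refund policy?").then((a) => console.log(a));
// You asked: What is the refund policy?
```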

✅ 4. Out-of-the-Box “Agent Recipes”: Ready to Use

The most thoughtful feature is that Dexto ships with a set of prebuilt “Agent Recipes,” which are essentially ready-made AI employee templates.

Currently, more than 10 agents can be installed and used directly, including:

Agent Name	Capability Overview
Coding Agent	Complete code writing, refactoring, and debugging
Nano Banana Agent	Image generation and editing (based on Gemini Flash)
Podcast Agent	Generate two-host dialogue podcast audio
Sora Video Agent	Call OpenAI Sora to generate videos (requires permission)
Database Agent	Execute SQL queries, analyze data
GitHub Agent	Create PRs, review code, manage repositories
Talk2PDF Agent	Ask free-form questions about an uploaded PDF
Music Agent	AI composition, audio processing
Triage Agent	Multiple agents collaboratively handle customer issues

Installation is also extremely simple:

# View available agents
dexto list-agents

# Install a few common ones
dexto install coding-agent podcast-agent talk2pdf-agent

# Use directly
dexto --agent coding-agent "Write a React login component"

In just a few minutes, you can have several “AI expert teams” at your disposal.

4. Technical Architecture Analysis: How Does Dexto Work?

Talking about features is not enough; let’s dive into the underlying structure and see what Dexto’s “guts” look like.

🧱 Overall Architecture Diagram (Simplified)

+-----------------------+
|    User Interface     | ← CLI / Web / API
+-----------+-----------+
            |
            v
+-----------------------+
|     Dexto Runtime     | ← Core Engine
| - Orchestration       |
| - Session Management  |
| - Memory & Context    |
| - Tool Invocation     |
+-----------+-----------+
            |
      +-----+-----+
      |           |
      v           v
+------------+ +------------------+
|   LLMs     | |      Tools       |
| (GPT,      | | (Filesystem,     |
|  Claude,   | |  Browser, APIs)  |
|  Gemini...)| +------------------+
+------------+
      |
      v
+-------------------+
|   Storage Layer   | ← Redis, SQLite, S3...
+-------------------+

The entire system is divided into four layers:

First Layer: User Interface Layer (Input)

Supports various input methods:

CLI: Command-line interaction
Web UI: Graphical interface
HTTP API: Programmatic calls
SDK: TypeScript integration

Whichever method you use, the input is ultimately converted into a unified request format and sent to the core engine.
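A minimal sketch of that normalization step, assuming a hypothetical `AgentRequest` shape (Dexto’s internal request type is not shown in this article) and hypothetical adapter names `fromCli` and `fromHttp`:

```typescript
// Hypothetical unified request shape; Dexto's real internal type may differ.
interface AgentRequest {
  source: "cli" | "web" | "api";
  sessionId: string;
  text: string;
}

// CLI: the prompt arrives as argv words, e.g. dexto "Write a React login component".
function fromCli(argv: string[], sessionId: string): AgentRequest {
  return { source: "cli", sessionId, text: argv.join(" ") };
}

// Web UI and HTTP API: the prompt arrives as a JSON body.
function fromHttp(body: { message?: string; sessionId?: string }, source: "web" | "api"): AgentRequest {
  if (!body.message) throw new Error("request body is missing `message`");
  return { source, sessionId: body.sessionId ?? "default", text: body.message };
}

// Both paths produce the same shape, so the runtime only handles one format.
console.log(fromCli(["Write", "a", "login", "component"], "s1").text); // Write a login component
console.log(fromHttp({ message: "Write a login component" }, "api").sessionId); // default
```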

Second Layer: Runtime Engine (The Brain)

This is the core part of Dexto, responsible for coordinating all components.

It mainly includes the following modules:

1. Orchestrator

Function: Receives user input → Analyzes intent → Breaks down tasks → Schedules tools → Aggregates results

For example, if you say: “Convert this blog into a podcast and publish it on Xiaoyuzhou.”

It will automatically break it down into:

Call read_file tool to read the blog content
Call podcast-agent to generate dual-dialogue audio
Call the upload_to_xiaoyuzhou API to publish the episode
Return the link to you

The entire process requires no human intervention.
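Once the model has produced a plan, executing it can be as simple as piping each step’s output into the next tool. The sketch below hard-codes the plan and uses stub tools named after the example; in a real run the plan comes from the LLM and the tools hit real files and APIs. `executePlan` is a hypothetical name, and the example URL is a placeholder.

```typescript
// One step of a plan: which tool to run. Its input is the previous step's output.
interface PlanStep {
  tool: string;
}

type StepTool = (input: string) => string;

// Run the steps in order, log each result, and return the final output.
function executePlan(plan: PlanStep[], tools: Record<string, StepTool>, input: string, log: string[]): string {
  let current = input;
  for (const step of plan) {
    const tool = tools[step.tool];
    if (!tool) throw new Error(`no tool registered for step "${step.tool}"`);
    current = tool(current);
    log.push(`${step.tool} -> ${current}`);
  }
  return current;
}

// Stub tools named after the example above; real ones would call files, models, and APIs.
const stubTools: Record<string, StepTool> = {
  read_file: (path) => `text of ${path}`,
  "podcast-agent": (text) => `audio(${text})`,
  upload_to_xiaoyuzhou: () => "https://example.com/episodes/1",
};

// In a real run, the LLM would produce this plan from the user's request.
const blogToPodcast: PlanStep[] = [{ tool: "read_file" }, { tool: "podcast-agent" }, { tool: "upload_to_xiaoyuzhou" }];
const runLog: string[] = [];
console.log(executePlan(blogToPodcast, stubTools, "blog.md", runLog)); // https://example.com/episodes/1
```

The `log` array is the simplest form of the per-step tracing mentioned earlier: each entry records which tool ran and what it produced.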

2. Session Manager

Each user has an independent session ID that saves:

The current conversation context
Records of tools used
User personalized settings
Temporary variables

This way, when you come back next time, the AI remembers what you said before.
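In its simplest form, a session manager is a map from session ID to state. This in-memory sketch uses hypothetical names and only illustrates the idea; a real deployment would persist sessions to one of the storage backends described below.

```typescript
// Everything the article lists as per-session state.
interface SessionState {
  messages: string[]; // conversation context
  toolCalls: string[]; // records of tools used
  settings: Record<string, string>; // user preferences
  scratch: Map<string, unknown>; // temporary variables
}

class SessionManager {
  private sessions = new Map<string, SessionState>();

  // Return the session for this ID, creating an empty one on first use.
  get(sessionId: string): SessionState {
    let state = this.sessions.get(sessionId);
    if (!state) {
      state = { messages: [], toolCalls: [], settings: {}, scratch: new Map() };
      this.sessions.set(sessionId, state);
    }
    return state;
  }

  addMessage(sessionId: string, text: string): void {
    this.get(sessionId).messages.push(text);
  }
}

// Usage: two users, two independent contexts.
const manager = new SessionManager();
manager.addMessage("alice", "My project uses Vue.");
manager.addMessage("bob", "Draw a cat.");
manager.addMessage("alice", "Add a login page.");
console.log(manager.get("alice").messages.length); // 2
```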

3. Tool Gateway

All external capabilities are exposed through “tools”.

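Conceptually, each tool is a plugin that follows a unified interface along these lines:

type JSONSchema = Record<string, unknown>; // a JSON Schema object describing the inputs

interface Tool {
  name: string;           // identifier the model uses to call the tool
  description: string;    // tells the model when the tool is useful
  parameters: JSONSchema; // schema for the arguments the tool accepts
  execute(input: unknown): Promise<unknown>;
}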

Currently, there are over 30 built-in tools, including:

File system read/write
Browser control
Terminal command execution
HTTP client
Database connection
PDF parsing
Image processing
Audio synthesis
…

You can also connect external services via the MCP protocol (to be detailed later).
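To show what a “gateway” adds on top of the interface, here is a minimal registry: tools register once, and every call goes through one place that checks the tool exists and that required arguments are present. `ToolSpec` and `ToolGateway` are hypothetical names modeled on the `Tool` interface above, not Dexto’s classes.

```typescript
// Self-contained variant of the Tool shape above, with a concrete schema type.
interface ToolSpec {
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, { type: string }>; required?: string[] };
  execute(input: Record<string, unknown>): Promise<unknown>;
}

// Every tool call passes through one place: existence and argument checks live here.
class ToolGateway {
  private tools = new Map<string, ToolSpec>();

  register(tool: ToolSpec): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // The list the model sees when choosing which tool to call.
  describe(): Array<{ name: string; description: string }> {
    return Array.from(this.tools.values(), ({ name, description }) => ({ name, description }));
  }

  async call(name: string, input: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    for (const key of tool.parameters.required ?? []) {
      if (!(key in input)) throw new Error(`${name}: missing argument "${key}"`);
    }
    return tool.execute(input);
  }
}

// Usage with a toy tool.
const gateway = new ToolGateway();
gateway.register({
  name: "add",
  description: "Add two numbers",
  parameters: { type: "object", properties: { a: { type: "number" }, b: { type: "number" } }, required: ["a", "b"] },
  execute: async (input) => (input["a"] as number) + (input["b"] as number),
});
gateway.call("add", { a: 2, b: 3 }).then((sum) => console.log(sum)); // 5
```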

4. Memory System

Supports various storage backends:

In-memory (for development and testing)
Redis (high-performance caching)
PostgreSQL (persistent storage)
SQLite (lightweight local storage)
S3 (object storage)

You can choose freely based on the scenario.

For example, in a production environment, use a combination of Redis and PostgreSQL for speed and stability; for local development, SQLite is sufficient.
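The point of supporting many backends is a single storage interface behind them. The sketch below assumes a hypothetical `MemoryStore` interface with an in-memory implementation (with TTL) and a read-through `TieredStore` that mimics the Redis-in-front-of-PostgreSQL pattern; real adapters for those databases are not shown.

```typescript
// One storage interface; Redis, PostgreSQL, SQLite, or S3 adapters would each implement it.
interface MemoryStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

// In-memory backend for development. Entries expire after ttlSeconds;
// the clock is injectable so expiry can be tested without waiting.
class InMemoryStore implements MemoryStore {
  private data = new Map<string, { value: string; expiresAt?: number }>();

  constructor(private now: () => number = Date.now) {}

  async get(key: string): Promise<string | undefined> {
    const entry = this.data.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== undefined && this.now() >= entry.expiresAt) {
      this.data.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds?: number): Promise<void> {
    const expiresAt = ttlSeconds === undefined ? undefined : this.now() + ttlSeconds * 1000;
    this.data.set(key, { value, expiresAt });
  }
}

// Read-through pair, like Redis (fast cache) in front of PostgreSQL (durable).
class TieredStore implements MemoryStore {
  constructor(private cache: MemoryStore, private durable: MemoryStore, private cacheTtlSeconds = 3600) {}

  async get(key: string): Promise<string | undefined> {
    const hit = await this.cache.get(key);
    if (hit !== undefined) return hit;
    const value = await this.durable.get(key); // cache miss: fall back to durable storage
    if (value !== undefined) await this.cache.set(key, value, this.cacheTtlSeconds);
    return value;
  }

  async set(key: string, value: string): Promise<void> {
    await this.durable.set(key, value); // write the durable copy first
    await this.cache.set(key, value, this.cacheTtlSeconds);
  }
}
```

The agent code only ever sees `MemoryStore`, so moving from the in-memory store in development to a cache-plus-database pair in production is a wiring change rather than a rewrite.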

Third Layer: Model and Tool Layer (Execution)

This is the true “hands-on layer”.

🤖 Supports 50+ Large Models

Dexto isn’t tied to any specific vendor; instead, it acts like a “model supermarket,” supporting almost all mainstream LLMs:

Type	Examples
Cloud Service Providers	OpenAI, Anthropic, Google, Groq, Azure
Open Source Models	Llama 3, Mistral, Qwen, DeepSeek
On-Premise Deployment	Ollama, LM Studio, Text Generation WebUI

Switching models requires just one line of configuration:

model: anthropic/claude-3-opus
# vs
model: ollama/llama3:70b

No need to change the prompt template.
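Assuming the `provider/model` string format shown above, splitting it is the first thing a model router has to do. This hypothetical helper splits on the first slash only, so tags such as `:70b` stay part of the model name; how Dexto resolves these strings internally is not covered here.

```typescript
interface ModelRef {
  provider: string; // e.g. "openai", "anthropic", "ollama"
  model: string; // the provider's own model name, kept verbatim
}

// Split "provider/model" on the FIRST slash only, so anything after it
// (":70b" tags, or further "/" segments) stays part of the model name.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

const local = parseModelRef("ollama/llama3:70b");
console.log(local.provider, local.model); // ollama llama3:70b
```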

🔧 Tools Plug and Play

All tools can be dynamically loaded.

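For example, if you want the agent to be able to “send WeChat messages,” you could register a new tool along these lines:

// Illustrative only: registerTool and wechatApi are placeholders,
// not confirmed Dexto or WeChat APIs.
registerTool({
  name: 'send_wechat_message',
  description: 'Send WeChat messages to specified contacts',
  parameters: {
    type: 'object',
    properties: {
      contact: { type: 'string', description: 'Contact name or ID' },
      message: { type: 'string', description: 'Text to send' }
    },
    required: ['contact', 'message']
  },
  // Receives the arguments the model chose for this call
  async execute({ contact, message }: { contact: string; message: string }) {
    return await wechatApi.send(contact, message);
  }
});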

Once registered, it can be used in YAML.

Fourth Layer: Observability and Safety

As a production-grade framework, Dexto has put significant effort into observability and safety.

📊 Built-in OpenTelemetry Support

All requests will be automatically traced, supporting:

Distributed tracing
Metrics monitoring
Log collection

You can use tools like Jaeger, Prometheus, and Grafana for real-time monitoring:

Time taken for each call
Token usage
Tool invocation chains
Error rate statistics

This is very useful for debugging complex agent processes.
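A real setup would use the OpenTelemetry SDK; the dependency-free sketch below only mimics the idea of spans so you can see what gets recorded per call (duration, tokens, parent span, errors). `Tracer` and `SpanRecord` are hypothetical names and deliberately not the OpenTelemetry API.

```typescript
// What one span records; a stand-in for OpenTelemetry spans, not their API.
interface SpanRecord {
  name: string;
  parent?: string;
  ms: number;
  tokens: number;
  error: boolean;
}

class Tracer {
  readonly spans: SpanRecord[] = [];

  constructor(private clock: () => number = Date.now) {}

  // Run fn inside a named span. fn can report token usage through addTokens.
  // Spans are recorded when they finish, so children appear before parents.
  trace<T>(name: string, fn: (addTokens: (n: number) => void) => T, parent?: string): T {
    const start = this.clock();
    let tokens = 0;
    let error = false;
    try {
      return fn((n) => {
        tokens += n;
      });
    } catch (e) {
      error = true;
      throw e;
    } finally {
      this.spans.push({ name, parent, ms: this.clock() - start, tokens, error });
    }
  }

  // Share of recorded spans that ended with an exception.
  errorRate(): number {
    if (this.spans.length === 0) return 0;
    return this.spans.filter((s) => s.error).length / this.spans.length;
  }
}

// Usage: an agent request that writes a file inside it.
const tracer = new Tracer();
tracer.trace("agent.request", (addTokens) => {
  addTokens(420); // e.g. prompt + completion tokens reported by the model call
  tracer.trace("tool.filesystem.write", () => "ok", "agent.request");
});
console.log(tracer.spans.map((s) => s.name).join(", ")); // tool.filesystem.write, agent.request
```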

🔐 Human-in-the-Loop Mechanism

To prevent AI from “acting on its own” and causing issues, Dexto provides flexible approval policies.

For example, you can set:

All operations involving “deleting files” must be manually confirmed
Prompt reminders before “modifying the database”
Some tools are disabled by default and require explicit authorization

You can also enable --auto-approve mode for rapid iteration:

dexto --auto-approve "Refactor the entire project directory structure"

It’s great for development, and you can turn it off before going live.
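Such a policy boils down to a small decision function. This sketch is an assumption about how the rules above could combine, not Dexto’s actual policy engine; the tool names are hypothetical, and in this version auto-approve never unlocks a tool that is disabled.

```typescript
type Decision = "allow" | "ask" | "deny";

interface ApprovalPolicy {
  alwaysAsk: string[]; // tools that need manual confirmation (e.g. deleting files)
  disabled: string[]; // tools that are off by default
  authorized: string[]; // disabled tools the user has explicitly enabled
  autoApprove: boolean; // like --auto-approve: skip confirmations (development only)
}

// Disabled-and-unauthorized tools are always denied, even with auto-approve on.
function checkApproval(tool: string, policy: ApprovalPolicy): Decision {
  if (policy.disabled.includes(tool) && !policy.authorized.includes(tool)) return "deny";
  if (policy.autoApprove) return "allow";
  return policy.alwaysAsk.includes(tool) ? "ask" : "allow";
}

const policy: ApprovalPolicy = {
  alwaysAsk: ["delete_file", "db_write"],
  disabled: ["terminal"],
  authorized: [],
  autoApprove: false,
};
console.log(checkApproval("delete_file", policy)); // ask
console.log(checkApproval("delete_file", { ...policy, autoApprove: true })); // allow
console.log(checkApproval("terminal", { ...policy, autoApprove: true })); // deny
```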

5. Practical Demonstration: Create an AI Drawing Assistant in Four Steps

Having covered the theory, let’s get into some practical work.

Goal: Use Dexto to build an AI drawing assistant that supports generating images from Chinese descriptions.

Step 1: Install Dexto

npm install -g dexto

Or build from source:

git clone https://github.com/truffle-ai/dexto.git
cd dexto && pnpm install && pnpm install-cli

Step 2: Initialize and Configure

Run the first startup wizard:

dexto

It will guide you through:

Selecting the default LLM (recommended GPT-4o or Claude 3)
Entering the API key
Setting the storage method (recommended Redis)
Launching the Web UI

Once setup is complete, your default browser opens http://localhost:3000.

Step 3: Install the Image Agent

Back in the terminal, install the Nano Banana Agent:

dexto install nano-banana-agent

This agent is based on Google’s Gemini 2.5 Flash Image model and excels at image generation.

Step 4: Start Drawing!

Select nano-banana-agent in the Web UI and input the prompt:

“An orange cat in a spacesuit, watching the Earth rise on Mars, in a cyberpunk style, high definition details”

Seconds later: boom! An imaginative image appears.

You can also continue to ask:

“Change it to an anime style, add some space stations in the background”

It will edit based on the original image rather than regenerate, saving time and computing power.

The entire process is smooth and natural, like conversing with a master artist.

6. Advanced Play: Building a Multi-Agent Collaboration System

While a single agent is already powerful, what if multiple agents work together?

Dexto supports Multi-Agent Systems, enabling “teamwork”.

Scenario Example: Running a WeChat Official Account Automatically

Imagine you want to create a fully automated agent team to run a WeChat Official Account, made up of:

Agent Role	Responsibilities
Researcher	Collect trending topics
Writer	Draft articles
Editor	Edit and polish
Designer	Design images
Publisher	Format and publish

How do they collaborate?

Researcher regularly scrapes trending topics from Zhihu and Weibo
Discovers the topic of “AI drawing becoming popular” → Notifies Writer
Writer drafts an article titled “Introduction to Stable Diffusion”
Submits it to Editor for style modification
Editor thinks images are needed → Passes it to Designer
Designer generates cover images + internal illustrations
All materials are compiled for Publisher
Publisher formats → Previews → Awaits your confirmation → Publishes

The cross-agent communication can be built on MCP (Model Context Protocol), which Dexto supports natively.
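The hand-off chain above can be sketched as a pipeline over a shared draft. Each role here is a stub function, not a real agent; in a real system each would have its own model, prompt, and tools, and messages would travel over a protocol such as MCP rather than function calls. All names are hypothetical.

```typescript
interface Draft {
  topic: string;
  article: string;
  needsImages: boolean;
  images: string[];
  published: boolean;
}

// Each role reads the shared draft and returns an updated copy.
type AgentRole = (draft: Draft) => Draft;

const researcher: AgentRole = (d) => ({ ...d, topic: "AI drawing becoming popular" });
const writer: AgentRole = (d) => ({ ...d, article: `Introduction to Stable Diffusion (topic: ${d.topic})` });
// The editor polishes the text and decides whether images are needed.
const editor: AgentRole = (d) => ({ ...d, article: d.article.replace("Introduction", "A Gentle Introduction"), needsImages: true });
// The designer only works when the editor asked for images.
const designer: AgentRole = (d) => (d.needsImages ? { ...d, images: ["cover.png", "illustration-1.png"] } : d);

// Run the roles in order; the publisher step publishes only if a human confirms.
function runTeam(roles: AgentRole[], humanConfirms: (d: Draft) => boolean): Draft {
  let draft: Draft = { topic: "", article: "", needsImages: false, images: [], published: false };
  for (const role of roles) draft = role(draft);
  return { ...draft, published: humanConfirms(draft) };
}

const result = runTeam([researcher, writer, editor, designer], (d) => d.images.length > 0);
console.log(result.article, result.published);
// A Gentle Introduction to Stable Diffusion (topic: AI drawing becoming popular) true
```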

7. MCP: The “Universal Language” Between Agents and Tools

MCP (Model Context Protocol) is an open standard introduced by Anthropic in November 2024, and Dexto builds on it at its core.

Its purpose is to provide a standard protocol that lets AI applications connect to external tools and data sources and work with them in a uniform way.

To draw an analogy:

HTTP is the protocol between browsers and web servers
SMTP is the protocol for delivering email between mail servers
MCP is the protocol between AI applications and the tools and data they use

What problem does it solve?

❌ Traditional Pain Points

Previously, each tool had its own set of interface definitions, such as:

Tool A returns JSON
Tool B requires XML
Model C can only process plain text

This led to integration difficulties and inconsistent data formats.

✅ MCP Solution

MCP specifies a standardized message format (JSON-RPC 2.0) and a common set of capabilities that a server can expose:

Tools: actions the model can invoke, such as reading a file or running a query
Resources: data the model can read as context
Prompts: reusable prompt templates

This way, any MCP-compatible client can discover and call any MCP server’s capabilities without custom glue code.

Moreover, Dexto comes with an MCP Server and Playground, making it easy for you to test your own MCP services.
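At the wire level, MCP messages are JSON-RPC 2.0, and invoking a tool uses the `tools/call` method with a tool `name` and its `arguments`. The sketch below only builds and reads that JSON; the transport (stdio or HTTP) and the initialization handshake are omitted, and `read_file` is a hypothetical tool name.

```typescript
// MCP messages are JSON-RPC 2.0. Only the JSON is built and parsed here;
// the transport (stdio or HTTP) and the initialize handshake are omitted.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface CallToolResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean; // set by the server when the tool itself failed
}

let nextId = 1;

// Ask an MCP server to run one of its tools.
function toolCallRequest(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Join the text parts of a tools/call response; surface protocol and tool errors.
function readToolResult(raw: string): string {
  const msg = JSON.parse(raw) as { id: number; result?: CallToolResult; error?: { code: number; message: string } };
  if (msg.error) throw new Error(`JSON-RPC error ${msg.error.code}: ${msg.error.message}`);
  if (!msg.result) throw new Error("response has neither result nor error");
  const text = msg.result.content
    .filter((part) => part.type === "text")
    .map((part) => part.text ?? "")
    .join("\n");
  if (msg.result.isError) throw new Error(`tool failed: ${text}`);
  return text;
}

console.log(JSON.stringify(toolCallRequest("read_file", { path: "notes.md" })));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"read_file","arguments":{"path":"notes.md"}}}
```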

8. Deployment Options: Local, Cloud, or Hybrid

One of Dexto’s biggest advantages is its ability to run anywhere.

🏠 Local Deployment (Privacy First)

Suitable for individual developers or enterprises sensitive to data.

Features:

All data stays local
Can connect to Ollama to run open-source models
Basic functions can be used without internet access

Command:

dexto --host 0.0.0.0 --port 3000

☁️ Cloud Deployment (High Availability)

Suitable for team collaboration or providing external services.

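It can be containerized with a short Dockerfile:

FROM node:18
# Install the Dexto CLI globally
RUN npm install -g dexto
# Copy your agent configuration files into the image
COPY . /app
WORKDIR /app
# The Web UI port used elsewhere in this article
EXPOSE 3000
CMD ["dexto"]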

Pair it with Nginx, Redis, and PostgreSQL to form a complete production environment.

🔄 Hybrid Architecture (Best Balance)

A more advanced approach is the “hybrid mode”:

Sensitive operations are executed locally (e.g., reading private documents)
Complex computations are done in the cloud (e.g., video generation)
Results are merged and returned to the user

This achieves both “security” and “performance.”
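The hybrid idea reduces to a routing rule plus a merge. This sketch assumes each task is tagged with whether it touches private data; the task kinds, `route`, and the runner functions are hypothetical.

```typescript
type Target = "local" | "cloud";

interface HybridTask {
  kind: string;
  touchesPrivateData: boolean;
}

// Private data never leaves the machine; heavy jobs go to the cloud; the rest stays local.
function route(task: HybridTask, heavyKinds: string[]): Target {
  if (task.touchesPrivateData) return "local";
  return heavyKinds.includes(task.kind) ? "cloud" : "local";
}

// Run every task on its target, in parallel, and merge results in input order.
function runHybrid(
  tasks: HybridTask[],
  heavyKinds: string[],
  runners: Record<Target, (task: HybridTask) => Promise<string>>,
): Promise<string[]> {
  return Promise.all(tasks.map((task) => runners[route(task, heavyKinds)](task)));
}

runHybrid(
  [
    { kind: "summarize_contract", touchesPrivateData: true },
    { kind: "video_generation", touchesPrivateData: false },
  ],
  ["video_generation"],
  { local: async (t) => `local:${t.kind}`, cloud: async (t) => `cloud:${t.kind}` },
).then((merged) => console.log(merged.join(", "))); // local:summarize_contract, cloud:video_generation
```

Checking the privacy flag first means a task that is both heavy and sensitive stays local, which matches the priority order described above.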

9. Who is Dexto Suitable For?

After all this, you might ask: who is this suitable for?

I have summarized four typical user types:

1. Independent Developers / Geeks and Tinkerers

Want to quickly validate AI ideas
Enjoy tinkering with new technologies
Hope to enhance personal productivity

👉 Use Dexto to create a dedicated AI assistant to help you write code, draw, and write copy, doubling your efficiency.

2. Product Managers / Designers

Do not understand programming but want to experience AI capabilities
Need to quickly prototype
Focus on user experience

👉 The Web UI is user-friendly, configuration is simple, and drag-and-drop operations allow zero-code interaction with AI.

3. Enterprise Technical Leaders

Want to embed AI capabilities into existing businesses
Need controllable, auditable, and scalable solutions
Emphasize security and compliance

👉 Dexto supports private deployment, permission control, and operation logging, meeting enterprise-level requirements.

4. AI Entrepreneurs

Are developing AI-native applications
Need to quickly iterate on MVPs
Hope to reduce development costs

👉 Directly build products based on Dexto, saving time on underlying framework development and focusing on business logic.

10. Future Outlook: Where Will Dexto Go?

Although Dexto is still in the Beta stage, its design philosophy and technical roadmap show great potential.

I believe its possible development directions include:

🔮 1. Becoming the “Android Operating System” of the AI Era

Just as Android unified the mobile ecosystem, Dexto is expected to become the “standard platform” for AI agents.

Third-party developers can publish their agent applications on it, forming an “AI app store.”

🧠 2. Introducing Stronger Autonomous Decision-Making Capabilities

Current agents are more “executors”; in the future, they may include:

Goal Decomposition
Self-Reflection
Active Inquiry

This would bring them closer to genuinely “human-like thinking.”

🌐 3. Building a Decentralized Agent Network

Combining blockchain or P2P technology to achieve:

Free trading of services between agents
Users owning their AI identities
Data sovereignty returning to individuals

Just thinking about it is exciting.

Conclusion: The Era of AI Agents Has Arrived, Are You Ready?

Friends, we are standing at the threshold of a new era.

The past decade was the era of “mobile internet + apps”; the next decade will likely be the era of “AI agents + automation”.

Dexto is one of the first “pioneers” of this era.

It may not be perfect, but it is open enough, flexible enough, and powerful enough to give us a glimpse of the future.

So, my advice is:

Don’t just be a bystander; get your hands dirty!

Even if it’s just installing Dexto to let it help you draw a picture or write a piece of code, you will feel the shock of “AI truly working for you”.

After all, the future will not wait for anyone.

What we need to do is learn to surf before the wave arrives.

📌 Resource Links:

GitHub Repository: https://github.com/truffle-ai/dexto^[1]
Official Documentation: https://docs.dexto.ai^[2]
Discord Community: https://discord.gg/GFzWFAAZcm^[3]
Agent Registry: https://docs.dexto.ai/docs/guides/agent-registry^[4]

References

[1] https://github.com/truffle-ai/dexto
[2] https://docs.dexto.ai
[3] https://discord.gg/GFzWFAAZcm
[4] https://docs.dexto.ai/docs/guides/agent-registry